The Bootstrap for Network Dependent Processes
Denis Kojevnikov
Tilburg University
Abstract.
This paper focuses on the bootstrap for network dependent processes under the conditional $\psi$-weak dependence. Such processes are distinct from other forms of random fields studied in the statistics and econometrics literature, so the existing bootstrap methods cannot be applied directly. We propose a block-based approach and a modification of the dependent wild bootstrap for constructing confidence sets for the mean of a network dependent process. In addition, we establish the consistency of these methods for the smooth function model and provide bootstrap alternatives to the network heteroskedasticity-autocorrelation consistent (HAC) variance estimator. We find that the modified dependent wild bootstrap and the corresponding variance estimator are consistent under weaker conditions relative to the block-based method, which makes the former approach preferable for practical implementation.

Keywords. Conditional bootstrap; Block bootstrap; Dependent wild bootstrap; Network dependent process; Random field; Conditional $\psi$-weak dependence.
1. Introduction
The aim of this paper is developing bootstrap approaches for the sample mean of network dependent processes studied in Kojevnikov, Marmer, and Song (2020, hereafter KMS). A network dependent process is a random field indexed by the set of nodes of a given undirected network. This network governs the stochastic dependence between the elements of the associated random field. Specifically, the latter is assumed to satisfy a conditional version of the $\psi$-weak dependence condition of Doukhan and Louhichi (1999) given a common shock of a general form. KMS (2020) show that the pointwise Law of Large Numbers and the Central Limit Theorem hold for a sequence of such processes under suitable assumptions on the networks' denseness and the strength of the stochastic dependence. In addition, they provide nonparametric HAC estimators of the variance-covariance matrix for the vector of sample moments, which is similar to the spatial HAC estimator developed in Kelejian and Prucha (2007).

These results provide an asymptotic approximation of the distribution of the sample mean which can be used for inference on the true mean of a network dependent process. However, this approximation relies on the network HAC estimator, which has two major drawbacks. First, unlike its spatial or time-series counterparts, it is not guaranteed to yield a positive semi-definite estimate. Second, these estimators are known to have poor finite sample properties (see, e.g., Matyas, 1999, Section 3.5). The aim of the current work is to provide an alternative nonparametric way to conduct inference in these settings.

The nonparametric bootstrap methods for the case of weakly dependent observations have been studied since the introduction of the non-overlapping block bootstrap in Carlstein (1986) and the moving block bootstrap in Künsch (1989) and Liu and Singh (1992) for stationary, mixing time series. Since then, a number of block-based methods have been considered in the statistics literature. They share the idea of resampling groups of consecutive observations to capture the stochastic dependence in the original series and include, among others, the circular block bootstrap (Politis and Romano, 1992), the stationary bootstrap (Politis and Romano, 1994), and the tapered block bootstrap (Paparoditis and Politis, 2001). A detailed exposition and comparison of some of these methods can be found in Lahiri (2003). The block-based bootstrap was also successfully applied to the case of weakly dependent random fields satisfying certain mixing conditions (see, e.g., Lahiri, 2003, Section 12 and references therein).

More recent developments in this area of research are discussed in Gonçalves and Politis (2011). In particular, the dependent wild bootstrap proposed in Bühlmann (1993) and Shao (2010) departs from other methods. Instead of using blocks, it tries to mimic the autocovariance structure of the original data by introducing auxiliary random variables and can be applied to irregularly spaced data. A related method, the dependent random weighting, was recently introduced in Sengupta, Shao, and Wang (2015) and has wider applicability; specifically, it can be directly applied to irregularly spaced spatial data. Another useful resampling technique developed for stationary and nonstationary time series and homogeneous random fields under mixing is subsampling. A comprehensive treatment of this method is given in Politis, Romano, and Wolf (1999).
Interestingly, in the time-series case subsampling is similar to the moving block bootstrap where a single block is resampled. Finally, it is worth mentioning the spatial smoothed bootstrap suggested in Garcia-Soidan, Menezes, and Rubinos (2014). In this instance, assuming homogeneity of the underlying data generating process, bootstrap pseudo-samples are drawn from the estimated joint distribution of a given sample.

Network dependent processes are closely related to random fields indexed by elements of a lattice in $\mathbb{R}^d$ (see, e.g., Comets and Janžura, 1998; Conley, 1999). However, they are not a special case of the latter, and so the existing bootstrap methods cannot be directly applied to our framework. The main reason for that is the irregularity of the structure of the underlying networks. In particular, subsampling and all types of the block bootstrap for time-series and spatial data rely on the existence of ordered blocks of closely located observations. The dependent wild bootstrap uses a well-known property of kernel functions that guarantees the positive semi-definiteness of certain weighting matrices. However, as argued in KMS (2020), this relation does not necessarily hold when applied to networks. Finally, the homogeneity assumption of the spatial smoothed bootstrap and the spatial subsampling, that is, the invariance of joint distributions under spatial shifts, is not suitable for our case.

We propose two bootstrap approaches for constructing asymptotically valid confidence sets for the mean of a network dependent process and establish the first-order consistency of these methods for smooth functions of means conditionally on the common shock. The first approach is a block-based method in which blocks are constructed from certain neighborhoods of each node in a network. The second is a modification of the dependent wild bootstrap that employs the topology of a given graph to generate random weights instead of using a fixed kernel function. In addition, we provide bootstrap variance estimators of the scaled sample mean which yield positive semi-definite estimates and can be used as an alternative to the network HAC estimator. We find that the consistency of the modified dependent wild bootstrap and the corresponding variance estimator holds under weaker conditions as compared with the block bootstrap. However, the bootstrap distribution corresponding to the former method may fail to match the higher-order cumulants of the underlying data generating process, thus preventing improvements over asymptotic approximations.

The rest of the paper is organized as follows. The next section describes a modification of network dependent processes allowing for weighted networks. This modification can be useful for handling dense graphs once varying intensity of links is assumed. Section 3 provides some general results regarding the conditional bootstrap. Specifically, we use the almost sure convergence of probability kernels to ensure that the bootstrap is valid for (almost) every realization of the common shock, which may also represent the stochastic network formation process. In Section 4 we present the above-mentioned bootstrap methods in detail and establish sufficient conditions for their conditional consistency. All the proofs and other technical details are presented in Appendices A-C.
2. The Setup
We consider a variation of network dependent processes characterized in KMS (2020). Namely, let $G \equiv (N, E)$ be an undirected graph (possibly infinite), where $N$ is the set of nodes and $E$ denotes the set of links (we identify $N$ with the integers $\{1, 2, \dots\}$). Each edge $e \in E$ is associated with a weight $W(e) \in \bar{\mathbb{R}}$. Also let the function $d : N \times N \to \bar{\mathbb{R}}_{\geq 0}$ be a distance on $G$; for example, the shortest path distance for an unweighted graph. An $\mathcal{X}$-valued network dependent process $\mathcal{Y} \equiv (Y, G)$ is a collection of $\mathcal{X}$-valued random elements defined on a common probability space indexed by $N$, i.e., $\{Y_i : i \in N\}$. The network $G$ governs the stochastic dependence between random elements. In this paper we consider $\mathcal{X} = \mathbb{R}^v$ with $v \geq 1$ and study a sequence $\{(Y_n, G_n)\}$ defined on a common probability space $(\Omega, \mathcal{H}, P)$, where each $G_n \equiv (N_n, E_n)$ is a finite graph of size $m_n \to \infty$ as $n \to \infty$; w.l.o.g. we set $m_n = n$. Here, the sequence $\{G_n\}$ can be a sequence of subgraphs of an infinite network $(N_\infty, E_\infty)$. In general, however, these graphs can be unrelated. In order to emphasize the dependence of the distance between two nodes on $n$, we denote it as $d_n(\cdot, \cdot)$. Additionally, since the distance function may implicitly depend on weights associated with the edges of a graph, we impose the following restriction in order to employ the results established in KMS (2020) with the least possible change.

Assumption 2.1. For all $n \geq 1$, $\min_{i,j \in N_n} d_n(i,j) \geq 1$ and $d_n(i,j) = \infty$ whenever $i, j \in N_n$ are disconnected (i.e., there is no path connecting $i$ and $j$).

For example, if $W(e) \in [0,1]$ for all $e \in E$, which can be interpreted as the intensity of links, then the shortest weighted distance associated with $1/W(\cdot)$ satisfies this assumption, where implicitly we set $1/0 \equiv \infty$. In this case an unweighted network $(N, E)$ is equivalent to a complete graph $(N, E')$, where for $e \in E'$, $W(e) = 1\{e \in E\}$. In a similar manner, the (at most countable) parameter space of a random field on a metric space $(\mathcal{Z}, \rho)$ can be modelled as a complete graph of suitable cardinality, where $W(x \leftrightarrow y)$ is a function of the distance $\rho$ between two points $x, y \in \mathcal{Z}$. Then Assumption 2.1 corresponds to the case of increasing domain asymptotics (see, e.g., Conley, 1999; Jenish and Prucha, 2009).

Let $\mathcal{C} \subset \mathcal{H}$ be a given sub-$\sigma$-field. We assume that the sequence of network dependent processes is conditionally weakly dependent given $\mathcal{C}$. Specifically, for $a, b \in \mathbb{N}$ and $s \geq 0$ define

$\mathcal{P}_n(a, b; s) := \{(A, B) : A, B \subset N_n,\ |A| = a,\ |B| = b,\ d_n(A, B) \geq s\}$

with $d_n(A, B) := \min_{i \in A, j \in B} d_n(i, j)$, and let $\mathcal{L}_v$ be the family of real-valued, bounded, Lipschitz functions, i.e.,

$\mathcal{L}_v := \bigcup_{a \geq 1} \mathcal{L}_{v,a}$, where $\mathcal{L}_{v,a} := \{f : \mathbb{R}^{v \times a} \to \mathbb{R} : \|f\|_\infty < \infty,\ \mathrm{Lip}(f) < \infty\}$.

The functions in $\mathcal{L}_{v,a}$ are Lipschitz with respect to the distance $\delta_a$ on $\mathbb{R}^{v \times a}$ given by

$\delta_a(x, y) := \sum_{l=1}^{a} \|x_l - y_l\|$,

where $\|\cdot\|$ is a norm on $\mathbb{R}^v$ and $x \equiv (x_1, \dots, x_a)$ and $y \equiv (y_1, \dots, y_a)$ are points in $\mathbb{R}^{v \times a}$. In addition, for a set of nodes $A \subset N_n$ we write $Y_{n,A} \equiv \{Y_{n,i} : i \in A\}$.
Definition 2.1. A sequence $\{(Y_n, G_n)\}$ is $(\mathcal{L}_v, \psi, \mathcal{C})$-weakly dependent if for each $n \geq 1$ there exist a $\mathcal{C}$-measurable sequence $\gamma_n \equiv \{\gamma_{n,s}\}_{s=1}^{\infty}$ and a collection of nonrandom functions $(\psi_{a,b})_{a,b \in \mathbb{N}}$, $\psi_{a,b} : \mathcal{L}_{v,a} \times \mathcal{L}_{v,b} \to \mathbb{R}_{\geq 0}$, such that for any $(A, B) \in \mathcal{P}_n(a, b; s)$ with $s > 0$, $f \in \mathcal{L}_{v,a}$, and $g \in \mathcal{L}_{v,b}$,

(2.1) $|\mathrm{Cov}(f(Y_{n,A}), g(Y_{n,B}) \mid \mathcal{C})| \leq \psi_{a,b}(f, g)\, \gamma_{n, \lfloor s \rfloor}$ a.s.

Remark. (a) When it is clear from the context, we denote such a sequence as $\{Y_n\}$, omitting the reference to the underlying networks. (b) $(\mathcal{L}_v, \psi) \equiv (\mathcal{L}_v, \psi, \{\emptyset, \Omega\})$. (c) The elements of $\{\gamma_{n,s}\}$ are called the weak-dependence coefficients associated with $\{Y_n\}$. (d) For convenience, we set $\gamma_{n,0} \equiv 1$.

Examples of processes that are $(\mathcal{L}_v, \psi, \mathcal{C})$-weakly dependent are given in KMS (2020). For instance, strong mixing processes correspond to $\psi_{a,b}(f, g) = 4\|f\|_\infty \|g\|_\infty$. Also, associated and Gaussian processes and certain of their derivatives are $(\mathcal{L}_v, \psi, \mathcal{C})$-weakly dependent with $\psi_{a,b}(f, g) = ab\,\mathrm{Lip}(f)\,\mathrm{Lip}(g)$. It is worth mentioning that the corresponding weak dependence coefficients may depend on the topology of the underlying networks.

Conditioning on a $\sigma$-field $\mathcal{C}$ can be useful in various cases. First, if the underlying graphs are realizations of a stochastic network formation process, then one can potentially condition on the $\sigma$-field generated by that process and treat the observed graphs as fixed. Second, fixing nodes with high degree centrality may help to obtain local stochastic dependence.
Consider a set of independent random variables $\{\varepsilon_i : i \in N\}$ and let $C \subset N$ denote a set of nodes with "high" degree centrality (for clarity, we omit the subscript $n$). Then the variables $u_{N \setminus C}$, where $u_i := \varepsilon_i + \sum_{j \in C} \beta_{ij} \varepsilon_j$ and $\beta_{ij} \in \mathbb{R}$, are conditionally independent given $\mathcal{C} = \sigma(\varepsilon_C)$. Moreover, for arbitrary measurable functions $\{\phi_i\}$ the process $\{Y_i := \phi_i(u_N)\}$ satisfies the covariance bound (2.1) with $\psi_{a,b}(f, g) = a\|g\|_\infty \mathrm{Lip}(f) + b\|f\|_\infty \mathrm{Lip}(g)$. In the context of social interaction models $\{u_i\}$ and $\{Y_i\}$ may represent idiosyncratic shocks and observable outcomes, respectively.

In order to facilitate the exposition, throughout the paper we consider a sequence of network dependent processes $\{Y_n\}$ satisfying the covariance bound (2.1) with a specific form of the function $\psi_{a,b}$ and bounded weak dependence coefficients. The restricted $\psi_{a,b}$ function is fairly general and covers many useful examples of weakly dependent processes.

Assumption 2.2. $\{(Y_n, G_n)\}$ is $(\mathcal{L}_v, \psi, \mathcal{C})$-weakly dependent and there exist constants $M \geq 1$ and $C > 0$ such that $\gamma_{n,s} \leq M$ a.s. for all $n, s \geq 1$, and

$\psi_{a,b}(f, g) = c_1 \|f\|_\infty \|g\|_\infty + c_2\, \mathrm{Lip}(f) \|g\|_\infty + c_3 \|f\|_\infty \mathrm{Lip}(g) + c_4\, \mathrm{Lip}(f)\, \mathrm{Lip}(g)$,

where $c_1, \dots, c_4 \leq Cab$.

It should be noted that processes satisfying Assumption 2.2 possess some hereditary properties. Specifically, if $\{Y_n\}$ is $(\mathcal{L}_v, \psi, \mathcal{C})$-weakly dependent with the weak dependence coefficients $\{\gamma_{n,s}\}$, then for any Lipschitz function $h : \mathbb{R}^v \to \mathbb{R}^w$ the sequence $\{h(Y_{n,i}) : i \in N_n\}$ is $(\mathcal{L}_w, \psi, \mathcal{C})$-weakly dependent with the same weak dependence coefficients. Moreover, this type of weak dependence is preserved under some locally Lipschitz functions, as shown in Proposition 2.1 below, which is an extension of Proposition 2.1 in Dedecker, Doukhan, and Lang (2007) to our settings.
Proposition 2.1. Suppose that $\{Y_n\}$ satisfies Assumption 2.2 and there exist $L < \infty$ and $p > 1$ such that $\sup_{n, i \in N_n} E[\|Y_{n,i}\|_\infty^p \mid \mathcal{C}] \leq L$ a.s. Let $h : \mathbb{R}^v \to \mathbb{R}^w$ be such that

(2.2) $\|h(x) - h(y)\| \leq \eta \|x - y\| \left( \|x\|^{\tau-1} + \|y\|^{\tau-1} \right)$

for some $\eta > 0$ and $\tau \in [1, p)$. Then $\{h(Y_{n,i}) : i \in N_n\}$ is $(\mathcal{L}_w, \psi, \mathcal{C})$-weakly dependent with the weak dependence coefficients $\gamma'_{n,s} = KM\gamma_{n,s}^r$, where $K$ is a constant depending on $\eta$, $v$, and $L$, and

$r = (p - \tau)/(p - 1)$ if $c_4 = 0$, and $r = (p - \tau)/(p + \tau - 2)$ otherwise.
Remark. The boundedness of the conditional moments of $\|Y_{n,i}\|_\infty$ is required in order to maintain Assumption 2.2. Once this condition is relaxed, it suffices to assume that these moments are a.s. finite.

Introducing weighted networks is useful in several scenarios. First, as we have already mentioned, it allows incorporating some additional random processes into the current framework. Second, assuming varying intensity of connections enables one to handle denser networks in the sense of the total number of links. Finally, some commonly used statistical models explicitly use weights and can be adapted to our framework, e.g., the spatial Cliff-Ord-type linear model in Kelejian and Prucha (2010).
Example 2.2.
For each $n \geq 1$, let $u_n$ be an $n \times 1$ random vector and $\tilde{W}_n$ be an $n \times n$ matrix which is a function of the weights associated with a given network. Consider a linear model with disturbances following the autoregressive process

$\varepsilon_n = \lambda \tilde{W}_n \varepsilon_n + u_n, \quad |\lambda| < 1$.

Typically the original weighting matrix is modified to ensure that the spectral radius of $\tilde{W}_n$ is bounded by 1. Under certain restrictions on the denseness of the underlying networks, the process $\{\varepsilon_n\}$ is weakly dependent with $\psi_{a,b}(f, g) = a\|g\|_\infty \mathrm{Lip}(f) + b\|f\|_\infty \mathrm{Lip}(g)$, so that the model can be accommodated within the current framework. Assume that $C_n := (I - \lambda \tilde{W}_n)^{-1}$ exists for each $n \geq 1$ and $\mu_1 := \sup_{n, i \in N_n} E|u_{n,i}| < \infty$. Then $\varepsilon_n = C_n u_n$, and the weak dependence coefficients can be bounded by considering the truncated approximations $\varepsilon_{n,i}^{(s)} := \sum_{j \in N_n : d_n(i,j) < s} [C_n]_{ij} u_{n,j}$, which depend only on the disturbances of nodes within distance $s$ of $i$.
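As a quick illustration, the following sketch (Python; the sparsity level, the row normalization, and all variable names are our illustrative choices, not taken from the paper) simulates such network-dependent disturbances:

```python
import numpy as np

# Example 2.2: eps_n = lambda * W_tilde @ eps_n + u_n, so that
# eps_n = (I - lambda * W_tilde)^{-1} @ u_n. Row normalization is one
# common way to keep the spectral radius of W_tilde at most 1.
rng = np.random.default_rng(0)
n, lam = 200, 0.4
A = rng.random((n, n)) < 0.02           # random adjacency, illustrative
A = np.triu(A, 1); A = A + A.T          # undirected, no self-links
W_tilde = A / np.maximum(A.sum(axis=1, keepdims=True), 1)
C_n = np.linalg.inv(np.eye(n) - lam * W_tilde)
u = rng.standard_normal(n)              # i.i.d. innovations u_n
eps = C_n @ u                           # network-dependent disturbances
```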
For $s \geq 0$ we define the open neighborhood of radius $s$ around a node $i \in N_n$, i.e., $N_n(i; s) := \{j \in N_n : d_n(i, j) < s\}$, and let $N_n^{\partial}(i; s) := N_n(i; s+1) \setminus N_n(i; s)$. (Note that this definition of the open neighborhood of a node differs from the one commonly used in graph theory.) In addition, we define the following aggregate measures of the network denseness:

(2.3) $\delta_n(s; k) := n^{-1} \sum_{i \in N_n} |N_n(i; s+1)|^k$, $\quad \delta_n^{\partial}(s; k) := n^{-1} \sum_{i \in N_n} |N_n^{\partial}(i; s)|^k$, $\quad D_n(s) := \max_{i \in N_n} |N_n(i; s+1)|$, and $D_n^{\partial}(s) := \max_{i \in N_n} |N_n^{\partial}(i; s)|$.

It is straightforward to see that under Assumption 2.1, which restricts the minimum distance between any two nodes of a network, the asymptotic results derived in KMS (2020) remain valid once we replace their measures of network denseness with those given in (2.3) and redefine $H_n(s, m)$ as follows:

(2.4) $H_n(s, m) := \big\{ (i, j, k, l) \in N_n^4 : j \in N_n(i; m+1),\ l \in N_n(k; m+1),\ \lfloor d_n(\{i,j\}, \{k,l\}) \rfloor = s \big\}$.

In the case of random networks, however, the measures of network denseness are also random. Therefore, one needs a conditional version of the Law of Large Numbers in order to be able to condition on the common shock $\mathcal{C}$. Note that the other results are stated in the conditional form and can be directly applied to this case if we assume certain measurability conditions. Let $D(G_n)$ denote the distance matrix associated with $G_n$, i.e., $[D(G_n)]_{ij} = d_n(i, j)$. If $D(G_n)$ is $\mathcal{C}$-measurable, then $|N_n(i; s)| = \sum_{j \in N_n} 1\{[D(G_n)]_{ij} < s\}$ is also $\mathcal{C}$-measurable, as are the quantities given in (2.3) and (2.4).
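To fix ideas, the following sketch (Python; all function names are ours, not from the paper) computes the distance matrix of a weighted graph via shortest paths on the reciprocal weights $1/W(e)$, the neighborhood sizes $|N_n(i;s)|$, and the denseness measures in (2.3):

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

def distance_matrix(weights):
    """Shortest weighted path distances with edge lengths 1/W(e).

    `weights` is a symmetric (n, n) array with W(e) in [0, 1] and 0 for
    missing links; 1/0 is treated as an infinite edge length, so
    disconnected nodes end up at distance infinity (Assumption 2.1).
    """
    with np.errstate(divide="ignore"):
        lengths = np.where(weights > 0, 1.0 / weights, np.inf)
    np.fill_diagonal(lengths, 0.0)
    return shortest_path(lengths, method="D")   # Dijkstra

def neighborhood_sizes(D, s):
    """|N_n(i; s)| = #{j : d_n(i, j) < s} for every node i."""
    return (D < s).sum(axis=1)

def denseness(D, s, k=1):
    """The aggregate measures delta_n(s; k), delta^d_n(s; k), D_n(s) of (2.3)."""
    inner = neighborhood_sizes(D, s + 1)
    boundary = inner - neighborhood_sizes(D, s)   # |N^d_n(i; s)|
    return (np.mean(inner ** k),      # delta_n(s; k)
            np.mean(boundary ** k),   # delta^d_n(s; k)
            inner.max())              # D_n(s)
```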
We make the following assumption:

Assumption 2.3. The distance matrix $D(G_n)$ is $\mathcal{C}$-measurable for all $n \geq 1$.
Definition 2.2. Let $\mathcal{F} \subset \mathcal{H}$ and let $f : \mathcal{Y} \times \Omega \to \mathbb{R}_{\geq 0}$ be such that $f(y, \cdot)$ is $\mathcal{F}$-measurable for all $y \in \mathcal{Y}$. A sequence of such functions $\{f_n\}$ is asymptotically negligible (a.n.) if for almost all $\omega \in \Omega$,

$\operatorname{ess\,inf}_{y \in \mathcal{Y}} \limsup_{n \to \infty} f_n(y, \omega) = 0$.

In particular, an array of random vectors $\{X_{n,i}\}$ is
– $\mathcal{F}$-asymptotically tight if $\max_i P(\|X_{n,i}\| > y \mid \mathcal{F})$ is a.n.;
– $\mathcal{F}$-asymptotically uniformly integrable (u.i.) if $\max_i E[\|X_{n,i}\|\, 1\{\|X_{n,i}\| > y\} \mid \mathcal{F}]$ is a.n.

Theorem 2.1 (Conditional Weak Law of Large Numbers). Let $\{(Y_n, G_n)\}$ be $(\mathcal{L}_v, \psi, \mathcal{C})$-weakly dependent satisfying Assumptions 2.1, 2.2, and 2.3. Suppose that $\{Y_n\}$ is $\mathcal{C}$-asymptotically u.i. and

$n^{-1} \sum_{s \geq 0} \delta_n^{\partial}(s; 1)\, \gamma_{n,s} \to 0$ a.s.

Then

$\big\| n^{-1} \sum_{i \in N_n} (Y_{n,i} - E[Y_{n,i} \mid \mathcal{C}]) \big\|_{\mathcal{C},1} \to 0$ a.s.

(The essential infimum of an arbitrary family of random variables $\{Z_\alpha : \alpha \in \mathcal{A}\}$ is a random variable $Z$ such that (a) $Z \leq Z_\alpha$ a.s. for all $\alpha \in \mathcal{A}$ and (b) $Z \geq Z'$ a.s. for any random variable $Z'$ satisfying $Z' \leq Z_\alpha$ a.s. for all $\alpha \in \mathcal{A}$. In particular, there exists a sequence $\{\alpha_n\}$ such that $Z = \inf_n Z_{\alpha_n}$ a.s. (see, e.g., Cohen and Elliott, 2015, Theorem 1.3.40). The essential supremum is defined similarly, and the following identity holds: $\operatorname{ess\,sup}_\alpha \{Z_\alpha\} = -\operatorname{ess\,inf}_\alpha \{-Z_\alpha\}$. For a random vector $X$ and $\mathcal{F} \subset \mathcal{H}$ we write $\|X\|_{\mathcal{F},p} \equiv E[\|X\|^p \mid \mathcal{F}]^{1/p}$.)
Remark. Similarly to the unconditional case, a sufficient condition for the $\mathcal{C}$-asymptotic uniform integrability of $\{Y_n\}$ is the a.s. finiteness of $\sup_{n, i \in N_n} E[\|Y_{n,i}\|^p \mid \mathcal{C}]$ for some $p > 1$.

Let $\bar{Y}_n := n^{-1} \sum_{i \in N_n} Y_{n,i}$ and $\Sigma_n := \mathrm{Var}(\sqrt{n}\, \bar{Y}_n \mid \mathcal{C})$. Then the network HAC estimator of $\Sigma_n$,

(2.5) $\hat{\Sigma}_n = \frac{1}{n} \sum_{i,j \in N_n} \kappa\!\left( \frac{d_n(i,j)}{b_n + 1} \right) (Y_{n,i} - \bar{Y}_n)(Y_{n,j} - \bar{Y}_n)^{\top}$,

where $\kappa : \bar{\mathbb{R}} \to [-1, 1]$ is a kernel function satisfying $\kappa(0) = 1$, $\kappa(z) = \kappa(-z)$, and $\kappa(z) = 0$ for $|z| > 1$, and $b_n$ is the lag truncation parameter, is consistent under the same set of assumptions. Unfortunately, due to the irregularity of a network's structure, this estimator is not guaranteed to be positive semi-definite. However, once the minimal eigenvalue of $\Sigma_n$ is a.s. bounded from below or it converges to an a.s. positive definite matrix, a simple way to fix this issue is available. The details are given in Appendix B.
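For concreteness, a direct implementation of (2.5) might look as follows (Python; the Parzen kernel is our illustrative choice, not prescribed by the paper):

```python
import numpy as np

def parzen(z):
    """Parzen kernel: kappa(0) = 1, symmetric, zero for |z| > 1."""
    z = np.minimum(np.abs(z), 1.0)   # also maps infinite distances to 1
    return np.where(z <= 0.5, 1 - 6 * z**2 + 6 * z**3, 2 * (1 - z) ** 3)

def network_hac(Y, D, b_n, kernel=parzen):
    """Network HAC estimator (2.5).

    Y : (n, v) array of observations, D : (n, n) distance matrix d_n(i, j),
    b_n : lag truncation parameter. The result need not be positive
    semi-definite for irregular networks.
    """
    n = Y.shape[0]
    Yc = Y - Y.mean(axis=0)              # Y_{n,i} - Ybar_n
    K = kernel(D / (b_n + 1.0))          # kappa(d_n(i,j) / (b_n + 1))
    return (Yc.T @ (K @ Yc)) / n
```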
3. Conditional Bootstrap
In this section we present some general results regarding the conditional bootstrap. The latter is useful for inference which is asymptotically valid for almost all $\omega \in \Omega$ (or almost all realizations of the common shock). These results do not depend on the underlying data generating process; however, we use the present framework for convenience.

Suppose that $\{(Y_n, G_n)\}$ is a sequence of network dependent processes. For a given $n \geq 1$, let $\theta_n$ be a $\mathcal{C}$-measurable parameter taking values in $\Theta \subseteq \mathbb{R}^w$ with $w \geq 1$, and let

$T_n(\theta_n) := T_n(Y_n, \theta_n; \vartheta_n)$,

where $T_n$ is a measurable, real-valued function and $\vartheta_n$ is a $\mathcal{C}$-measurable nuisance parameter, denote a statistic used to conduct inference on $\theta_n$ based on a realization of $(Y_n, G_n)$ conditionally on $\mathcal{C}$.

Let $F_n$ denote the conditional cdf of $T_n$ given $\mathcal{C}$. The goal of this section is to provide sufficient conditions for the conditional first-order consistency of resampling estimators of $F_n$. Specifically, let $\mathcal{G}_n := \mathcal{C} \vee \sigma(Y_n)$ and let $Y_n^*$ be a pseudo-sample drawn using a realization of $Y_n$. Then the bootstrap counterpart of $T_n$ is $T_n^* := T_m(Y_n^*, \theta_n^*)$, where $m$ is the size of $Y_n^*$ and $\theta_n^* \equiv \theta_n^*(Y_n)$ is an estimator of $\theta_n$. The conditional cdf $F_n^*$ of $T_n^*$ is used as an approximation of $F_n$. If the latter explicitly depends on the nuisance parameter $\vartheta_n$, then one needs to provide its consistent estimator based on both $Y_n$ and $Y_n^*$. (A (regular) conditional cdf $F_X^{\mathcal{F}}$ of $X \in \mathbb{R}$ given $\mathcal{F} \subset \mathcal{H}$ satisfies: (i) $\forall x \in \mathbb{R}$, $F_X^{\mathcal{F}}(\cdot, x)$ is a version of $P(X \leq x \mid \mathcal{F})$, and (ii) $\forall \omega \in \Omega$, $F_X^{\mathcal{F}}(\omega, \cdot)$ is a distribution function. We omit the subscript $X$ or the superscript $\mathcal{F}$ whenever clear from the context.)

A typical way of showing the consistency of the bootstrap estimators is bounding the Kolmogorov distance between the cdfs of $T_n$ and $T_n^*$ (see, e.g., Shao and Tu, 1995, Chapter 3). For random variables $X$ and $Y$ and sub-$\sigma$-fields $\mathcal{F} \subset \mathcal{G} \subset \mathcal{H}$, the conditional version of the latter is defined by

$d_K(X, Y \mid \mathcal{G}, \mathcal{F}) := \sup_{x \in \mathbb{R}} \big| F_X^{\mathcal{G}}(\cdot, x) - F_Y^{\mathcal{F}}(\cdot, x) \big|$,

where $F_X^{\mathcal{G}}$ and $F_Y^{\mathcal{F}}$ are the conditional cdfs of $X$ and $Y$, respectively (when $\mathcal{F} = \mathcal{G}$ we denote this measure by $d_K(X, Y \mid \mathcal{F})$). In addition, we define the conditional convergence in probability and the almost sure convergence of conditional distributions.

Definition 3.1. Let $\mathcal{F} \subset \mathcal{H}$ be a sub-$\sigma$-field and let $Z$ be an $\mathcal{F}$-measurable random vector in $\mathbb{R}^v$ with $v \geq 1$. A sequence of $\mathbb{R}^v$-valued random vectors $Z_n \xrightarrow{\mathcal{F}\text{-}p} Z$ a.s. if for any $\epsilon > 0$,

$P(\|Z_n - Z\| > \epsilon \mid \mathcal{F}) \to 0$ a.s.
Definition 3.2. Suppose that $\{X_n\}$ is a sequence of random vectors on $(\Omega, \mathcal{H}, P)$ and $\mathcal{F} \subset \mathcal{H}$. Let $Q_n$ be the conditional distribution of $X_n$ given $\mathcal{F}$. We say that $X_n$ converges $\mathcal{F}$-weakly to $X$ having the conditional distribution $Q$ if for almost all $\omega \in \Omega$ the sequence $\{Q_n(\omega, \cdot)\}$ converges weakly to $Q(\omega, \cdot)$.

Remark. (a) Equivalently, the $\mathcal{F}$-weak convergence can be defined using the notion of probability kernels, so the limiting random vector $X$ is an artificial construct which is used to describe the limiting kernel. (b) A more general notion of the almost sure convergence of conditional probability measures and some of its properties are presented in Berti, Pratelli, and Rigo (2006). (c) The notion of $\mathcal{F}$-weak convergence is stronger than $\mathcal{F}$-stable convergence and the usual weak convergence. In particular, if $X_n \to X$ $\mathcal{F}$-weakly, then for any real-valued, bounded, continuous function $f$, $E[f(X_n) \mid \mathcal{F}] \to E[f(X) \mid \mathcal{F}]$ a.s., which implies that $X_n$ converges to $X$ $\mathcal{F}$-stably and in distribution.

Assume for a moment that $\mathcal{C} = \{\emptyset, \Omega\}$. Then if there exists a sequence of random variables $\{S_n\}$ such that $d_K(T_n, S_n \mid \mathcal{C})$ converges to 0 as $n \to \infty$ and $d_K(T_n^*, S_n \mid \mathcal{G}_n, \mathcal{C})$ converges to 0 a.s. (in probability), then the bootstrap estimator is first-order strongly (weakly) consistent. Moreover, if $S_n$ converges weakly to a continuous limit, then the conditional quantiles of $F_n^*$ are a good approximation to those of $F_n$. This typically happens when the statistic $T_n$ is pivotal. However, in the case of a non-pivotal statistic, which is useful when a consistent estimator of $\vartheta_n$ is hard to obtain or the available estimators have poor finite sample properties, the cdfs of $\{T_n\}$ need not converge. In this case, the convergence of the Kolmogorov distance between $T_n^*$ and $T_n$ to zero does not necessarily imply that $F_n(c_n^*(\alpha)) \to \alpha$ as $n \to \infty$, where $c_n^*(\alpha)$ is the conditional $\alpha$-quantile of $F_n^*$. Nevertheless, as shown in the next result, a sufficient condition for the latter to happen is the continuity of the cdfs of $\{S_n\}$.

(Note that $d_K(\cdot, \cdot \mid \mathcal{G}, \mathcal{F})$ is $\mathcal{G}$-measurable because $\{Z_x\}$, where $Z_x := |F_X^{\mathcal{G}}(\cdot, x) - F_Y^{\mathcal{F}}(\cdot, x)|$, is a càdlàg stochastic process. A (regular) conditional distribution $Q_X^{\mathcal{F}}$ of $X \in \mathbb{R}^v$ given $\mathcal{F} \subset \mathcal{H}$ satisfies: (i) $\forall B \in \mathcal{B}(\mathbb{R}^v)$, $Q_X^{\mathcal{F}}(\cdot, B)$ is a version of $P(X \in B \mid \mathcal{F})$ and (ii) $\forall \omega \in \Omega$, $Q_X^{\mathcal{F}}(\omega, \cdot)$ is a probability measure on $(\mathbb{R}^v, \mathcal{B}(\mathbb{R}^v))$. Also note that the convergence in Definition 3.1 implies convergence in probability due to the dominated convergence theorem; in addition, for an a.s. positive $\mathcal{F}$-measurable random variable $\nu$, $P(\|Z_n - Z\| > \nu \mid \mathcal{F}) \to 0$ a.s.)
Theorem 3.1. Suppose that for all $n \geq 1$, $S_n$ is conditionally independent of $Y_n$ given $\mathcal{C}$ and the conditional cdf of $S_n$ given $\mathcal{C}$ is (a.s.) continuous. Then if (a) $d_K(T_n, S_n \mid \mathcal{C}) \to 0$ a.s. and (b) $d_K(T_n^*, S_n \mid \mathcal{G}_n, \mathcal{C}) \xrightarrow{\mathcal{C}\text{-}p} 0$ a.s., then

$d_K(T_n^*, T_n \mid \mathcal{G}_n, \mathcal{C}) \xrightarrow{\mathcal{C}\text{-}p} 0$ a.s. and $\operatorname{ess\,sup}_{\alpha \in (0,1)} |P(T_n \leq c_n^*(\alpha) \mid \mathcal{C}) - \alpha| \to 0$ a.s.

Remark. (a) Usually when $\mathcal{C} = \{\emptyset, \Omega\}$ and the statistic $T_n$ is pivotal, we have $S_n = S_\infty$, which is the weak limit of $T_n$. (b) A variant of this result can be found in Chernozhukov et al. (2013) in the context of the Gaussian multiplier bootstrap. (c) Theorem 3.1 also implies that the conditional quantiles $\{c_n^*(\alpha) : \alpha \in (0,1)\}$ approximate the unconditional quantiles of $T_n$ because, by the dominated convergence theorem,

$\sup_{\alpha \in (0,1)} |P(T_n \leq c_n^*(\alpha)) - \alpha| \leq E \operatorname{ess\,sup}_{\alpha \in (0,1)} |P(T_n \leq c_n^*(\alpha) \mid \mathcal{C}) - \alpha| \to 0$.
Definition 3.3. We say that $F_n^*$ is conditionally $d_K$-consistent given $\mathcal{C}$ if the conclusion of Theorem 3.1 holds.

Typically it is not hard to show that condition (a) of Theorem 3.1 holds (for example, when the elements of $Y_n^*$ are conditionally i.i.d. given $\mathcal{G}_n$). On the contrary, establishing (b) may be a difficult task, especially when $T_n$ is a nonlinear transformation of $Y_n$ in the presence of stochastic dependence between its elements, as in the current framework. However, in the case when the statistic $T_n$ converges $\mathcal{C}$-weakly to $S$ and the limiting kernel (i.e., the regular conditional cdf of $S$ given $\mathcal{C}$) is continuous, Lemma C.4 implies that this convergence is equivalent to one with respect to the conditional Kolmogorov distance. In addition, by Lemma C.3 the almost sure convergence of conditional distributions enjoys a number of useful properties associated with the usual weak convergence, such as the continuous mapping theorem, the converging together lemma, and the Cramér–Wold device. In this situation we have the following simple corollary. (Consider, for example, the case of the linearized statistic $T_n'$ given in (3.1): it does not have a nondegenerate weak limit when the sequence of parameters $\{\theta_n\}$ is not convergent.)
Corollary 3.1. Suppose that $S$ is conditionally independent of $\{Y_n\}$ given $\mathcal{C}$ and the conditional cdf of $S$ given $\mathcal{C}$ is (a.s.) continuous. Then if (a) $T_n \to S$ $\mathcal{C}$-weakly and (b) $d_K(T_n^*, S \mid \mathcal{G}_n, \mathcal{C}) \xrightarrow{\mathcal{C}\text{-}p} 0$ a.s., $F_n^*$ is conditionally $d_K$-consistent given $\mathcal{C}$.

Next, we consider the case in which the statistic $T_n$ takes the following form:

$T_n(\theta_n) = \tau_n \big( \phi(\hat{\theta}_n) - \phi(\theta_n) \big)$,

where $\phi : \Theta \to \mathbb{R}$ is a continuously differentiable function, $\hat{\theta}_n$ is a consistent estimator of $\theta_n$ (in the sense of Definition 3.1), and $\tau_n$ is a normalizing coefficient. In particular, the smooth function model (see, e.g., Lahiri, 2003, Section 4.2 and Hall, 1992, Section 2.4) falls into this case. The resampling version of the statistic $T_n$ is

$T_n^* = \tau_n^* \big( \phi(\hat{\theta}_n^*) - \phi(\theta_n^*) \big)$,

where $\theta_n^*$ is a consistent estimator of $\theta_n$, which may differ from $\hat{\theta}_n$, and $\tau_n^*$ is the bootstrap counterpart of $\tau_n$. Let $\xi_n := \tau_n(\hat{\theta}_n - \theta_n)$ and $\xi_n^* := \tau_n^*(\hat{\theta}_n^* - \theta_n^*)$. Consider the linearized statistics

(3.1) $T_n' := \nabla\phi(\theta_n)^{\top} \xi_n$ and $T_n'^* := \nabla\phi(\theta_n^*)^{\top} \xi_n^*$.

The following result shows that it suffices to find a "smooth" approximation $S_n'$ of the linearized statistics in order to apply Theorem 3.1 to this setup. In particular, the result largely depends on the asymptotic behavior of the conditional Lévy concentration function of $S_n'$. For a random variable $X$, $\epsilon > 0$, and a sub-$\sigma$-field $\mathcal{F} \subset \mathcal{H}$, the latter is given by

$Q(\epsilon, X \mid \mathcal{F}) := \sup_{x \in \mathbb{R}} \big( F_X^{\mathcal{F}}(\cdot, x + \epsilon) - F_X^{\mathcal{F}}(\cdot, x-) \big)$.
Lemma 3.1. Suppose that $\hat{\theta}_n^* - \theta_n^* \xrightarrow{\mathcal{C}\text{-}p} 0$ a.s., $\xi_n^*$ and $\xi_n$ are $\mathcal{C}$-asymptotically tight, and $\sup_n \|\theta_n\| < \infty$ a.s. Furthermore, assume that (a) $d_K(T_n', S_n' \mid \mathcal{C}) \to 0$ a.s., (b) $d_K(T_n'^*, S_n' \mid \mathcal{G}_n, \mathcal{C}) \xrightarrow{\mathcal{C}\text{-}p} 0$ a.s., and (c) $Q(\epsilon, S_n' \mid \mathcal{C})$ is a.n. Then w.p.1,

$d_K(T_n^*, S_n' \mid \mathcal{G}_n, \mathcal{C}) \xrightarrow{\mathcal{C}\text{-}p} 0$ and $d_K(T_n, S_n' \mid \mathcal{C}) \to 0$.

Consequently, the continuity of the conditional cdfs of $\{S_n'\}$ ensures the bootstrap consistency in the sense of Definition 3.3.
Theorem 3.2. Suppose that the conditions of Lemma 3.1 hold and, in addition, $\{S_n'\}$ satisfy the independence and continuity conditions of Theorem 3.1. Then $F_n^*$ is conditionally $d_K$-consistent given $\mathcal{C}$.

Similarly to the general case, when $\xi_n$ converges $\mathcal{C}$-weakly to some random vector $\xi$ and the sequence of parameters $\{\theta_n\}$ converges a.s. to a $\mathcal{C}$-measurable random variable $\theta$, Lemma C.3 implies that $T_n'$ converges $\mathcal{C}$-weakly to $\nabla\phi(\theta)^{\top}\xi$. In addition, if the conditional cdf of the latter is (a.s.) continuous, it satisfies assumption (c) of Lemma 3.1.
Corollary 3.2. Suppose that $\hat{\theta}_n^* - \theta_n^* \xrightarrow{\mathcal{C}\text{-}p} 0$ a.s., $\xi_n^*$ is $\mathcal{C}$-asymptotically tight, and $S' := \nabla\phi(\theta)^{\top}\xi$ satisfies the independence and continuity conditions of Corollary 3.1. Then if (a) $\xi_n \to \xi$ $\mathcal{C}$-weakly, (b) $\theta_n \to \theta$ a.s., and (c) $d_K(T_n'^*, S' \mid \mathcal{G}_n, \mathcal{C}) \xrightarrow{\mathcal{C}\text{-}p} 0$ a.s., $F_n^*$ is conditionally $d_K$-consistent given $\mathcal{C}$.
Remark. The assumption regarding convergence of the sequence of parameters $\{\theta_n\}$ can be relaxed. In the unconditional case it suffices to assume that $\sup_n \|\theta_n\| < \infty$. Then one needs to provide a uniform bound on $|P(\xi_n \in A) - P(\xi \in A)|$, where $A$ ranges over the class of half-spaces, for a network dependent process, similar to the bound established in Bentkus (2003). The conditioning on $\mathcal{C}$ complicates the problem even more, so it falls outside the scope of this paper.
4. Bootstrap of the Mean
Consider a sequence of network dependent processes $\{(Y_n, G_n)\}$ satisfying Assumptions 2.1, 2.2, and 2.3. As an application of the results given in the preceding section, we consider the mean of $Y_n$, $\mu_n \equiv E[Y_{n,i} \mid \mathcal{C}]$, which may vary with $n$ but not across $i \in N_n$. The parameter of interest $\mu_n$ is estimated using the sample mean $\bar{Y}_n$, which is a consistent estimator of $\mu_n$ under the assumptions of Theorem 2.1. In this section we provide a number of resampling-based methods for constructing asymptotically valid confidence sets for $\mu_n$. In addition, we establish consistency under a restricted version of the smooth function model in which we are interested in $\phi(\mu_n)$ for a continuously differentiable function $\phi : \mathbb{R}^v \to \mathbb{R}$. When the elements of $Y_n$ have the same marginal conditional distributions given $\mathcal{C}$, we may consider $\phi(E[f(Y_{n,1}) \mid \mathcal{C}])$, where $f : \mathbb{R}^v \to \mathbb{R}^w$ is a locally Lipschitz function satisfying (2.2) and the domain of $\phi$ is $\mathbb{R}^w$ in this case. Since the process $\{f(Y_{n,i}) : i \in N_n\}$ is $(\mathcal{L}_w, \psi, \mathcal{C})$-weakly dependent by Proposition 2.1, without loss of generality we examine the first version. In addition, we provide consistent positive semi-definite estimators of $\Sigma_n$. (It is possible to extend the results of this paper to the case of heterogeneous means as in Gonçalves and White (2002); however, such a setup makes it difficult to isolate the effect of the structure of the underlying networks on the consistency of the proposed bootstrap methods.)

The corresponding test statistics are given by

(4.1) $T_{1,n}(\mu_n) = \sqrt{n}\, \|\bar{Y}_n - \mu_n\|$ and $T_{2,n}(\mu_n) = \sqrt{n}\, \big( \phi(\bar{Y}_n) - \phi(\mu_n) \big)$,

where $\|\cdot\|$ is the Euclidean norm on $\mathbb{R}^v$. Their conditional distributions given $\mathcal{C}$ are denoted by $F_{1,n}$ and $F_{2,n}$, respectively, and the bootstrap approximations of these distributions are denoted by $F_{1,n}^*$ and $F_{2,n}^*$. The confidence sets for $\mu_n$ are obtained by test inversion, i.e.,

$CS_{n, 1-\alpha} := \{\mu \in \mathbb{R}^v : T_{1,n}(\mu) \leq c_{n, 1-\alpha}^*\}$,

where $c_{n,\alpha}^*(\omega) := \inf\{x : F_{1,n}^*(\omega, x) \geq \alpha\}$ is the conditional $\alpha$-quantile of $F_{1,n}^*$. In practice, if the exact distribution of $T_{j,n}^*$, $j = 1, 2$, is not available, it can be approximated by Monte Carlo simulation.

First, we suggest a variant of the block bootstrap, which is extensively studied in the time-series and spatial literature. Specifically, we choose the maximal block radius $s_n > 0$ and construct $n$ overlapping blocks $\{B_{n,1}, \dots, B_{n,n}\}$ with $B_{n,k} := N_n(k; s_n + 1)$. That is, $B_{n,k}$ is an $(s_n + 1)$-neighborhood of the node $k$. Then we randomly select $K_n := \lfloor n / \delta_n(s_n) \rfloor$ blocks $\{B_{n,1}^*, \dots, B_{n,K_n}^*\}$ with replacement (note that $\delta_n(s_n) \equiv \delta_n(s_n; 1)$ is the average block size), which yields a bootstrap sample

$Y_n^* = \big\{ Y_{n, B_{n,k}^*} : 1 \leq k \leq K_n \big\}$.

Formally, let $\{u_1, \dots, u_{K_n}\}$ be i.i.d. $U\{1, n\}$ random variables defined on $(\Omega, \mathcal{H}, P)$ and independent of $\mathcal{G}_n$. Then the $k$-th resampled block is defined as $B_{n,k}^* = B_{n, u_k}$ and, therefore, for $1 \leq k \leq K_n$ and $1 \leq l \leq n$, $P(B_{n,k}^* = B_{n,l} \mid \mathcal{G}_n) = n^{-1}$ a.s. For ease of exposition we assume that $n / \delta_n(s_n)$ is an integer.

The size of the bootstrap sample $L_n := \sum_{k=1}^{K_n} |B_{n,k}^*|$ is random conditional on the data and depends on the distribution of $|N_n(\cdot\,; s_n)|$ given the network $G_n$. However, on average it is expected to be close to $n$ (in fact, the conditional expectation of $L_n$ given $\mathcal{C}$ is exactly $n$). A possible implementation of this resampling scheme is sketched below.
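The following minimal sketch (Python; helper names are ours) implements the resampling scheme, using the block sums $Z_{n,i}$, the quasi-average $\tilde{Y}_n^*$, and the recentering parameter $\mu_n^* = E[\tilde{Y}_n^* \mid \mathcal{G}_n]$ defined in the next paragraph:

```python
import numpy as np

def block_bootstrap_mean(Y, D, s_n, n_boot=999, rng=None):
    """Neighborhood block bootstrap: draws of sqrt(n) * (Ytilde* - mu*).

    Y : (n, v) observations, D : (n, n) distance matrix, s_n : block radius.
    """
    rng = np.random.default_rng(rng)
    n = Y.shape[0]
    members = D < s_n + 1                    # row i indicates B_{n,i}
    Z = members.astype(float) @ Y            # block sums Z_{n,i}
    delta = members.sum(axis=1).mean()       # average block size delta_n(s_n)
    K_n = int(np.floor(n / delta))           # number of resampled blocks
    mu_star = K_n * Z.mean(axis=0) / n       # mu* = E[Ytilde* | data]
    draws = np.empty((n_boot, Y.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=K_n)   # blocks drawn with replacement
        draws[b] = np.sqrt(n) * (Z[idx].sum(axis=0) / n - mu_star)
    return draws
```

Draws of $T_{1,n}^*$ are then obtained as `np.linalg.norm(draws, axis=1)`, and their empirical $(1-\alpha)$-quantile approximates $c_{n,1-\alpha}^*$.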
Also, in the time-series case this approach reduces to a variant of the moving block bootstrap with unequally sized blocks, such that blocks located near the endpoints have smaller size.

Let $Z_{n,k}^* := \sum_{j \in B_{n,k}^*} Y_{n,j}$ and let $\tilde{Y}_n^* := n^{-1} \sum_{k=1}^{K_n} Z_{n,k}^*$ be the quasi-average of the bootstrap sample $Y_n^*$, which replaces the sample average in the bootstrap versions of $T_{1,n}$ and $T_{2,n}$. We could also consider the true average of a pseudo-sample, i.e., $\bar{Y}_n^* := L_n^{-1} \sum_{k=1}^{K_n} Z_{n,k}^*$. However, $L_n$ is not independent of the block sums and, as mentioned before, its distribution depends on the underlying network topology. As a result, it is relatively difficult to find a "smooth" approximation of the distribution of $\sqrt{L_n}\, \bar{Y}_n^*$ which guarantees the first-order consistency of the bootstrap (in particular, the suggested resampling scheme may not be appropriate in this case). In addition, since the conditional expectation of $\tilde{Y}_n^*$ given $\mathcal{G}_n$ differs from the sample average, we replace the true parameter $\mu_n$ with $\mu_n^* := E[\tilde{Y}_n^* \mid \mathcal{G}_n]$. As indicated in Lahiri (1992) in the time-series context, replacing $\mu_n$ with $\bar{Y}_n$ introduces an additional bias which does not allow for second-order improvements over the normal approximation (see also Lahiri, 2003, Section 2.7.1). The BB counterparts of the test statistics in (4.1) are given by

$T_{1,n}^* = \sqrt{n}\, \|\tilde{Y}_n^* - \mu_n^*\|$ and $T_{2,n}^* = \sqrt{n}\, \big( \phi(\tilde{Y}_n^*) - \phi(\mu_n^*) \big)$.

The conditional variance of the scaled sample mean $\Sigma_n$ can be estimated using the bootstrap version $\Sigma_n^* \equiv \mathrm{Var}(\sqrt{n}\, \tilde{Y}_n^* \mid \mathcal{G}_n)$. Since $\{Z_{n,1}^*, \dots, Z_{n,K_n}^*\}$ are conditionally independent given $\mathcal{G}_n$,

$\Sigma_n^* = \frac{1}{\delta_n(s_n)} \left( \frac{1}{n} \sum_{i \in N_n} Z_{n,i} Z_{n,i}^{\top} - \bar{Z}_n \bar{Z}_n^{\top} \right)$ a.s.,

where $Z_{n,i} := \sum_{j \in B_{n,i}} Y_{n,j}$ and $\bar{Z}_n := n^{-1} \sum_{i \in N_n} Z_{n,i}$. By construction the matrix $\Sigma_n^*$ is positive semi-definite and its form is similar to the network HAC estimator (2.5). To see this, let

(4.2) $\omega_n(i, j) := \frac{|N_n(i; s_n + 1) \cap N_n(j; s_n + 1)|}{\delta_n(s_n)}$

(when $i = j$ we denote this quantity by $\omega_n(i)$). Then

$\Sigma_n^* = \frac{1}{n} \sum_{i,j \in N_n} \omega_n(i, j) (Y_{n,i} - \mu_n)(Y_{n,j} - \mu_n)^{\top} + R_n$ a.s.,

where $E[\|R_n\|_F \mid \mathcal{C}] \to 0$; in particular, when $\mu_n = 0$ a.s., the remainder term $R_n = 0$ a.s. Unlike a typical kernel, the weighting function $\omega_n(\cdot, \cdot)$ depends on the network topology and is not bounded by 1. However, for fixed $i \in N_n$ it is decreasing in the distance between $i$ and $j$. Let $\tilde{\omega} := \sup_n \max_{i \neq j} \omega_n(i, j)$, $\tilde{\mu}_p := \sup_{n, i \in N_n} \|Y_{n,i}\|_{\mathcal{C},p}$ for $p > 0$, and

(4.3) $\Delta_n(s; k) := \frac{1}{n} \sum_{i \in N_n} \big| |N_n(i; s+1)| - \delta_n(s) \big|^k$,

which is the $k$-th absolute central moment of the sizes of the $(s+1)$-neighborhoods. A sketch of the computation of $\Sigma_n^*$ via the weights (4.2) is given below.
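A minimal sketch of the block-bootstrap variance estimator in its weight form (Python; helper names are ours, and the observations are centered at the sample mean rather than $\mu_n$, corresponding to the feasible version of the estimator):

```python
import numpy as np

def bb_variance(Y, D, s_n):
    """Block-bootstrap variance estimator Sigma*_n in the weight form.

    Uses omega_n(i, j) = |N_n(i; s_n+1) /\ N_n(j; s_n+1)| / delta_n(s_n),
    so the result is positive semi-definite by construction.
    """
    n = Y.shape[0]
    members = (D < s_n + 1).astype(float)   # row i indicates N_n(i; s_n+1)
    delta = members.sum(axis=1).mean()      # delta_n(s_n)
    omega = (members @ members.T) / delta   # omega_n(i, j)
    Yc = Y - Y.mean(axis=0)
    return (Yc.T @ (omega @ Yc)) / n
```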
The following assumptions provide sufficient conditions for the consistency of $\Sigma_n^*$.

Assumption 4.1. The sequence $\{(G_n, s_n)\}$ is such that w.p.1 $\tilde{\omega} < \infty$ and
(a) $\Delta_n(s_n; 2)/\delta_n^2(s_n) + D_n(s_n)/\sqrt{\delta_n(s_n)\, n} \to 0$;
(b) $\max_{i \in N_n} \big| \sum_{j \in B_{n,i}} (\omega_n(j) - 1) \big| / \sqrt{n} \to 0$;
(c) $n^{-1} \sum_{i \in N_n} \sum_{j \in N_n^{\partial}(i; s)} |\omega_n(i, j) - 1|\, \gamma_{n,s} \to 0$ for all $s \geq 0$.

The consistency of $\Sigma_n^*$ requires a certain degree of homogeneity of the resampled blocks, which is characterized by various moments of the weights $\{\omega_n(i,j) : i, j \in N_n\}$. For example, condition (a) requires that the sample variance of $\{|B_{n,i}|\}$ increases at a lower rate than the average block size. It also guarantees that $\mu_n^*$ is a consistent estimator of the mean $\mu_n$ and that for large samples the size of a pseudo-sample $L_n$ is close to $n$. In fact, $E[|L_n/n - 1| \mid \mathcal{C}] \to 0$ since

$E[|L_n/n - 1| \mid \mathcal{C}] = \frac{1}{n} E\Big[ \Big| \sum_{k=1}^{K_n} \big( |B_{n,k}^*| - \delta_n(s_n) \big) \Big| \,\Big|\, \mathcal{C} \Big] \leq \frac{1}{n} \sum_{k=1}^{K_n} \Delta_n(s_n; 1) \leq \frac{\sqrt{\Delta_n(s_n; 2)}}{\delta_n(s_n)}$ a.s.

This condition is clearly satisfied in the time-series context when $s_n = o(\sqrt{n})$ (although it has been shown that the consistency of the moving block bootstrap in this case holds for $s_n = o(n)$; see, e.g., Calhoun, 2018). However, it does not hold for unweighted "star" networks and $s_n \equiv 1$, since $\Delta_n(1; 2) \geq [\Delta_n(1; 1)]^2$ and $\Delta_n(1; 2)$ diverges while $\delta_n(1)$ remains bounded as $n \to \infty$. In practice, one can compute $\Delta_n(s_n; 2)$ for a given graph to see whether this quantity is small relative to the average block size, as in the sketch below.

Condition (c) ensures that all the non-zero autocovariances are estimated consistently. It is similar to an assumption on kernel functions used in HAC estimation, namely that in the limit the value of the kernel at each $s$ must converge to 1. In addition, if $\tilde{\gamma}_s > 0$ for all $s \geq 1$, then the parameter $s_n$ must go to infinity for this condition to hold.
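For instance, assuming a distance matrix `D` and radius `s_n` as in the earlier sketches, the diagnostic for condition (a) is a few lines:

```python
# Diagnostic for Assumption 4.1(a): the centered second moment of block
# sizes, Delta_n(s_n; 2), should be small relative to delta_n(s_n)^2.
sizes = (D < s_n + 1).sum(axis=1)            # |B_{n,i}| = |N_n(i; s_n + 1)|
delta = sizes.mean()
Delta2 = np.mean((sizes - delta) ** 2)
print(f"Delta_n(s_n;2)/delta_n(s_n)^2 = {Delta2 / delta**2:.3f}")
```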
Assumption 4.2. There exists $r > 2$ such that $\tilde{\mu}_r < \infty$ and
(a) $\limsup_{n \to \infty} \sum_{s \geq 0} \delta_n^{\partial}(s)\, \gamma_{n,s}^{1 - 2/r} < \infty$;
(b) $n^{-1} \sum_{s \geq 0} |H_n(s, s_n + 1)|\, \gamma_{n,s}^{1 - 2/r} \to 0$.

These conditions restrict the denseness of the networks in relation to the bandwidth parameter $s_n$ (see KMS, 2020, Section 4.1). Also, condition (a) implies that the elements of the true variance $\Sigma_n$ do not diverge to $\pm\infty$. To see this, note that for $1 \leq k, l \leq v$ and some constant $C > 0$,

$|[\Sigma_n]_{kl}| \leq C (\tilde{\mu}_r \vee 1)^2 \sum_{s \geq 0} \delta_n^{\partial}(s)\, \gamma_{n,s}^{1 - 2/r}$ a.s.

Therefore, $\limsup_{n \to \infty} |[\Sigma_n]_{kl}| < \infty$ a.s.
Proposition 4.1. Suppose that Assumptions 4.1 and 4.2 hold. Then $E[\|\Sigma_n^* - \Sigma_n\|_F \mid \mathcal{C}] \to 0$ a.s.

The result of Proposition 4.1 implies that $\Sigma_n^*$ is a consistent estimator of $\Sigma_n$. Therefore, assuming that $\Sigma_n \to \Sigma$ a.s. and $\sqrt{n}(\bar{Y}_n - \mu_n)$ converges $\mathcal{C}$-weakly to a conditionally normal random vector with variance $\Sigma$, we may use Corollaries 3.1 and 3.2 to establish the consistency of the bootstrap distributions. For example, one may employ Theorem 3.2 in KMS (2020) together with the Cramér–Wold device and Lemma C.4.

Assumption 4.3. $\Sigma_n$ converges a.s. to a $\mathcal{C}$-measurable, positive definite matrix $\Sigma$, and $\sqrt{n}(\bar{Y}_n - \mu_n) \to \Sigma^{1/2}\eta$ $\mathcal{C}$-weakly, where $\eta \sim N(0, I_v)$ is independent of $\mathcal{C}$.

In addition, we introduce local versions of some measures of the network denseness. Specifically, for $s, m \geq 0$ let

$\delta_{loc,n}^{\partial}(s, m) := \max_{i \in N_n} \frac{\sum_{j \in N_n(i;m)} |N_n^{\partial}(j; s) \cap N_n(i; m)|}{|N_n(i; m)|}$ and $h_{loc,n}(s, m) := \max_{i \in N_n} \frac{|H_n(s, \infty) \cap N_n(i; m)^4|}{|N_n(i; m)|^3}$.

These measures are constructed in a way such that for any $m \geq 1$, $\delta_{loc,n}^{\partial}(0, m) = h_{loc,n}(0, m) = 1$. Also note that $h_{loc,n}(s, m) \leq \delta_{loc,n}^{\partial}(s, m)$.
Assumption 4.4. There exists $p > 4$ such that $\tilde{\mu}_p < \infty$ and

$\left( \frac{\delta_n(s_n)}{n} \right)^{1/2} \sum_{s \geq 0} \delta_{loc,n}^{\partial}(s, s_n)\, \gamma_{n,s}^{1 - 2/p} + \left( \frac{\delta_n^3(s_n)}{n} \right)^{1/2} \sum_{s \geq 0} h_{loc,n}(s, s_n)\, \gamma_{n,s}^{1 - 2/p} \to 0$.

(Assumption 4.3 is made merely for ease of exposition. In view of Theorem 3.1 it can be omitted at the expense of establishing additional Berry–Esseen type bounds.) When the following summability condition holds:

$\limsup_{n \to \infty} \sum_{s \geq 0} \delta_{loc,n}^{\partial}(s, s_n)\, \gamma_{n,s}^{1 - 2/p} < \infty$ a.s.,

Assumption 4.4 reduces to $\delta_n^3(s_n)/n \to 0$. In particular, if for all $n \geq 1$ the blocks $\{B_{n,k}\}$ have the same size $l_n < n$, it suffices to assume that the weak dependence coefficients raised to the power $1 - 2/p$ are a.s. summable and $l_n = o(n^{1/3})$. Note that this assumption also explicitly requires $K_n \to \infty$ as $n \to \infty$.
Proposition 4.2. Suppose that Assumptions 4.1-4.4 hold. Then $F_{1,n}^*$ is conditionally $d_K$-consistent given $\mathcal{C}$. If, in addition, $\mu_n$ converges a.s. to a $\mathcal{C}$-measurable random vector $\mu$ and $\nabla\phi(\mu) \neq 0$ a.s., then $F_{2,n}^*$ is conditionally $d_K$-consistent given $\mathcal{C}$.

The dependent wild bootstrap (DWB) for time series was introduced in Shao (2010). This method approximates the finite-sample distribution of $T_n$ by mimicking the autocovariance structure of the underlying sample. In particular, adapting to our framework, assume that $\mathcal{C} = \{\emptyset, \Omega\}$ and let $G_n$ be an unweighted "line" network. Consider an $n$-dimensional, zero-mean random vector $W_n$ defined on $(\Omega, \mathcal{H}, P)$ and independent of $Y_n$ such that $\mathrm{Var}(W_{n,i}) = 1$ and $\mathrm{Cov}(W_{n,i}, W_{n,j}) = \kappa(d_n(i,j)/(s_n+1))$, where $\kappa(\cdot)$ is a positive-definite kernel function and $s_n$ is a bandwidth parameter. The DWB pseudo-sample $Y_n^*$ is defined as follows:

$Y_{n,i}^* = \bar{Y}_n + (Y_{n,i} - \bar{Y}_n) W_{n,i}, \quad i \in N_n$.

Let $\bar{Y}_n^* := n^{-1} \sum_{i \in N_n} Y_{n,i}^*$. By construction, $E[\bar{Y}_n^* \mid \mathcal{G}_n] = \bar{Y}_n$, so that in contrast to the block bootstrap, the statistic $\sqrt{n}(\bar{Y}_n^* - \bar{Y}_n)$ is unbiased given $\mathcal{G}_n$. In addition, noticing that $\kappa(0) = 1$, the conditional variance of the scaled bootstrap mean given $\mathcal{G}_n$ is

$\Sigma_n^* = \frac{1}{n} \sum_{i,j \in N_n} \mathrm{Cov}(W_{n,i}, W_{n,j}) (Y_{n,i} - \bar{Y}_n)(Y_{n,j} - \bar{Y}_n)^{\top} = \frac{1}{n} \sum_{i,j \in N_n} \kappa\!\left( \frac{d_n(i,j)}{s_n+1} \right) (Y_{n,i} - \bar{Y}_n)(Y_{n,j} - \bar{Y}_n)^{\top}$,

which is a version of the network HAC estimator (2.5). Then under certain regularity conditions the DWB is first-order consistent for smooth functions of the mean.

For general graphs, however, positive definiteness of the kernel function $\kappa$ does not imply that the matrix $[\kappa(d_n(i,j)/(s_n+1))]_{i,j \in N_n}$ is positive semi-definite (see KMS, 2020, Section 4.1). Therefore, in general, we cannot guarantee the existence of a random vector with the required covariance structure. A simple way to overcome this issue is to rely on the topology of a given network. Consider the matrix $\Omega_n = [\omega_n(i,j)]_{i,j \in N_n}$, where $\omega_n$ is defined in (4.2).

Claim 4.1. $\Omega_n$ is positive semi-definite.

Proof. Let $c \in \mathbb{R}^n$ and $\xi_i := \sum_{j \in N_n(i; s_n+1)} c_j$. Then, since $j, k \in N_n(i; s_n+1)$ if and only if $i \in N_n(j; s_n+1) \cap N_n(k; s_n+1)$,

$\sum_{i \in N_n} \xi_i^2 = \sum_{i \in N_n} \sum_{j,k \in N_n(i; s_n+1)} c_j c_k = \sum_{i,j \in N_n} c_i c_j\, \omega_n(i,j)\, \delta_n(s_n)$.

Therefore,

$c^{\top} \Omega_n c = \sum_{i,j \in N_n} c_i c_j\, \omega_n(i,j) = \sum_{i \in N_n} \xi_i^2 / \delta_n(s_n) \geq 0$. $\square$

Consequently, we consider a random vector $W_n$ satisfying the following assumption.

Assumption 4.5. $W_n$ is conditionally independent of $Y_n$ given $\mathcal{C}$ with $E[W_n \mid \mathcal{C}] = 0$ a.s. and $E[W_n W_n^{\top} \mid \mathcal{C}] = \Omega_n$ a.s.

Under Assumption 4.5 the bootstrap variance estimator given by

(4.4) $\Sigma_n^* = \frac{1}{n} \sum_{i,j \in N_n} \omega_n(i,j) (Y_{n,i} - \bar{Y}_n)(Y_{n,j} - \bar{Y}_n)^{\top}$

is positive semi-definite a.s. We impose the next conditions on the sequence of networks, which in combination with Assumption 4.2 ensure the consistency of $\Sigma_n^*$.
Assumption 4.6. The sequence $\{(G_n, s_n)\}$ is such that w.p.1 $\tilde{\omega} < \infty$ and
(a) $\Delta_n(s_n; 1)/\delta_n(s_n) + D_n(s_n)/n \to 0$;
(b) $n^{-1} \sum_{i \in N_n} \sum_{j \in N_n^{\partial}(i; s)} |\omega_n(i,j) - 1|\, \gamma_{n,s} \to 0$ for all $s \geq 0$.

These conditions are weaker than those of Assumption 4.1, since $\Delta_n(s_n; 1) \leq \sqrt{\Delta_n(s_n; 2)}$ and $\delta_n(s_n) \leq n$. Therefore, the DWB estimator (4.4) is likely to be consistent for a wider class of networks. As in the case of the block bootstrap, we assume that the true variance $\Sigma_n$ converges a.s. to a $\mathcal{C}$-measurable matrix $\Sigma$.
Proposition 4.3. Suppose that Assumptions 4.6 and 4.2 hold. Then $E[\|\Sigma_n^* - \Sigma_n\|_F \mid \mathcal{C}] \to 0$ a.s.

First, we consider the Gaussian case. That is, we take $W_n = \Omega_n^{1/2} \zeta_n$, where $\zeta_n$ is the standard normal random vector in $\mathbb{R}^n$ independent of $\mathcal{G}_n$. From the practical perspective it is a convenient choice, especially when $n$ is large, because a sample from a multivariate normal distribution can be easily generated. Moreover, efficient algorithms for finding the square root of positive semi-definite matrices are available; we refer to Higham (2008) for details. As noted in Shao (2010), although the DWB sample with Gaussian weights may not match non-zero higher-order cumulants of the original process, it is difficult to choose the joint distribution of $W_n$ that fits those cumulants, and the performance of the DWB primarily depends on the choice of the truncation parameter $s_n$.

In this case, conditionally on $\mathcal{G}_n$, the statistic

$\sqrt{n}(\bar{Y}_n^* - \bar{Y}_n) = \frac{1}{\sqrt{n}} \sum_{i \in N_n} W_{n,i} (Y_{n,i} - \bar{Y}_n)$

is also normal with zero mean and variance given in (4.4). Therefore, the conditional distribution of the DWB counterpart of the test statistic $T_{1,n}$,

$T_{1,n}^* = \sqrt{n}\, \|\bar{Y}_n^* - \bar{Y}_n\|$,

given $\mathcal{G}_n$ is known and is the same as the conditional distribution of the asymptotic Gaussian approximation $\|\Sigma_n^{*1/2} \eta\|$, where $\eta$ is a $v$-dimensional standard normal random vector independent of $\mathcal{G}_n$, and the latter converges $\mathcal{C}$-weakly to $\|\Sigma^{1/2}\eta\|$ by Lemma C.3. A more interesting case, however, arises when considering the second test statistic $T_{2,n}$, because for nonlinear transformations the conditional distribution of its bootstrap analog,

$T_{2,n}^* = \sqrt{n}\, \big( \phi(\bar{Y}_n^*) - \phi(\bar{Y}_n) \big)$,

is typically unavailable. Then in the Gaussian case the DWB is consistent without any further restriction on the topology of the sequence of networks $\{G_n\}$. We only need to assume that $\sqrt{n}(\bar{Y}_n - \mu_n)$ converges $\mathcal{C}$-weakly to a conditionally normal random vector and the asymptotic variance of $T_{2,n}$ is a.s. positive.
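A minimal sketch of the Gaussian network DWB for $T_{2,n}^*$ (Python; `scipy` is assumed available, and the Monte Carlo layout and names are our illustrative choices):

```python
import numpy as np
from scipy.linalg import sqrtm

def gaussian_dwb(Y, D, s_n, phi, n_boot=999, rng=None):
    """Gaussian dependent wild bootstrap draws of T*_{2,n}.

    Weights W_n = Omega_n^{1/2} zeta_n with zeta_n ~ N(0, I_n), where
    Omega_n = [omega_n(i, j)] is positive semi-definite by Claim 4.1.
    `phi` is a smooth map from R^v to R.
    """
    rng = np.random.default_rng(rng)
    n = Y.shape[0]
    members = (D < s_n + 1).astype(float)
    omega = (members @ members.T) / members.sum(axis=1).mean()
    root = np.real(sqrtm(omega))     # Omega_n^{1/2}; real part guards
                                     # against tiny imaginary round-off
    Ybar = Y.mean(axis=0)
    Yc = Y - Ybar
    T2_star = np.empty(n_boot)
    for b in range(n_boot):
        W = root @ rng.standard_normal(n)            # one draw of W_n
        Ybar_star = Ybar + (Yc * W[:, None]).mean(axis=0)
        T2_star[b] = np.sqrt(n) * (phi(Ybar_star) - phi(Ybar))
    return T2_star
```

The empirical $(1-\alpha)$-quantile of the returned draws then approximates the bootstrap critical value for $T_{2,n}$.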
Proposition 4.4. Suppose that $W_n$ is Gaussian and Assumptions 4.5, 4.6, 4.2, and 4.3 hold. Then $F_{1,n}^*$ is conditionally $d_K$-consistent given $\mathcal{C}$. If, in addition, $\mu_n$ converges a.s. to a $\mathcal{C}$-measurable random vector $\mu$ and $\nabla\phi(\mu) \neq 0$ a.s., then $F_{2,n}^*$ is conditionally $d_K$-consistent given $\mathcal{C}$.

Given another choice of $W_n$, the process $\{\xi_{n,i} := W_{n,i}(Y_{n,i} - \bar{Y}_n) : i \in N_n\}$ is $s_n$-dependent conditionally on $\mathcal{G}_n$, i.e., $\xi_{n,i}$ and $\xi_{n,j}$ are conditionally independent given $\mathcal{G}_n$ whenever $j \notin B_{n,i} := N_n(i; s_n+1)$. Consequently, in addition to the assumptions of Proposition 4.4, we need to control the behavior of the third conditional moments of $W_n$ and the neighborhoods $\{B_{n,i}\}$ such that the bootstrap distributions $F_{1,n}^*$ and $F_{2,n}^*$ in this case approach the ones under the Gaussian weights as $n \to \infty$.
Proposition 4.5. Suppose that Assumptions 4.5, 4.6, 4.2, and 4.3 hold, and

(4.5) $\frac{1}{n^{3/2}} \sum_{i \in N_n} \sum_{j \in B_{n,i}} \sum_{k \in B_{n,i} \cup B_{n,j}} \prod_{l \in \{i,j,k\}} \|W_{n,l}\|_{\mathcal{C},3} \to 0$ a.s.

Then $F_{1,n}^*$ is conditionally $d_K$-consistent given $\mathcal{C}$. If, in addition, $\mu_n$ converges a.s. to a $\mathcal{C}$-measurable random vector $\mu$ and $\nabla\phi(\mu) \neq 0$ a.s., then $F_{2,n}^*$ is conditionally $d_K$-consistent given $\mathcal{C}$.
Remark. The convergence condition in (4.5) is quite strong. In particular, in a simple case when the neighborhoods $\{B_{n,i}\}$ have the same size $l_n < n$ for all $n \geq 1$ and $\sup_{n, i \in N_n} \|W_{n,i}\|_{\mathcal{C},3} < \infty$ a.s., it requires $l_n = o(n^{1/4})$. Therefore, it is of high interest to find a better way to handle network dependent processes under $m$-dependence.
5. Conclusion
Nonparametric bootstrapping for time series and spatial processes has been extensively studied in the past decades. Thus, various resampling methods are now available for statistical analysis of dependent data in these cases. However, the lack of regular structure in networks renders the use of these techniques for bootstrap-based inference in the case of network dependent processes impracticable. In this paper we proposed a block-based method and a variant of the dependent wild bootstrap suitable for the latter processes satisfying the conditional version of Doukhan and Louhichi (1999)'s $\psi$-weak dependence condition. We established the first-order validity of these methods for constructing confidence sets for the mean of a network dependent process. In addition, we showed their consistency under the smooth function model conditionally on a common shock of a general form. Finally, the corresponding bootstrap variance estimators can be used for asymptotic inference instead of the network HAC estimator, which is not necessarily positive semi-definite.

As for future directions, having a continuity theorem and other related results similar to the ones established in Belyaev and Sjöstedt-de Luna (2000), but under convergence in conditional probability, would significantly weaken the bootstrap consistency conditions derived in this paper. In addition, an extension of these methods to bootstrapping $M$-estimators and empirical processes is of great importance for applied research.

References
Athreya, K. B., Lahiri, S. N., 2006. Measure Theory and Probability Theory. Springer Texts in Statistics. Springer-Verlag, Berlin, Heidelberg.
Belyaev, Y., Sjöstedt-de Luna, S., 2000. Weakly approaching sequences of random distributions. Journal of Applied Probability 37 (3), 807–822.
Bentkus, V., 2003. On the dependence of the Berry-Esseen bound on dimension. Journal of Statistical Planning and Inference 113, 385–402.
Berti, P., Pratelli, L., Rigo, P., 2006. Almost sure weak convergence of random probability measures. Stochastics 78 (2), 91–97.
Bühlmann, P. L., 1993. The blockwise bootstrap in time series and empirical processes. Ph.D. thesis, Swiss Federal Institute of Technology Zürich.
Calhoun, G., 2018. Block bootstrap consistency under weak assumptions. Econometric Theory.
Carlstein, E., 1986. The use of subseries values for estimating the variance of a general statistic from a stationary sequence. The Annals of Statistics 14 (3), 1171–1179.
Chernozhukov, V., Chetverikov, D., Kato, K., 2013. Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors. Annals of Statistics 41 (6), 2786–2819.
Cohen, S., Elliott, R. J., 2015. Stochastic Calculus and Applications, 2nd Edition. Probability and Its Applications. Birkhäuser Basel.
Comets, F., Janžura, M., 1998. A central limit theorem for conditionally centered random fields with an application to Markov fields. Journal of Applied Probability 35, 608–621.
Conley, T. G., 1999. GMM estimation with cross-sectional dependence. Journal of Econometrics 92, 1–45.
Crimaldi, I., 2009. An almost sure conditional convergence result and an application to a generalized Pólya urn. International Mathematical Forum 4 (23), 1139–1156.
Dedecker, J., Doukhan, P., Lang, G., 2007. Weak Dependence: With Examples and Applications. Lecture Notes in Statistics. Springer, New York.
Doukhan, P., Louhichi, S., 1999. A new weak dependence condition and applications to moment inequalities. Stochastic Processes and their Applications 84 (2), 313–342.
Durrett, R., 2010. Probability: Theory and Examples, 4th Edition. Cambridge University Press.
Embrechts, P., Hofert, M., 2013. A note on generalized inverses. Mathematical Methods of Operations Research 77 (3), 423–432.
Garcia-Soidan, P., Menezes, R., Rubinos, O., 2014. Bootstrap approaches for spatial data. Stochastic Environmental Research and Risk Assessment 28 (5), 1207–1219.
Gonçalves, S., Politis, D., 2011. Discussion: Bootstrap methods for dependent data: A review. Journal of the Korean Statistical Society 40 (4), 383–386.
Gonçalves, S., White, H., 2002. The bootstrap of the mean for dependent heterogeneous arrays. Econometric Theory 18 (6), 1367–1384.
Hall, P., 1992. The Bootstrap and Edgeworth Expansion. Springer Series in Statistics. Springer-Verlag, New York, Berlin, Paris.
Higham, N., 1988. Computing a nearest symmetric positive semidefinite matrix. Linear Algebra and its Applications 103, 103–118.
Higham, N., 2002. Computing the nearest correlation matrix - a problem from finance. IMA Journal of Numerical Analysis 22 (3), 329–343.
Higham, N. J., 2008. Functions of Matrices: Theory and Computation. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA.
Jenish, N., Prucha, I. R., 2009. Central limit theorems and uniform laws of large numbers for arrays of random fields. Journal of Econometrics 150 (1), 86–98.
Kallenberg, O., 2002. Foundations of Modern Probability, 2nd Edition. Probability and its Applications. Springer-Verlag, New York.
Kelejian, H. H., Prucha, I. R., 2007. HAC estimation in a spatial framework. Journal of Econometrics 140 (1), 131–154.
Kelejian, H. H., Prucha, I. R., 2010. Specification and estimation of spatial autoregressive models with autoregressive and heteroskedastic disturbances. Journal of Econometrics 157 (1), 53–67.
Kojevnikov, D., Marmer, V., Song, K., 2020. Limit theorems for network dependent random variables. Journal of Econometrics. URL https://doi.org/10.1016/j.jeconom.2020.05.019
Künsch, H. R., 1989. The jackknife and the bootstrap for general stationary observations. The Annals of Statistics 17, 1217–1241.
Lahiri, S. N., 1992. Edgeworth correction by 'moving block' bootstrap for stationary and nonstationary data. In: LePage, R., Billard, L. (Eds.), Exploring the Limits of Bootstrap. John Wiley & Sons, New York; Chichester, pp. 183–214.
Lahiri, S. N., 2003. Resampling Methods for Dependent Data. Springer Series in Statistics. Springer, New York.
Liu, R. Y., Singh, K., 1992. Moving blocks jackknife and bootstrap capture weak dependence. In: LePage, R., Billard, L. (Eds.), Exploring the Limits of Bootstrap. John Wiley & Sons, New York; Chichester, pp. 225–248.
Matyas, L., 1999. Generalized Method of Moments Estimation. Cambridge University Press, Cambridge.
Paparoditis, E., Politis, D. N., 2001. Tapered block bootstrap. Biometrika 88 (4), 1105–1119.
Politis, D. N., Romano, J. P., 1992. A circular block-resampling procedure for stationary data. In: LePage, R., Billard, L. (Eds.), Exploring the Limits of Bootstrap. John Wiley & Sons, New York; Chichester, pp. 263–270.
Politis, D. N., Romano, J. P., 1994. The stationary bootstrap. Journal of the American Statistical Association 89 (428), 1303–1313.
Politis, D. N., Romano, J. P., Wolf, M., 1999. Subsampling. Springer.
Rhee, W., Talagrand, M., 1986. Uniform bound in the central limit theorem for Banach space valued dependent random variables. Journal of Multivariate Analysis 20 (2), 303–320.
Röllin, A., 2013. Stein's method in high dimensions with applications. Annales de l'Institut Henri Poincaré, Probabilités et Statistiques 49 (2), 529–549.
Sengupta, S., Shao, X., Wang, Y., 2015. The dependent random weighting. Journal of Time Series Analysis 36 (3), 315–326.
Shao, J., Tu, D., 1995. The Jackknife and Bootstrap. Springer-Verlag, Berlin; New York.
Shao, X., 2010. The dependent wild bootstrap. Journal of the American Statistical Association 105 (489), 218–235.
Shiryaev, A. N., 2016. Probability-1, 3rd Edition. Graduate Texts in Mathematics. Springer-Verlag, New York.
Talagrand, M., 2011. Mean Field Models for Spin Glasses, Volume I: Basic Examples. Vol. 54 of A Series of Modern Surveys in Mathematics. Springer-Verlag.
Zhang, F., 2011. Matrix Theory: Basic Results and Techniques, 2nd Edition. Universitext. Springer-Verlag, New York.
In the following let $\varphi_K$ with $K \in \mathbb{R}_+$ denote the element-wise censoring function, i.e., for an indexed family of real numbers $x \equiv (x_i)_{i \in I}$,

$[\varphi_K(x)]_i := (-K) \vee (K \wedge x_i), \quad i \in I.$

Proof of Proposition 2.1. Fix $\kappa \ge 1$ and let $\xi := (f \circ h)(Z_{n,A})$ and $\zeta := (g \circ h)(Z_{n,B})$, where $f, g \in \mathcal{L}_w$ and $(A, B) \in \mathcal{P}_n(a, b; s)$. Define the censored versions

$\xi_\kappa := (f \circ h \circ \varphi_\kappa)(Z_{n,A}) \quad\text{and}\quad \zeta_\kappa := (g \circ h \circ \varphi_\kappa)(Z_{n,B}).$

Then

$|\mathrm{Cov}(\xi, \zeta \mid \mathcal{C})| \le |\mathrm{Cov}(\xi, \zeta - \zeta_\kappa \mid \mathcal{C})| + |\mathrm{Cov}(\xi - \xi_\kappa, \zeta_\kappa \mid \mathcal{C})| + |\mathrm{Cov}(\xi_\kappa, \zeta_\kappa \mid \mathcal{C})| \le 2\|f\|_\infty E[|\zeta - \zeta_\kappa| \mid \mathcal{C}] + 2\|g\|_\infty E[|\xi - \xi_\kappa| \mid \mathcal{C}] + |\mathrm{Cov}(\xi_\kappa, \zeta_\kappa \mid \mathcal{C})|$ a.s.

First, $\mathrm{Lip}(f \circ h \circ \varphi_\kappa) \le \eta\kappa^{\tau-1}\mathrm{Lip}(f)$ and $\mathrm{Lip}(g \circ h \circ \varphi_\kappa) \le \eta\kappa^{\tau-1}\mathrm{Lip}(g)$. Therefore,

$|\mathrm{Cov}(\xi_\kappa, \zeta_\kappa \mid \mathcal{C})| \le \big(c_0\|f\|_\infty\|g\|_\infty + 2\eta\kappa^{\tau-1}\{c_1\,\mathrm{Lip}(f)\|g\|_\infty + c_2\|f\|_\infty\,\mathrm{Lip}(g)\} + (2\eta\kappa^{\tau-1})^2 c_3\,\mathrm{Lip}(f)\,\mathrm{Lip}(g)\big)\gamma_{n,s}$ a.s. (A.1)

Second,

$E[|\xi - \xi_\kappa| \mid \mathcal{C}] \le \mathrm{Lip}(f)\sum_{i \in A}E[\|h(Z_{n,i}) - (h \circ \varphi_\kappa)(Z_{n,i})\| \mid \mathcal{C}] \le C_v\,\mathrm{Lip}(f)\sum_{i \in A}E[\|Z_{n,i}\|_\infty^\tau\mathbf{1}\{\|Z_{n,i}\|_\infty > \kappa\} \mid \mathcal{C}] \le C_v\,\mathrm{Lip}(f)\,aL\kappa^{\tau-p}$ a.s., (A.2)

where $C_v$ depends only on $v$ and $\eta$. Similarly,

$E[|\zeta - \zeta_\kappa| \mid \mathcal{C}] \le C_v\,\mathrm{Lip}(g)\,bL\kappa^{\tau-p}$ a.s. (A.3)

Since inequalities (A.1)-(A.3) hold for all $\kappa \ge \kappa_0$ on $\{\kappa_0 \in [1,\infty)\}$, the result follows by setting $\kappa_0 = (\gamma_{n,s} \wedge 1)^{1/(1-p)}$ if $c_3 = 0$ and $\kappa_0 = (\gamma_{n,s} \wedge 1)^{1/(2-p-\tau)}$ otherwise, and noticing that $\mathrm{Cov}(\xi, \zeta \mid \mathcal{C}) = 0$ a.s. on $\{\gamma_{n,s} = 0\}$. □

Proof of Theorem 2.1. First, it suffices to show that $\|c^\top(\bar Y_n - E[\bar Y_n \mid \mathcal{C}])\|_{\mathcal{C},1} \to 0$ a.s. for any $c \in \mathbb{R}^v$ with $\|c\| = 1$. Then the proof is similar to the one given in the unconditional case. Specifically, for $k > 0$, let $\xi^{(k)}_{n,i} := \varphi_k(c^\top Y_{n,i})$ and $\zeta^{(k)}_{n,i} := c^\top Y_{n,i} - \xi^{(k)}_{n,i}$, so that

$\big\|c^\top(\bar Y_n - E[\bar Y_n \mid \mathcal{C}])\big\|_{\mathcal{C},1} \le \frac{1}{n}\sum_{i \in N_n}\big\|\zeta^{(k)}_{n,i}\big\|_{\mathcal{C},1} + \Big\|n^{-1}\sum_{i \in N_n}\big(\xi^{(k)}_{n,i} - E[\xi^{(k)}_{n,i} \mid \mathcal{C}]\big)\Big\|_{\mathcal{C},1}$ a.s.

The result then follows from the definition of the essential infimum and the following inequalities:

$\big\|\zeta^{(k)}_{n,i}\big\|_{\mathcal{C},1} \le 2E[\|Y_{n,i}\|\mathbf{1}\{\|Y_{n,i}\| > k\} \mid \mathcal{C}]$ a.s.

and, since $\psi_{1,1}(\varphi_k, \varphi_k) \le Ck^2$,

$\Big\|\sum_{i \in N_n}\big(\xi^{(k)}_{n,i} - E[\xi^{(k)}_{n,i} \mid \mathcal{C}]\big)\Big\|_{\mathcal{C},2} \le \sqrt{n}\,k\Big(C\sum_{s \ge 0}\delta^\partial_n(s;1)\theta_{n,s}\Big)^{1/2}$ a.s. □

Proof of Theorem 3.1. The first assertion follows trivially from the triangle inequality. Consider the second assertion. First, note that for a sub-$\sigma$-algebra $\mathcal{F} \subset \mathcal{H}$, random variables $X, Y$, and an $\mathcal{F}$-measurable random variable $Z$, $P(X \le Z \mid \mathcal{F}) = F^{\mathcal{F}}_X(\cdot, Z)$ a.s. and $P(Y \le Z \mid \mathcal{F}) = F^{\mathcal{F}}_Y(\cdot, Z)$ a.s. (see, e.g., Kallenberg, 2002, Theorem 5.4). Therefore,

$|P(X \le Z \mid \mathcal{F}) - P(Y \le Z \mid \mathcal{F})| \le d_K(X, Y \mid \mathcal{F})$ a.s.

In addition, if $\mathcal{F} = \sigma(\mathcal{A} \cup \mathcal{B})$, where $\mathcal{A}$ and $\mathcal{B}$ are sub-$\sigma$-algebras of $\mathcal{H}$, and $Y$ is conditionally independent of $\mathcal{A}$ given $\mathcal{B}$, then $d_K(X, Y \mid \mathcal{F}) = d_K(X, Y \mid \mathcal{F}, \mathcal{B})$ a.s.

Let $c_n(\alpha)$ denote the conditional $\alpha$-quantile of $S_n$ given $\mathcal{C}$. Fix $\eta > 0$ such that $\alpha - \eta, \alpha + 2\eta \in (0, 1)$ and let $\Delta_n \equiv d_K(T^*_n, S_n \mid \mathcal{G}_n, \mathcal{C})$. Then, using the properties of generalized inverses (see, e.g., Embrechts and Hofert, 2013, Proposition 1) and the conditional independence of $S_n$ and $\mathcal{Y}_n$ given $\mathcal{C}$, we get

$P(S_n \le c^*_n(\alpha + \eta) \mid \mathcal{G}_n) \ge P(T^*_n \le c^*_n(\alpha + \eta) \mid \mathcal{G}_n) - \eta \ge \alpha = P(S_n \le c_n(\alpha) \mid \mathcal{G}_n)$ a.s. on $\{\Delta_n \le \eta\}$

and

$P(T^*_n \le c_n(\alpha + 2\eta) \mid \mathcal{G}_n) \ge P(S_n \le c_n(\alpha + 2\eta) \mid \mathcal{G}_n) - \eta = \alpha + \eta \ge P(T^*_n \le c^*_n(\alpha) \mid \mathcal{G}_n)$ a.s. on $\{\Delta_n \le \eta\}$.

Therefore,

$P(c^*_n(\alpha) \ge c_n(\alpha - \eta) \mid \mathcal{C}) \ge P(c^*_n(\alpha) \ge c_n(\alpha - \eta), \Delta_n \le \eta \mid \mathcal{C}) = P(\Delta_n \le \eta \mid \mathcal{C})$ a.s.

and

$P(c^*_n(\alpha) \le c_n(\alpha + 2\eta) \mid \mathcal{C}) \ge P(c^*_n(\alpha) \le c_n(\alpha + 2\eta), \Delta_n \le \eta \mid \mathcal{C}) = P(\Delta_n \le \eta \mid \mathcal{C})$ a.s.

Using the last two inequalities we find that

$P(c_n(\alpha) \wedge c^*_n(\alpha) < T_n \le c_n(\alpha) \vee c^*_n(\alpha) \mid \mathcal{C}) \le P(c_n(\alpha - \eta) < T_n \le c_n(\alpha + 2\eta) \mid \mathcal{C}) + P(c_n(\alpha - \eta) > c^*_n(\alpha) \mid \mathcal{C}) + P(c_n(\alpha + 2\eta) < c^*_n(\alpha) \mid \mathcal{C}) \le P(c_n(\alpha - \eta) < S_n \le c_n(\alpha + 2\eta) \mid \mathcal{C}) + 2P(\Delta_n > \eta \mid \mathcal{C}) + 2d_K(T_n, S_n \mid \mathcal{C}) = 3\eta + 2P(\Delta_n > \eta \mid \mathcal{C}) + 2d_K(T_n, S_n \mid \mathcal{C})$ a.s.

and

$A_{n,\alpha} := |P(T_n \le c^*_n(\alpha) \mid \mathcal{C}) - \alpha| \le 3\eta + 2P(\Delta_n > \eta \mid \mathcal{C}) + 3d_K(T_n, S_n \mid \mathcal{C})$ a.s. (A.4)

Finally, there exists a sequence $\{\alpha_k\}$ such that $\operatorname{ess\,sup}_{\alpha \in (0,1)} A_{n,\alpha} = \sup_k A_{n,\alpha_k}$ a.s., and the latter is a.s. bounded by the RHS of (A.4). Therefore,

$\limsup_{n \to \infty}\big(\operatorname{ess\,sup}_{\alpha \in (0,1)} A_{n,\alpha}\big) \le 3\eta$ a.s.,

and the result follows by considering a sequence $\eta_m \searrow 0$. □

Proof of Lemma 3.1. By the mean value theorem, we may write

$T_n = \nabla\phi(\tilde\theta_n)^\top\tau_n(\hat\theta_n - \theta_n) \quad\text{and}\quad T^*_n = \nabla\phi(\tilde\theta^*_n)^\top\tau^*_n(\hat\theta^*_n - \theta^*_n),$ (A.5)

where $\tilde\theta_n$ and $\tilde\theta^*_n$ are such that $\|\tilde\theta_n - \theta_n\| \le \|\hat\theta_n - \theta_n\|$ and $\|\tilde\theta^*_n - \theta^*_n\| \le \|\hat\theta^*_n - \theta^*_n\|$. Then for any $r \in \mathbb{R}$ and $\epsilon > 0$,

$|P(T^*_n \le r \mid \mathcal{G}_n) - P(T'^*_n \le r \mid \mathcal{G}_n)| \le P(T'^*_n \le r + R^*_n \mid \mathcal{G}_n) - P(T'^*_n \le r - R^*_n \mid \mathcal{G}_n) \le 2d_K(T'^*_n, S'_n \mid \mathcal{G}_n, \mathcal{C}) + Q(S'_n, 2\epsilon \mid \mathcal{C}) + P(R^*_n > \epsilon \mid \mathcal{G}_n)$ a.s.,

where $R^*_n \equiv \big|(\nabla\phi(\tilde\theta^*_n) - \nabla\phi(\theta^*_n))^\top\tau^*_n(\hat\theta^*_n - \theta^*_n)\big|$. Similarly, for any $r \in \mathbb{R}$ and $\epsilon > 0$,

$|P(T_n \le r \mid \mathcal{C}) - P(T'_n \le r \mid \mathcal{C})| \le 2d_K(T'_n, S'_n \mid \mathcal{C}) + Q(S'_n, 2\epsilon \mid \mathcal{C}) + P(R_n > \epsilon \mid \mathcal{C})$ a.s.,

where $R_n \equiv \big|(\nabla\phi(\tilde\theta_n) - \nabla\phi(\theta_n))^\top\tau_n(\hat\theta_n - \theta_n)\big|$.

By Lemma C.6 the sequence $\{\tilde\theta^*_n\}$ is $\mathcal{C}$-asymptotically tight. Therefore, using Lemma C.5 together with the $\mathcal{C}$-asymptotic tightness of $\tau^*_n(\hat\theta^*_n - \theta^*_n)$ and $\tau_n(\hat\theta_n - \theta_n)$, it follows that $P(R^*_n > \epsilon \mid \mathcal{C}) \to 0$ and $P(R_n > \epsilon \mid \mathcal{C}) \to 0$ a.s. Hence, for any $\nu > 0$,

$\limsup_{n \to \infty} P(d_K(T^*_n, T'^*_n \mid \mathcal{G}_n) > \nu \mid \mathcal{C}) \le \nu^{-1}\operatorname{ess\,inf}_{\epsilon > 0}\limsup_{n \to \infty} Q(S'_n, 2\epsilon \mid \mathcal{C}) = 0$ a.s.

and

$\limsup_{n \to \infty} d_K(T_n, T'_n \mid \mathcal{C}) \le \operatorname{ess\,inf}_{\epsilon > 0}\limsup_{n \to \infty} Q(S'_n, 2\epsilon \mid \mathcal{C}) = 0$ a.s.

The result then follows from the triangle inequality. □

Proof of Theorem 3.2. Follows immediately from Lemma 3.1 and Theorem 3.1. □
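In applied work, Theorems 3.1 and 3.2 license the standard Monte Carlo implementation of the bootstrap critical value $c^*_n(\alpha)$: simulate bootstrap statistics and take an empirical quantile. The following is a minimal Python sketch; `draw_bootstrap_stat` is a hypothetical placeholder for either resampling scheme and is not part of the paper.

```python
import numpy as np

def bootstrap_critical_value(draw_bootstrap_stat, alpha=0.95, n_draws=999, rng=None):
    """Empirical alpha-quantile of bootstrap draws of T*_n.

    `draw_bootstrap_stat` is a user-supplied callable returning one
    realization of the bootstrap statistic T*_n (e.g., from the
    block-based or the dependent wild bootstrap). Theorem 3.1 then
    justifies using the returned quantile as a critical value for T_n.
    """
    rng = np.random.default_rng() if rng is None else rng
    draws = np.array([draw_bootstrap_stat(rng) for _ in range(n_draws)])
    # c*_n(alpha): generalized inverse of the empirical bootstrap cdf
    return np.quantile(draws, alpha)
```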
Proof of Corollary 3.2. Consider Equation (A.5) in the proof of Lemma 3.1. By Lemma C.3, $T_n$ converges $\mathcal{C}$-weakly to $S'$ (because $\tilde\theta_n \xrightarrow{\mathcal{C}\text{-}p} \theta$ a.s. and $x \mapsto \nabla\phi(x)$ is continuous). Hence, $d_K(T_n, S' \mid \mathcal{C}) \to 0$ a.s., and $d_K(T^*_n, S' \mid \mathcal{G}_n, \mathcal{C}) \to 0$ a.s. follows from arguments similar to those given in the proof of Lemma 3.1. Finally, the result holds by Theorem 3.1. □

Proof of Proposition 4.1. Let $\zeta_{n,i} := \sum_{j \in B_{n,i}} Y'_{n,j}$, where $Y'_{n,i} := Y_{n,i} - \mu_{n,i}$, and let $\zeta^*_{n,i}$ be its resampled version. Then, using the conditional independence of the elements of $\{\zeta^*_{n,i}\}$ given $\mathcal{G}_n$, we find that

$\tilde\Sigma_n := \mathrm{Var}\Big(\frac{1}{\sqrt n}\sum_{k=1}^{K_n}\zeta^*_{n,k} \,\Big|\, \mathcal{G}_n\Big) = \frac{1}{n}\sum_{k=1}^{K_n}\mathrm{Var}(\zeta^*_{n,k} \mid \mathcal{G}_n) = \frac{1}{\delta_n(s_n)}\Big(\frac{1}{n}\sum_{i \in N_n}\zeta_{n,i}\zeta_{n,i}^\top - \bar\zeta_n\bar\zeta_n^\top\Big) = \Sigma^*_n - \frac{1}{n}\sum_{i \in N_n}(\omega_n(i) - 1)\big(\zeta_{n,i}\mu_n^\top + \mu_n\zeta_{n,i}^\top\big) - \frac{\Delta_n(s_n; 2)}{\delta_n(s_n)}\mu_n\mu_n^\top \equiv \Sigma^*_n - A_{n,1} - A_{n,2},$ (A.6)

where $\bar\zeta_n := n^{-1}\sum_{i \in N_n}\zeta_{n,i}$. On the other hand, using the second line of (A.6),

$\tilde\Sigma_n = \frac{1}{n}\sum_{i,j \in N_n}\omega_n(i,j)Y'_{n,i}Y'^\top_{n,j} - \frac{1}{\delta_n(s_n)}\Big(\frac{1}{n}\sum_{i \in N_n}\omega_n(i)Y'_{n,i}\Big)\Big(\frac{1}{n}\sum_{i \in N_n}\omega_n(i)Y'_{n,i}\Big)^\top \equiv B_{n,1} + B_{n,2}.$ (A.7)

Let $y'_{n,i} := c^\top Y'_{n,i}$ and $\mu'_n := c^\top\mu_n$, where $c \in \mathbb{R}^v$ with $\|c\| = 1$, and note that by Lemma C.1 it suffices to show that $E[|c^\top(\Sigma^*_n - \Sigma_n)c| \mid \mathcal{C}] \to 0$ a.s. Note that $\{y'_{n,i}\}$ is $(\mathcal{L}, \psi, \mathcal{C})$-weakly dependent with the weak dependence coefficients $\{\gamma_n\}$. In the following let

$\Xi_n := \sum_{s \ge 1}\delta^\partial_n(s)\gamma_{n,s}^{1-2/r}.$

Claim A.1. $E[|c^\top(\tilde\Sigma_n - \Sigma_n)c| \mid \mathcal{C}] \to 0$ a.s.

Proof. Consider the first term on the last line of (A.7). Write

$c^\top(B_{n,1} - \Sigma_n)c = \frac{1}{n}\sum_{i \in N_n}\big(y'^2_{n,i} - E[y'^2_{n,i} \mid \mathcal{C}]\big) + \frac{1}{n}\sum_{i \in N_n}(\omega_n(i) - 1)y'^2_{n,i} + \frac{1}{n}\sum_{i \in N_n}\sum_{j \in N_n\setminus\{i\}}\omega_n(i,j)\big(y'_{n,i}y'_{n,j} - E[y'_{n,i}y'_{n,j} \mid \mathcal{C}]\big) + \frac{1}{n}\sum_{i \in N_n}\sum_{j \in N_n\setminus\{i\}}(\omega_n(i,j) - 1)E[y'_{n,i}y'_{n,j} \mid \mathcal{C}] \equiv R_{n,1} + R_{n,2} + R_{n,3} + R_{n,4}.$

Using the covariance inequalities established in KMS (2020),

$|R_{n,4}| \le C\sum_{s \ge 1}\gamma_{n,s}^{1-2/r}\cdot\frac{1}{n}\sum_{i \in N_n}\sum_{j \in N^\partial_n(i;s)}|\omega_n(i,j) - 1|$ a.s., (A.8)

where $C = C_0(\mu_r \vee 1)$ for some constant $C_0 \ge 1$. Since $\tilde\omega < \infty$ a.s., the RHS of (A.8) is bounded by $C(\tilde\omega + 1)\Xi_n < \infty$ a.s. Therefore, by the dominated convergence theorem, $|R_{n,4}| \to 0$ a.s. Next, letting $w_{n,i,j} := y'_{n,i}y'_{n,j} - E[y'_{n,i}y'_{n,j} \mid \mathcal{C}]$, we find that

$E[R_{n,3}^2 \mid \mathcal{C}] \le \frac{\bar\omega_n^2}{n^2}\sum_{\substack{i,j \in N_n \\ 1 \le d_n(i,j) < s_n+1}}\;\sum_{\substack{k,l \in N_n \\ 1 \le d_n(k,l) < s_n+1}}|E[w_{n,i,j}w_{n,k,l} \mid \mathcal{C}]| \le \frac{C\bar\omega_n^2}{n^2}\sum_{s \ge 0}|H_n(s, s_n + 1)|\,\gamma_{n,s}^{1-4/r} \to 0$ a.s.,

where $C = C_0(\mu_r \vee 1)$ for some constant $C_0 \ge 1$. Finally,

$E[|R_{n,2}| \mid \mathcal{C}] \le \frac{\mu_r}{n}\sum_{i \in N_n}|\omega_n(i) - 1| \to 0$ a.s.

and

$E[R_{n,1}^2 \mid \mathcal{C}] \le \frac{1}{n^2}\sum_{i,j \in N_n}\big|\mathrm{Cov}(y'^2_{n,i}, y'^2_{n,j} \mid \mathcal{C})\big| \le \frac{C}{n}(1 + \Xi_n) \to 0$ a.s.,

where $C = C_0(\mu_r \vee 1)$ for some constant $C_0 \ge 1$. Consider the second term on the last line of (A.7). Since $-c^\top B_{n,2}c \ge 0$,

$E[-c^\top B_{n,2}c \mid \mathcal{C}] \le \frac{(D_n(s_n))^2}{\delta_n(s_n)\,n^2}\sum_{i,j \in N_n}\big|E[y'_{n,i}y'_{n,j} \mid \mathcal{C}]\big| \le \frac{(D_n(s_n))^2\,C}{\delta_n(s_n)\,n}(1 + \Xi_n) \to 0$ a.s. □

Consider the last two terms on the last line of equation (A.6). Let $\alpha_{n,i} := \sum_{j \in B_{n,i}}(\omega_n(j) - 1)$ and $\bar\alpha_n := \max_{i \in N_n}|\alpha_{n,i}|$. Then, since

$\sum_{i \in N_n}(\omega_n(i) - 1)\zeta_{n,i} = \sum_{i \in N_n}\alpha_{n,i}Y'_{n,i},$

we have $c^\top A_{n,1}c = 2n^{-1}\sum_{i \in N_n}\alpha_{n,i}y'_{n,i}\mu'_n$ and, therefore,

$E[(c^\top A_{n,1}c)^2 \mid \mathcal{C}] \le \frac{C\bar\alpha_n^2}{n}(1 + \Xi_n) \to 0$ a.s.

Finally, $\|A_{n,2}\|_F \le \mu_r^2 \times \Delta_n(s_n; 2)/\delta_n(s_n) \to 0$ a.s. □

Proof of Proposition 4.3. We use the notation from the proof of Proposition 4.1. For a vector $c \in \mathbb{R}^v$ with $\|c\| = 1$ we have

$c^\top(\Sigma^*_n - B_{n,1})c = (\bar y'_n)^2 \times \frac{\Delta_n(s_n; 2)}{\delta_n(s_n)} - \frac{\bar y'_n}{n}\sum_{i \in N_n}y'_{n,i}\sum_{j \in N_n}\omega_n(i,j) \equiv Q_{n,1} + Q_{n,2},$ (A.9)

where $\bar y'_n := n^{-1}\sum_{i \in N_n}y'_{n,i}$. Consider the second term in the last line of (A.9). Letting $\tau_{n,i} := \sum_{j \in N_n}\omega_n(i,j)$ and noticing that

$\max_{i \in N_n}\tau_{n,i} \le \max_{i \in N_n}\big(\omega_n(i) + \tilde\omega|N_n(i; s_n + 1)|\big) \le (\tilde\omega + 1)D_n(s_n) \equiv \bar\tau_n,$

it follows that

$|Q_{n,2}| \le \big|\sqrt{\bar\tau_n}\,\bar y'_n\big| \times \frac{1}{n\sqrt{\bar\tau_n}}\Big|\sum_{i \in N_n}\tau_{n,i}y'_{n,i}\Big| \equiv \big|\sqrt{\bar\tau_n}\,\bar y'_n\big| \times Q_{n,3}.$

Similarly to the proof of Proposition 4.1, $E[Q_{n,3}^2 \mid \mathcal{C}] \le \frac{C\bar\tau_n}{n}(1 + \Xi_n) \to 0$ a.s. Moreover, $E[\bar\tau_n(\bar y'_n)^2 \mid \mathcal{C}]$ is bounded by the same quantity and, since $\Delta_n(s_n; 2)/\delta_n(s_n) \le \bar\tau_n$,

$E[|Q_{n,1}| \mid \mathcal{C}] \le E[\bar\tau_n(\bar y'_n)^2 \mid \mathcal{C}] \to 0$ a.s. □

Proof of Proposition 4.2. Consider $T^*_{1,n}$ first. Let $\tilde Y_{n,i} := Y_{n,i} - \mu_n$, $\tilde Z_{n,i} := \sum_{k \in B_{n,i}}\tilde Y_{n,k}$, and let $\tilde Z^*_{n,i}$ be the bootstrap version of the latter. Also define $W^*_{n,i} := \tilde Z^*_{n,i} - E[\tilde Z^*_{n,i} \mid \mathcal{G}_n]$. Conditionally on $\mathcal{G}_n$, $\{W^*_{n,i}\}$ are row-wise i.i.d. random vectors with $E[W^*_{n,1} \mid \mathcal{G}_n] = 0$ and $\mathrm{Var}(W^*_{n,1} \mid \mathcal{G}_n) = \delta_n(s_n)\Sigma^*_n$ a.s. Write

$\sqrt n(\bar Y^*_n - \mu^*_n) = \frac{1}{\sqrt{K_n}}\sum_{k=1}^{K_n}(\delta_n(s_n))^{-1/2}W^*_{n,k}.$

Then, letting $\lambda_1(A)$ denote the minimal eigenvalue of a square matrix $A$, by Corollary C.1,

$d_K\big(T^*_{1,n}, S^*_{1,n} \mid \mathcal{G}_n\big) \le C_v(\lambda_1(\Sigma)/2)^{-3/8}\Big(\frac{E[\|W^*_{n,1}\|^3 \mid \mathcal{G}_n]}{\sqrt n\,\delta_n(s_n)^{3/2}}\Big)^{1/4}$ a.s. on $\{\lambda_1(\Sigma^*_n) \ge \lambda_1(\Sigma)/2\}$,

where $S^*_{1,n} = \|Q_n\|$ and $Q_n$ is conditionally normal given $\mathcal{G}_n$ with zero mean and variance $\Sigma^*_n$.

Claim A.2. $E[\|W^*_{n,1}\|^3 \mid \mathcal{C}]/(\sqrt n\,\delta_n(s_n)^{3/2}) \to 0$ a.s.

Proof. It suffices to show that $E[|c^\top W^*_{n,1}|^3 \mid \mathcal{C}]/(\sqrt n\,\delta_n(s_n)^{3/2}) \to 0$ a.s. for any $c \in \mathbb{R}^v$ such that $\|c\| = 1$. By the $c_r$-inequality,

$E[|c^\top W^*_{n,1}|^3 \mid \mathcal{G}_n] \le 8E[|c^\top\tilde Z^*_{n,1}|^3 \mid \mathcal{G}_n] = \frac{8}{n}\sum_{i \in N_n}|c^\top\tilde Z_{n,i}|^3$ a.s.

Let $\tilde y_{n,i} := c^\top\tilde Y_{n,i}$. Then

$E[|c^\top\tilde Z_{n,i}|^4 \mid \mathcal{C}] \le \sum_{j_1,j_2,j_3,j_4 \in B_{n,i}}\big|\mathrm{Cov}(\tilde y_{n,j_1}\tilde y_{n,j_2}, \tilde y_{n,j_3}\tilde y_{n,j_4} \mid \mathcal{C})\big| + 3\Big(\sum_{j_1,j_2 \in B_{n,i}}\big|\mathrm{Cov}(\tilde y_{n,j_1}, \tilde y_{n,j_2} \mid \mathcal{C})\big|\Big)^2 \equiv A_{n,i} + C_{n,i}$ a.s.

Similarly to the proof of Proposition 4.1, we find that w.p.1,

$A_{n,i} \le C_1(\tilde\mu_p \vee 1)\,|B_{n,i}|\sum_{s \ge 0}h_{loc,n}(s, s_n)\gamma_{n,s}^{1-4/p} \quad\text{and}\quad C_{n,i} \le C_2(\tilde\mu_p \vee 1)\,|B_{n,i}|\sum_{s \ge 0}\delta^\partial_{loc,n}(s, s_n)\gamma_{n,s}^{1-2/p},$

where $C_1$ and $C_2$ are some positive constants. The result then follows by noticing that $E[|c^\top\tilde Z_{n,i}|^3 \mid \mathcal{C}] \le (E[|c^\top\tilde Z_{n,i}|^4 \mid \mathcal{C}])^{3/4}$ a.s. and the fact that $(A_{n,i} + C_{n,i})^{3/4} \le A_{n,i}^{3/4} + C_{n,i}^{3/4}$. □

Using Jensen's inequality, we find that for any $\epsilon > 0$,

$P\big(d_K\big(T^*_{1,n}, S^*_{1,n} \mid \mathcal{G}_n\big) > \epsilon \mid \mathcal{C}\big) \le \frac{C_v(\lambda_1(\Sigma)/2)^{-3/8}}{\epsilon}\Big(\frac{E[\|W^*_{n,1}\|^3 \mid \mathcal{C}]}{\sqrt n\,\delta_n(s_n)^{3/2}}\Big)^{1/4} + P(\lambda_1(\Sigma^*_n) < \lambda_1(\Sigma)/2 \mid \mathcal{C}) \to 0$ a.s.,

where the convergence of $P(\lambda_1(\Sigma^*_n) < \lambda_1(\Sigma)/2 \mid \mathcal{C})$ follows from the fact that the eigenvalues of a matrix depend continuously on its entries (see, e.g., Zhang, 2011, Theorem 2.11), so that $\lambda_j(\Sigma^*_n) \xrightarrow{\mathcal{C}\text{-}p} \lambda_j(\Sigma)$ a.s. for all $1 \le j \le v$.

Since the eigenvalues of $\Sigma^*_n$ converge to the eigenvalues of $\Sigma$ and the latter are a.s. positive, it follows from Lemma C.9 that $d_K\big(S^*_{1,n}, \|\Sigma^{1/2}\eta\| \mid \mathcal{G}_n, \mathcal{C}\big) \xrightarrow{\mathcal{C}\text{-}p} 0$ a.s. Moreover, $T_{1,n} \to \|\Sigma^{1/2}\eta\|$ $\mathcal{C}$-weakly and, hence, the result follows from Corollary 3.1. (Throughout, we use the conditional versions of all the inequalities involved.)

Consider the second assertion. First, for any $c \in \mathbb{R}^v$ such that $\|c\| = 1$ and $\epsilon > 0$, we get

$P\big(|c^\top(\bar Y^*_n - \mu^*_n)| > \epsilon \mid \mathcal{C}\big) \le \frac{1}{(K_n\epsilon)^2}E\Big[\Big(\sum_{k=1}^{K_n}c^\top\tilde Z^*_{n,k}\Big)^2 \,\Big|\, \mathcal{C}\Big] = \frac{1}{K_n\epsilon^2}E[c^\top\Sigma^*_nc \mid \mathcal{C}] \to 0,$

where the convergence follows from the consistency of $\Sigma^*_n$. Therefore, $\bar Y^*_n - \mu^*_n \xrightarrow{\mathcal{C}\text{-}p} 0$ a.s. Since the $\mathcal{C}$-asymptotic tightness of a vector follows from that of its elements, $\sqrt n(\bar Y^*_n - \mu^*_n)$ is $\mathcal{C}$-asymptotically tight for the same reason (i.e., the convergence of $\Sigma^*_n$). Write

$\nabla\phi(\mu^*_n)^\top\sqrt n(\bar Y^*_n - \mu^*_n) = \frac{1}{\sqrt{K_n}}\sum_{k=1}^{K_n}(\delta_n(s_n))^{-1/2}\nabla\phi(\mu^*_n)^\top W^*_{n,k}.$

Then by Lemma C.8, letting $T'^*_{2,n} = \nabla\phi(\mu^*_n)^\top\sqrt n(\bar Y^*_n - \mu^*_n)$ and $S'^*_{2,n} = \nabla\phi(\mu^*_n)^\top Q_n$, we have

$d_K\big(T'^*_{2,n}, S'^*_{2,n} \mid \mathcal{G}_n\big) \le C(\sigma^2/2)^{-3/2} \times \frac{\|\nabla\phi(\mu^*_n)\|^3 E[\|W^*_{n,1}\|^3 \mid \mathcal{G}_n]}{\sqrt n\,\delta_n(s_n)^{3/2}}$ a.s. on $\{\sigma^{*2}_n \ge \sigma^2/2\}$,

where $\sigma^{*2}_n = \nabla\phi(\mu^*_n)^\top\Sigma^*_n\nabla\phi(\mu^*_n)$ and $\sigma^2 = \nabla\phi(\mu)^\top\Sigma\nabla\phi(\mu)$. Since $x \mapsto \nabla\phi(x)$ is continuous and $\mu^*_n$ is a consistent estimator of $\mu$, $\nabla\phi(\mu^*_n) \xrightarrow{\mathcal{C}\text{-}p} \nabla\phi(\mu)$ and $\sigma^{*2}_n \xrightarrow{\mathcal{C}\text{-}p} \sigma^2$ a.s. Consequently, as in the previous case, $d_K\big(T'^*_{2,n}, S'^*_{2,n} \mid \mathcal{G}_n\big) \xrightarrow{\mathcal{C}\text{-}p} 0$ and $d_K\big(S'^*_{2,n}, \nabla\phi(\mu)^\top\Sigma^{1/2}\eta \mid \mathcal{G}_n, \mathcal{C}\big) \xrightarrow{\mathcal{C}\text{-}p} 0$ a.s. □

Proof of Proposition 4.4. The proof is similar to the one for Proposition 4.2, and so is omitted. □
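The quantities driving the preceding proofs are straightforward to compute. The sketch below forms the neighborhood block sums $\zeta_{n,i}$ and the middle expression of (A.6) for $\Sigma^*_n$; the normalization $\delta_n(s_n)$, taken here as the average block size, is a best-effort reading of the notation rather than code supplied with the paper.

```python
import numpy as np

def block_bootstrap_variance(Y, D, s_n):
    """Sketch of the block-sum variance Sigma*_n from Proposition 4.1.

    Y   : (n, v) array of demeaned observations Y'_{n,i};
    D   : (n, n) array of network distances d_n(i, j);
    s_n : block radius, so B_{n,i} = {j : d_n(i, j) <= s_n}.
    The delta_n(s_n) normalization is an assumption of this sketch.
    """
    n, v = Y.shape
    blocks = (D <= s_n)                     # boolean membership of B_{n,i}
    zeta = blocks.astype(float) @ Y         # zeta_{n,i} = sum of Y' over B_{n,i}
    delta = blocks.sum() / n                # average neighborhood size delta_n(s_n)
    zeta_bar = zeta.mean(axis=0)
    # variance of a uniformly resampled block sum, scaled as in (A.6)
    return (zeta.T @ zeta / n - np.outer(zeta_bar, zeta_bar)) / delta
```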
Proof of Proposition 4.5. By Lemma C.11, $d_K\big(\|T^*_{1,n}\|, S^*_{1,n} \mid \mathcal{G}_n\big) \xrightarrow{\mathcal{C}\text{-}p} 0$ and $d_K\big(T'^*_{2,n}, S'^*_{2,n} \mid \mathcal{G}_n\big) \xrightarrow{\mathcal{C}\text{-}p} 0$ a.s., where $S^*_{1,n} = \|Q_n\|$, $S'^*_{2,n} = \nabla\phi(\bar Y_n)^\top Q_n$, and $Q_n$ is conditionally normal given $\mathcal{G}_n$ with zero mean and variance $\Sigma^*_n$. The rest is similar to the proof of Proposition 4.2. □
Appendix B. Network HAC Estimator

Although the HAC estimator (2.5) is consistent in the sense that $\hat\Sigma_n - \Sigma_n \xrightarrow{\mathcal{C}\text{-}p} 0$ a.s., it is not guaranteed to be positive semi-definite in finite samples. Let $Q_n\Lambda_nQ_n^\top$ be the eigendecomposition of $\hat\Sigma_n$ (since $\hat\Sigma_n$ is symmetric, all its eigenvalues are real). Also let $\lambda_1(A)$ denote the smallest eigenvalue of $A$, e.g., $\lambda_1(\hat\Sigma_n) = \min_{1 \le k \le v}[\Lambda_n]_{kk}$. Consider a sequence of small positive real numbers $c_n \searrow 0$. We approximate $\hat\Sigma_n$ by

$\hat\Sigma^+_n := Q_n(\Lambda_n \vee c_nI_v)Q_n^\top,$

where the maximum is taken element-wise. By construction, the matrix $\hat\Sigma^+_n$ is positive definite. Moreover, when the smallest eigenvalue of $\Sigma_n$ is bounded from below by a positive constant, $\hat\Sigma^+_n$ is also a consistent estimator of the true variance, as follows from the next result.
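In computational terms the adjustment is a single eigenvalue clip. A minimal numpy sketch of the construction above (an illustration, not code supplied with the paper):

```python
import numpy as np

def psd_adjust(Sigma_hat, c_n):
    """Eigenvalue-clipped estimator Sigma_hat_plus = Q (Lambda v c_n I) Q'.

    Floors every eigenvalue of the symmetric matrix Sigma_hat at c_n > 0,
    which makes the result positive definite while leaving it
    asymptotically equivalent to Sigma_hat as c_n -> 0.
    """
    lam, Q = np.linalg.eigh(Sigma_hat)      # real spectrum: Sigma_hat is symmetric
    return (Q * np.maximum(lam, c_n)) @ Q.T
```

Propositions B.1 and B.2 below show that letting the clipping floor $c_n$ shrink to zero preserves consistency.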
Proposition B.1. Suppose that $\hat\Sigma_n - \Sigma_n \xrightarrow{\mathcal{C}\text{-}p} 0$ a.s. and there exists a constant $c > 0$ such that $P(\lambda_1(\Sigma_n) \ge c \text{ ev.}) = 1$. Then $\hat\Sigma^+_n - \Sigma_n \xrightarrow{\mathcal{C}\text{-}p} 0$ a.s.

Proof. Fix $\epsilon > 0$. Then

$P\big(\|\hat\Sigma^+_n - \Sigma_n\| > \epsilon \mid \mathcal{C}\big) \le P\big(\|\hat\Sigma_n - \Sigma_n\| > \epsilon \mid \mathcal{C}\big) + P\big(\lambda_1(\hat\Sigma_n) < c_n \mid \mathcal{C}\big)$ a.s. (B.1)

The first term on the RHS of (B.1) trivially converges to 0 a.s. As for the second term, using the properties of the Rayleigh quotient,

$\lambda_1(\hat\Sigma_n) = \min_{x:\|x\|=1}x^\top\hat\Sigma_nx \ge \lambda_1(\Sigma_n) + \lambda_1(\hat\Sigma_n - \Sigma_n).$

Therefore, noticing that $|\lambda_1(A)| \le \|A\|$,

$P\big(\lambda_1(\hat\Sigma_n) < c_n \mid \mathcal{C}\big) \le P\big(\lambda_1(\hat\Sigma_n - \Sigma_n) < c_n - c \mid \mathcal{C}\big) + \mathbf{1}\{\lambda_1(\Sigma_n) < c\} \to 0$ a.s. □

If $\Sigma_n$ converges a.s. to a positive definite matrix $\Sigma$, then we may relax the assumptions of the preceding result.

Proposition B.2. Suppose that $\hat\Sigma_n - \Sigma \xrightarrow{\mathcal{C}\text{-}p} 0$ a.s., where $\Sigma$ is positive definite. Then $\hat\Sigma^+_n - \Sigma \xrightarrow{\mathcal{C}\text{-}p} 0$ a.s.

Proof. As in the proof of Proposition B.1, for any $\epsilon > 0$,

$\limsup_{n \to \infty}P\big(\|\hat\Sigma^+_n - \Sigma\| > \epsilon \mid \mathcal{C}\big) \le \operatorname{ess\,inf}_{c > 0}\mathbf{1}\{\lambda_1(\Sigma) < c\} = 0$ a.s. □

Appendix C. Auxiliary Results
In the following we assume that all random elements are defined on a common probability space $(\Omega, \mathcal{H}, P)$. Also, for a vector $x \in \mathbb{R}^v$, let $\|x\|$ denote the Euclidean norm of $x$, and let $\|\cdot\|_{e,p}$ be the element-wise $p$-norm on $\mathbb{R}^{a \times b}$, i.e., $\|A\|_{e,p} := \|\mathrm{vec}(A)\|_p$.

Lemma C.1. Let $\{A_n\}$ be a sequence of random symmetric matrices in $\mathbb{R}^{v \times v}$ and $\mathcal{F} \subset \mathcal{H}$. Then the following are equivalent: (a) $E[\|A_n\|_{e,1} \mid \mathcal{F}] \to 0$ a.s.; (b) $E[\|A_n\|_F \mid \mathcal{F}] \to 0$ a.s.; (c) $E[|c^\top A_nc| \mid \mathcal{F}] \to 0$ a.s. for any $c \in \mathbb{R}^v$ such that $\|c\| = 1$.

Proof. (a) is equivalent to (b) because $\|A_n\|_F \le \|A_n\|_{e,1} \le v\|A_n\|_F$. The equivalence of (a) and (c) follows from the next inequalities:

$|c^\top A_nc| \le \|c\|_\infty^2\|A_n\|_{e,1}$

and, letting $z^+_{ij} = (e_i + e_j)/\sqrt 2$ and $z^-_{ij} = (e_i - e_j)/\sqrt 2$, where $\{e_1, \dots, e_v\}$ is the standard basis for $\mathbb{R}^v$,

$\|A_n\|_{e,1} \le \frac{1}{2}\sum_{i,j=1}^v\big(|z^{+\top}_{ij}A_nz^+_{ij}| + |z^{-\top}_{ij}A_nz^-_{ij}|\big).$ □

The following is a simple extension of Lemma A.3 in Crimaldi (2009) to the multidimensional case. For a random vector $X \in \mathbb{R}^v$ and $\mathcal{F} \subset \mathcal{H}$, let $Q^{\mathcal{F}}_X$ denote the regular conditional distribution of $X$ given $\mathcal{F}$ and let $\hat\varphi_X$ be the corresponding characteristic function, i.e., for $t \in \mathbb{R}^v$,

$\hat\varphi_X(\omega, t) = \int\exp(it^\top x)\,Q^{\mathcal{F}}_X(\omega, dx).$

Also, the conditional characteristic function of $X$ given $\mathcal{F}$ is given by $\varphi_X(t \mid \mathcal{F}) := E[\exp(it^\top X) \mid \mathcal{F}]$, and for a fixed $t \in \mathbb{R}^v$ and almost all $\omega \in \Omega$, $\hat\varphi_X(\omega, t) = \varphi_X(t \mid \mathcal{F})(\omega)$.

Lemma C.2. Let $\{X_n\}$ be a sequence of random vectors in $\mathbb{R}^v$ and $\mathcal{F} \subset \mathcal{H}$. Then $X_n \to X$ $\mathcal{F}$-weakly, i.e., for almost all $\omega \in \Omega$, $Q^{\mathcal{F}}_{X_n}(\omega, \cdot) \to Q^{\mathcal{F}}_X(\omega, \cdot)$ weakly, iff for every $t \in \mathbb{R}^v$, $\hat\varphi_{X_n}(\cdot, t) \to \hat\varphi_X(\cdot, t)$ a.s.

The next lemma provides a number of useful properties of the almost sure conditional convergence which are typical of the usual weak convergence.
Lemma C.3. Let $\{X_n\}$ and $\{Y_n\}$ be sequences of random vectors in $\mathbb{R}^v$ and $\mathbb{R}^w$, respectively, and $\mathcal{F} \subset \mathcal{H}$. Then: (a) If $X_n \to X$ $\mathcal{F}$-weakly and $g: \mathbb{R}^v \to \mathbb{R}^d$ is continuous, then $g(X_n) \to g(X)$ $\mathcal{F}$-weakly. (b) $X_n \to X$ $\mathcal{F}$-weakly iff $s^\top X_n \to s^\top X$ $\mathcal{F}$-weakly for all $s \in \mathbb{R}^v$. (c) If $Y_n \xrightarrow{\mathcal{F}\text{-}p} Y$ a.s., where $Y$ is $\mathcal{F}$-measurable, then $Y_n \to Y$ $\mathcal{F}$-weakly. (d) Let $v = w$. If $X_n \to X$ $\mathcal{F}$-weakly and $X_n - Y_n \xrightarrow{\mathcal{F}\text{-}p} 0$ a.s., then $Y_n \to X$ $\mathcal{F}$-weakly. (e) If $X_n \to X$ $\mathcal{F}$-weakly and $Y_n \xrightarrow{\mathcal{F}\text{-}p} Y$ a.s., where $Y$ is $\mathcal{F}$-measurable, then $(X_n^\top, Y_n^\top) \to (X^\top, Y^\top)$ $\mathcal{F}$-weakly.

Proof. (a) This follows from Lemma C.2 and the fact that $x \mapsto \exp(it^\top g(x))$ is a bounded, continuous function.

(b) The sufficiency follows from part (a) because $x \mapsto s^\top x$ is continuous. For the necessity, suppose that all linear combinations converge $\mathcal{F}$-weakly. Then

$\varphi_{X_n}(t \mid \mathcal{F}) = \varphi_{t^\top X_n}(1 \mid \mathcal{F}) \to \varphi_{t^\top X}(1 \mid \mathcal{F}) = \varphi_X(t \mid \mathcal{F})$ a.s.,

and the result follows from Lemma C.2.

(c) Since for any $t \in \mathbb{R}^w$ and $\epsilon > 0$, $|e^{it^\top(Y_n - Y)} - 1| \le \epsilon$ on $\{|t^\top(Y_n - Y)| \le \epsilon\}$, we have

$|\varphi_{Y_n}(t \mid \mathcal{F}) - \varphi_Y(t \mid \mathcal{F})| \le E\big[\big|e^{it^\top(Y_n - Y)} - 1\big| \mid \mathcal{F}\big] \le \epsilon + 2P\big(|t^\top(Y_n - Y)| > \epsilon \mid \mathcal{F}\big)$ a.s.

Therefore, $\limsup_{n \to \infty}|\varphi_{Y_n}(t \mid \mathcal{F}) - \varphi_Y(t \mid \mathcal{F})| \le \epsilon$ a.s. The result follows by considering a sequence $\epsilon_m \searrow 0$.

(d) For any $t \in \mathbb{R}^v$, $|\varphi_{Y_n}(t \mid \mathcal{F}) - \varphi_{X_n}(t \mid \mathcal{F})| \le E\big[\big|e^{it^\top(Y_n - X_n)} - 1\big| \mid \mathcal{F}\big] \to 0$ a.s. by the argument of part (c), and the result follows from Lemma C.2.

(e) Since $(X_n^\top, Y^\top) \to (X^\top, Y^\top)$ $\mathcal{F}$-weakly (by part (b)) and $(X_n^\top, Y_n^\top) - (X_n^\top, Y^\top) \xrightarrow{\mathcal{F}\text{-}p} 0$ a.s., the result follows from part (d). □

Lemma C.4. Let $\{X_n\}$ be a sequence of random variables, $\mathcal{F} \subset \mathcal{H}$, and let $X$ be a random variable with (a.s.) continuous conditional cdf given $\mathcal{F}$ (i.e., the map $t \mapsto F^{\mathcal{F}}_X(\omega, t)$ is continuous for (almost) all $\omega \in \Omega$). Then $X_n \to X$ $\mathcal{F}$-weakly iff $d_K(X_n, X \mid \mathcal{F}) \to 0$ a.s.

Proof. The necessity holds by Theorem 3.1.2 in Shiryaev (2016) because $d_K(X_n, X \mid \mathcal{F}) \to 0$ a.s. implies that for almost all $\omega \in \Omega$ the regular conditional cdfs converge; the sufficiency follows from the $\omega$-wise application of Pólya's theorem (e.g., Athreya and Lahiri, 2006, Theorem 9.1.4). □

Lemma C.5. Suppose that $f: \mathbb{R}^v \to \mathbb{R}^w$ is continuous and $\{X_n\}$ and $\{Y_n\}$ are sequences of random vectors in $\mathbb{R}^v$ such that $Y_n - X_n \xrightarrow{\mathcal{F}\text{-}p} 0$ a.s. for some $\mathcal{F} \subset \mathcal{H}$ and $\{X_n\}$ is $\mathcal{F}$-asymptotically tight. Then $f(Y_n) - f(X_n) \xrightarrow{\mathcal{F}\text{-}p} 0$ a.s.

Proof. For any $z > 0$, the restriction $f|_{B(0,z)}$ is uniformly continuous, i.e., for every $\epsilon > 0$ there exists $\delta_\epsilon > 0$ such that for all $x, y \in B(0, z)$, $\|f(x) - f(y)\| < \epsilon$ whenever $\|x - y\| < \delta_\epsilon$. Fix $\epsilon > 0$. Then

$P(\|f(Y_n) - f(X_n)\| > \epsilon \mid \mathcal{F}) \le P(\|Y_n - X_n\| > \delta_\epsilon \mid \mathcal{F}) + P(\|Y_n\| > z \mid \mathcal{F}) + P(\|X_n\| > z \mid \mathcal{F}) \le P(\|Y_n - X_n\| > \delta_\epsilon \wedge z/2 \mid \mathcal{F}) + 2P(\|X_n\| > z/2 \mid \mathcal{F})$ a.s.

Therefore,

$\limsup_{n \to \infty}P(\|f(Y_n) - f(X_n)\| > \epsilon \mid \mathcal{F}) \le 2\operatorname{ess\,inf}_{z > 0}\limsup_{n \to \infty}P(\|X_n\| > z \mid \mathcal{F}) = 0$ a.s. □

Lemma C.6. Suppose that $\{X_n\}$ and $\{Y_n\}$ are sequences of random vectors in $\mathbb{R}^v$ such that $X_n$ is $\mathcal{F}$-measurable for all $n \ge 1$ and some $\mathcal{F} \subset \mathcal{H}$, $\sup_n\|X_n\| < \infty$ a.s., and $Y_n - X_n \xrightarrow{\mathcal{F}\text{-}p} 0$ a.s. Then $\{Y_n\}$ is $\mathcal{F}$-asymptotically tight.

Proof. For any $y > 0$,

$P(\|Y_n\| > y \mid \mathcal{F}) \le P(\|Y_n - X_n\| > y/2 \mid \mathcal{F}) + \mathbf{1}\big\{\sup_n\|X_n\| > y/2\big\}$ a.s.

Therefore,

$\operatorname{ess\,inf}_{y > 0}\limsup_{n \to \infty}P(\|Y_n\| > y \mid \mathcal{F}) \le \operatorname{ess\,inf}_{y > 0}\mathbf{1}\big\{\sup_n\|X_n\| > y\big\} = 0$ a.s. □

In the following, for $r, \epsilon \ge 0$, let $S_{r,\epsilon} := \{x \in \mathbb{R}^v : r \le \|x\| \le r + \epsilon\}$.

Lemma C.7. Suppose that $Z$ is a standard normal random vector in $\mathbb{R}^v$ with $v \ge 2$ and $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_v > 0$ are constants. Let $\Lambda := \mathrm{diag}(\lambda_1, \dots, \lambda_v)$ and $N = \Lambda^{1/2}Z$. Then for all $\epsilon \ge 0$,

$Q(\epsilon, \|N\|) = \sup_{r \ge 0}P\big(N \in S_{r,\epsilon}\big) \le \frac{C_v\,\epsilon}{\sqrt{\lambda_1}},$

where $C_v \equiv \sqrt{v - 1}$.

Proof. Let $X := N_1^2 + N_2^2$, $Y := \sum_{i=3}^v N_i^2$, and note that $\|N\| = \sqrt{X + Y}$. Then, letting $f_X$ denote the density of $X$, we have

$f_X(x) = \frac{1}{2\pi\sqrt{\lambda_1\lambda_2}}\int_0^x e^{-\frac{z}{2\lambda_1} - \frac{x - z}{2\lambda_2}}(z(x - z))^{-1/2}\,dz \le \frac{1}{2\pi\sqrt{\lambda_1\lambda_2}}B(1/2, 1/2)\,e^{-\frac{x}{2\lambda_1}}.$ (C.1)

For $y \ge 0$, the density of $\sqrt{X + y}$ is zero on $(-\infty, \sqrt y)$ and, using (C.1), it can be bounded on $[\sqrt y, \infty)$ by

$f_{\sqrt{X + y}}(x) = 2xf_X(x^2 - y) \le \frac{\sqrt{y + \lambda_2}}{\sqrt{\lambda_1\lambda_2}},$

so that for all $r \ge 0$,

$g(y) := P\big(r \le \sqrt{X + y} \le r + \epsilon\big) \le \frac{\sqrt{y + \lambda_2}}{\sqrt{\lambda_1\lambda_2}}\,\epsilon.$

Hence, for $v \ge 3$, noticing that $X$ is independent of $Y$, we find that

$P\big(r \le \sqrt{X + Y} \le r + \epsilon\big) = E[g(Y)] \le \frac{\epsilon}{\sqrt{\lambda_1\lambda_2}}E[Y + \lambda_2]^{1/2} \le \frac{\epsilon}{\sqrt{\lambda_1}}\Big(\sum_{i=3}^v\frac{\lambda_i}{\lambda_2} + 1\Big)^{1/2},$

which proves the result. □

Let $\phi(w) := \|w\|$. This function is thrice continuously differentiable on $\mathbb{R}^v\setminus\{0\}$ and the following bounds on the derivatives of $\phi$ hold:

$|\phi'(w)(x)| \le \|x\|, \quad |\phi''(w)(x, y)| \le 2\|w\|^{-1}\|x\|\|y\|, \quad |\phi'''(w)(x, y, z)| \le 6\|w\|^{-2}\|x\|\|y\|\|z\|.$ (C.2)

For a real symmetric matrix $B$ we denote the $j$-th order statistic of its eigenvalues by $\lambda_{(j)}(B)$. Finally, we say that a random vector $X$ is conditionally normal given $\mathcal{F} \subset \mathcal{H}$ with zero mean and conditional covariance matrix $V$, denoted by $X \mid \mathcal{F} \sim N(0, V)$, if $V$ is $\mathcal{F}$-measurable, a.s. finite and positive semi-definite, and the conditional characteristic function of $X$ is given by

$E[e^{it^\top X} \mid \mathcal{F}] = \exp\big(-\tfrac{1}{2}t^\top Vt\big)$ a.s.
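Lemma C.7 can be sanity-checked by simulation. In the sketch below, the constant $\sqrt{v-1}$ follows the reconstruction above and should be read as indicative only; the grid and simulation sizes are arbitrary choices.

```python
import numpy as np

def shell_prob_check(lams, eps, n_sim=200_000, seed=0):
    """Monte Carlo check of Lemma C.7: sup_r P(r <= ||N|| <= r + eps),
    with N = Lambda^{1/2} Z, against the C_v * eps / sqrt(lambda_1) envelope."""
    rng = np.random.default_rng(seed)
    lams = np.sort(np.asarray(lams, dtype=float))[::-1]   # lambda_1 >= ... >= lambda_v
    norms = np.sqrt((lams * rng.standard_normal((n_sim, lams.size)) ** 2).sum(axis=1))
    r_grid = np.linspace(0.0, norms.max(), 200)
    sup_p = max(((r <= norms) & (norms <= r + eps)).mean() for r in r_grid)
    bound = np.sqrt(lams.size - 1) * eps / np.sqrt(lams[0])  # C_v = sqrt(v-1), as reconstructed
    return sup_p, bound
```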
Theorem C.1. Let $X_1, \dots, X_n$ be random vectors in $\mathbb{R}^v$ that are conditionally independent given $\mathcal{F} \subset \mathcal{H}$ with $E[X_i \mid \mathcal{F}] = 0$ and $E[\|X_i\|^3 \mid \mathcal{F}] < \infty$ a.s. Let $T := \sum_{i=1}^nX_i$ and let $N$ be a random vector in $\mathbb{R}^v$ such that $N \mid \mathcal{F} \sim N(0, V)$, where $V = E[TT^\top \mid \mathcal{F}]$ a.s. Then, assuming that $\upsilon \equiv \lambda_{((v\vee 2) - 1)}(V) > 0$ a.s.,

$d_K(\|T\|, \|N\| \mid \mathcal{F}) \le C_v\Big(\upsilon^{-3/2}\sum_{i=1}^nE[\|X_i\|^3 \mid \mathcal{F}]\Big)^{1/4}$ a.s.,

where $C_v > 0$ is a constant depending only on $v$.

Proof. Let $f$ be a thrice continuously differentiable function such that $f(x) = 1$ if $x \le 0$, $f(x) = 0$ if $x \ge \epsilon$ for some $\epsilon > 0$, and $|f^{(j)}(x)| \le D\epsilon^{-j}\mathbf{1}_{(0,\epsilon)}(x)$ for some constant $D > 0$ and $1 \le j \le 3$. For $r \ge 0$ define $g_r(s) := f(\|s\| - r)$. First,

$P(\|T\| \le r \mid \mathcal{F}) \le E[g_r(T) \mid \mathcal{F}] \le P(\|N\| \le r + \epsilon \mid \mathcal{F}) + E[g_r(T) - g_r(N) \mid \mathcal{F}]$

and

$P(\|T\| > r \mid \mathcal{F}) \le 1 - E[g_{r-\epsilon}(T) \mid \mathcal{F}] \le P(\|N\| > r - \epsilon \mid \mathcal{F}) + E[g_{r-\epsilon}(N) - g_{r-\epsilon}(T) \mid \mathcal{F}]$ a.s.

for all $r \ge \epsilon > 0$. Therefore, w.p.1,

$d_K(\|T\|, \|N\| \mid \mathcal{F}) = \sup_{q \in \mathbb{Q}_{\ge 0}}|P(\|T\| \le q \mid \mathcal{F}) - P(\|N\| \le q \mid \mathcal{F})| \le \sup_{q \in \mathbb{Q}_{> 0}}|E[g_q(T) - g_q(N) \mid \mathcal{F}]| + \sup_{q \in \mathbb{Q}_{\ge 0}}P(N \in S_{q,\epsilon} \mid \mathcal{F}).$ (C.3)

Consider the first term on the last line of (C.3).

Claim C.1. There exists a constant $B > 0$ such that for any $q > 0$,

$|E[g_q(T) - g_q(N) \mid \mathcal{F}]| \le \frac{B}{\epsilon^3}\sum_{i=1}^nE[\|X_i\|^3 \mid \mathcal{F}]$ a.s.

Proof. Let $Z_1, \dots, Z_n$ be i.i.d. standard normal random vectors in $\mathbb{R}^v$ independent of $X_1, \dots, X_n$ and $\mathcal{F}$, and let $Y_i := V_i^{1/2}Z_i$, where $V_i$ is a version of $E[X_iX_i^\top \mid \mathcal{F}]$. Define

$U_i := \sum_{k=1}^{i-1}X_k + \sum_{k=i+1}^nY_k \quad\text{and}\quad W_i := g_q(U_i + X_i) - g_q(U_i + Y_i).$

Then $g_q(T) - g_q(N) = \sum_{i=1}^nW_i$ and

$|E[g_q(T) - g_q(N) \mid \mathcal{F}]| \le \sum_{i=1}^n|E[W_i \mid \mathcal{F}]|$ a.s.

Let $\mathcal{G}_i := \mathcal{F}\vee\sigma(X_1, \dots, X_{i-1}, Z_{i+1}, \dots, Z_n)$ and let $h_{1i}(\lambda) := g_q(U_i + \lambda X_i)$ and $h_{2i}(\lambda) := g_q(U_i + \lambda Y_i)$. Using Taylor expansion up to the third order, we find that

$W_i = \sum_{j=0}^2\frac{1}{j!}\big(h^{(j)}_{1i}(0) - h^{(j)}_{2i}(0)\big) + \frac{1}{3!}\big(h^{(3)}_{1i}(\lambda_1) - h^{(3)}_{2i}(\lambda_2)\big),$

where $|\lambda_1|, |\lambda_2| < 1$. The tower property of conditional expectations and the fact that $X_i$ and $Y_i$ are conditionally independent of $\mathcal{G}_i$ given $\mathcal{F}$ imply that

$E[h^{(j)}_{1i}(0) - h^{(j)}_{2i}(0) \mid \mathcal{F}] = 0$ a.s.

for $j = 1, 2$. Finally, using the bounds in (C.2) and noticing that $|f^{(j)}(x - q)| \le D\epsilon^{-3}x^{3-j}\mathbf{1}_{(q,q+\epsilon)}(x)$ for $1 \le j \le 3$, we get

$|E[h^{(3)}_{1i} - h^{(3)}_{2i} \mid \mathcal{F}]| \le \frac{B_0}{\epsilon^3}\big(E[\|X_i\|^3 \mid \mathcal{F}] + E[\|Y_i\|^3 \mid \mathcal{F}]\big)$ a.s.

for some constant $B_0 > 0$. The result then follows from Lemma 4 in Rhee and Talagrand (1986), i.e., there is a constant $M > 0$ such that $E[\|Y_i\|^3 \mid \mathcal{F}] \le ME[\|X_i\|^3 \mid \mathcal{F}]$ a.s. □

Using Lemma C.7 when $v \ge 2$, we find that

$d_K(\|T\|, \|N\| \mid \mathcal{F}) \le \frac{B}{\epsilon^3}\sum_{i=1}^nE[\|X_i\|^3 \mid \mathcal{F}] + \frac{C'_v\,\epsilon}{\sqrt\upsilon}$ a.s. (C.4)

For $v = 1$ we have $P(N \in S_{q,\epsilon} \mid \mathcal{F}) \le \epsilon/\sqrt{2\pi\upsilon}$ and the same bound holds. Finally, since (C.4) holds for any $\epsilon > 0$, it holds with $\epsilon$ replaced by an $\mathcal{F}$-measurable random variable a.s. on $\{\epsilon \in (0, \infty)\}$. The result then follows by taking $\epsilon = \big(\sqrt\upsilon\sum_{i=1}^nE[\|X_i\|^3 \mid \mathcal{F}]\big)^{1/4}$. □

Corollary C.1.
Let $X_1, \dots, X_n$ be conditionally i.i.d. given $\mathcal{F} \subset \mathcal{H}$ with $E[X_1 \mid \mathcal{F}] = 0$ and $E[\|X_1\|^3 \mid \mathcal{F}] < \infty$ a.s. Let $T := n^{-1/2}\sum_{i=1}^nX_i$ and let $N \mid \mathcal{F} \sim N(0, V)$, where $V = E[X_1X_1^\top \mid \mathcal{F}]$ a.s. Then, assuming that $\upsilon \equiv \lambda_{((v\vee 2) - 1)}(V) > 0$ a.s.,

$d_K(\|T\|, \|N\| \mid \mathcal{F}) \le C_v\Big(\frac{E[\|X_1\|^3 \mid \mathcal{F}]}{\upsilon^{3/2}\sqrt n}\Big)^{1/4}$ a.s.,

where $C_v > 0$ is a constant depending only on $v$.

Lemma C.8. Let $X_1, \dots, X_n$ be random variables that are conditionally i.i.d. given $\mathcal{F} \subset \mathcal{H}$ with $E[X_1 \mid \mathcal{F}] = 0$ and $E[|X_1|^3 \mid \mathcal{F}] < \infty$ a.s. Let $T := n^{-1/2}\sum_{i=1}^nX_i$ and $N \mid \mathcal{F} \sim N(0, \sigma^2)$, where $\sigma^2 = \mathrm{Var}(X_1 \mid \mathcal{F})$ a.s. Then, assuming that $\sigma^2 > 0$ a.s.,

$d_K(T, N \mid \mathcal{F}) \le \frac{CE[|X_1|^3 \mid \mathcal{F}]}{\sigma^3\sqrt n}$ a.s.,

where $C > 0$ is a constant.

Proof. The proof is similar to the proof of Theorem 11.4.1 in Athreya and Lahiri (2006) (for the unconditional case) and so is omitted. □
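The $1/\sqrt n$ rate in Lemma C.8 is easy to visualize numerically. A minimal sketch, using centered exponential summands as an arbitrary test case (any distribution with three conditional moments would do):

```python
import numpy as np
from math import erf

def kolmogorov_rate_demo(n, n_sim=20_000, seed=0):
    """Monte Carlo illustration of the 1/sqrt(n) decay in Lemma C.8."""
    rng = np.random.default_rng(seed)
    T = (rng.exponential(size=(n_sim, n)) - 1.0).sum(axis=1) / np.sqrt(n)
    T.sort()
    grid = np.linspace(-4.0, 4.0, 801)
    ecdf = np.searchsorted(T, grid, side="right") / T.size
    Phi = np.array([0.5 * (1.0 + erf(g / np.sqrt(2.0))) for g in grid])
    return np.abs(ecdf - Phi).max()     # approximate d_K(T, N)
```

Doubling $n$ should shrink the returned distance by roughly $\sqrt 2$, in line with the $\sigma^3\sqrt n$ denominator.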
Lemma C.9. Suppose that $\mathcal{G}$ and $\mathcal{F}$ are $\sigma$-fields such that $\mathcal{F} \subset \mathcal{G} \subset \mathcal{H}$, and $X$ and $Y$ are random vectors in $\mathbb{R}^v$ such that $X \mid \mathcal{G} \sim N(0, \Sigma_X)$ and $Y \mid \mathcal{F} \sim N(0, \Sigma_Y)$. Then, assuming that $\upsilon \equiv \lambda_{((v\vee 2) - 1)}(\Sigma_Y) > 0$ a.s.,

$d_K(\|X\|, \|Y\| \mid \mathcal{G}, \mathcal{F}) \le C_v\big(\upsilon^{-1}\|\Lambda_X - \Lambda_Y\|_{e,\infty}\big)^{1/3}$ a.s., (C.5)

where $C_v$ is a constant depending only on $v$ and $\Lambda_{(\cdot)}$ is the matrix of eigenvalues corresponding to $\Sigma_{(\cdot)}$.

Proof. Let $f$ be a twice continuously differentiable function such that $f(x) = 1$ if $x \le 0$ and $f(x) = 0$ if $x \ge \epsilon$ for some $\epsilon > 0$, with $|f^{(j)}(x)| \le D\epsilon^{-j}\mathbf{1}_{(0,\epsilon)}(x)$ for some constant $D > 0$ and $1 \le j \le 2$. For $r \ge 0$ define $g_r(s) := f(\|s\| - r)$. As in the proof of Theorem C.1, for any $\epsilon > 0$,

$d_K(\|X\|, \|Y\| \mid \mathcal{G}, \mathcal{F}) \le \sup_{q \in \mathbb{Q}_{> 0}}|E[g_q(X) \mid \mathcal{G}] - E[g_q(Y) \mid \mathcal{F}]| + \sup_{q \in \mathbb{Q}_{\ge 0}}P(Y \in S_{q,\epsilon} \mid \mathcal{F}).$

Let $Z_1$ and $Z_2$ be independent standard normal random vectors in $\mathbb{R}^v$ that are independent of $\mathcal{G}$ and $\mathcal{F}$, respectively. Then

$E[g_q(X) \mid \mathcal{G}] - E[g_q(Y) \mid \mathcal{F}] = E[g_q(\Lambda_X^{1/2}Z_1) \mid \mathcal{G}] - E[g_q(\Lambda_Y^{1/2}Z_2) \mid \mathcal{F}] = h_{q,1}(\Lambda_X) - h_{q,2}(\Lambda_Y)$ a.s.,

where $h_{q,1}(\lambda) := Eg_q(\lambda^{1/2}Z_1)$ and $h_{q,2}(\lambda) := Eg_q(\lambda^{1/2}Z_2)$ (see, e.g., Durrett, 2010, Lemma 6.2.1).

Claim C.2. There exists a constant $B_v$ depending only on $v$ such that for any $q > 0$,

$|h_{q,1}(\lambda_X) - h_{q,2}(\lambda_Y)| \le \frac{B_v}{\epsilon^2}\|\lambda_X - \lambda_Y\|_{e,\infty}.$

Proof. For $t \in [0, 1]$ let $Z(t) := \sqrt t\,\lambda_X^{1/2}Z_1 + \sqrt{1 - t}\,\lambda_Y^{1/2}Z_2$ and $\varphi(t) := Eg_q(Z(t))$. Then

$h_{q,1}(\lambda_X) - h_{q,2}(\lambda_Y) = \varphi(1) - \varphi(0) = \int_0^1\varphi'(t)\,dt.$

Using the integration by parts formula (see Equation A.17 in Talagrand, 2011, Section A.6), for $t \in (0, 1)$,

$\varphi'(t) = \frac{1}{2}E\Big[\Big(\frac{\lambda_X^{1/2}Z_1}{\sqrt t} - \frac{\lambda_Y^{1/2}Z_2}{\sqrt{1 - t}}\Big)^\top\nabla g_q(Z(t))\Big] = \frac{1}{2}E\big[\mathbf{i}^\top\big((\lambda_X - \lambda_Y)\circ\nabla^2g_q(Z(t))\big)\mathbf{i}\big],$

where $\mathbf{i}$ is the vector of ones. Therefore,

$\Big|\int_0^1\varphi'(t)\,dt\Big| \le \|\lambda_X - \lambda_Y\|_{e,\infty}\int_0^1E\big|\mathbf{i}^\top\nabla^2g_q(Z(t))\mathbf{i}\big|\,dt.$

Since $|f^{(j)}(x - q)| \le D\epsilon^{-2}x^{2-j}\mathbf{1}_{(q,q+\epsilon)}(x)$ for $1 \le j \le 2$, the $(k, l)$-th element of the Hessian of $g_q$ is bounded by $D'\epsilon^{-2}$ for some constant $D' > 0$. Therefore,

$|h_{q,1}(\lambda_X) - h_{q,2}(\lambda_Y)| \le \frac{D'v^2}{\epsilon^2}\|\lambda_X - \lambda_Y\|_{e,\infty}.$ □

Using Lemma C.7 when $v \ge 2$, we find that

$d_K(\|X\|, \|Y\| \mid \mathcal{G}, \mathcal{F}) \le \frac{B_v}{\epsilon^2}\|\Lambda_X - \Lambda_Y\|_{e,\infty} + \frac{C'_v\,\epsilon}{\sqrt\upsilon}$ a.s. (C.6)

For $v = 1$, $P(Y \in S_{q,\epsilon} \mid \mathcal{F}) \le \epsilon/\sqrt{2\pi\upsilon}$ a.s., so that the same bound holds. Finally, since (C.6) holds for any $\epsilon > 0$, it holds with $\epsilon$ replaced by an $\mathcal{F}$-measurable random variable a.s. on $\{\epsilon \in (0, \infty)\}$. Consequently, the result follows by taking $\epsilon = (\sqrt\upsilon\|\Lambda_X - \Lambda_Y\|_{e,\infty})^{1/3}$ and noticing that (C.5) holds trivially on $\{\|\Lambda_X - \Lambda_Y\|_{e,\infty} = 0\}$. □

Lemma C.10.
Suppose that $\mathcal{G}$ and $\mathcal{F}$ are $\sigma$-fields such that $\mathcal{F} \subset \mathcal{G} \subset \mathcal{H}$, and let $X \mid \mathcal{G} \sim N(0, \sigma^2_X)$ and $Y \mid \mathcal{F} \sim N(0, \sigma^2_Y)$. Then, assuming that $\sigma^2_Y > 0$ a.s.,

$d_K(X, Y \mid \mathcal{G}, \mathcal{F}) \le C\big|\sigma^2_X/\sigma^2_Y - 1\big|^{1/3}$ a.s.,

where $C > 0$ is a constant.

Proof. The proof is similar to the one for Lemma C.9, and so is omitted. □

Lemma C.11. Let $(G, (Y, X))$ be a network dependent process in $\mathbb{R}\times\mathbb{R}^v$ and let $\mathcal{F}$ be a sub-$\sigma$-field of $\mathcal{H}$ such that: (a) $Y$ is conditionally independent of $X$ given $\mathcal{F}$; (b) $Y_i$ and $Y_j$ are conditionally independent given $\mathcal{F}$ if $j \notin B_i := N(i; s)$ for some $s > 0$; (c) $D(G)$ is $\mathcal{F}$-measurable. Let $\mathcal{G} := \sigma(\mathcal{F}\cup\sigma(X))$, $T := \sum_{i \in N}Y_iX_i$, and $Z \mid \mathcal{G} \sim N(0, V)$, where $V = E[TT^\top \mid \mathcal{G}]$ a.s. Then, assuming that $\upsilon \equiv \lambda_{((v\vee 2) - 1)}(V) > 0$ a.s.,

$d_K(\|T\|, \|Z\| \mid \mathcal{G}) \le C_v\big(\upsilon^{-3/2}\beta\big)^{1/4}$ a.s.,

where $C_v > 0$ is a constant depending only on $v$ and

$\beta := \sum_{i \in N}\sum_{j \in B_i}\sum_{k \in B_i\cup B_j}\prod_{l \in \{i,j,k\}}\|Y_l\|_{\mathcal{F},3}\,\|X_l\|_\infty.$

In addition, when $v = 1$,

$d_K(T, Z \mid \mathcal{G}) \le C\big(\upsilon^{-3/2}\beta\big)^{1/4}$ a.s.

Proof. We use the notation from the proof of Theorem C.1. First, for any $\epsilon > 0$,

$d_K(\|T\|, \|Z\| \mid \mathcal{G}) \le \sup_{q \in \mathbb{Q}_{> 0}}|E[g_q(T) - g_q(Z) \mid \mathcal{G}]| + \sup_{q \ge 0}P(Z \in S_{q,\epsilon} \mid \mathcal{G})$ a.s.

Let $Y' \mid \mathcal{F} \sim N(0, \Sigma)$ be conditionally independent of $(Y, X)$ given $\mathcal{F}$, where $\Sigma = \mathrm{Var}(Y \mid \mathcal{F})$ a.s., and let $Z' := \sum_{i \in N}Y'_iX_i$. Note that $E[g_q(Z) \mid \mathcal{G}] = E[g_q(Z') \mid \mathcal{G}]$ a.s. Also let $Q_Y$ and $Q_{Y'}$ be the regular conditional distributions of $Y$ and $Y'$ given $\mathcal{F}$ and $Q := Q_Y\otimes Q_{Y'}$. Since $X$ is $\mathcal{G}$-measurable, for almost all $\omega \in \Omega$,

$E[g_q(T) - g_q(Z) \mid \mathcal{G}](\omega) = h_q(\omega),$

where

$h_q(\omega) := \int_{\mathbb{R}^n\times\mathbb{R}^n}\Big[g_q\Big(\sum_{i \in N}y_iX_i(\omega)\Big) - g_q\Big(\sum_{i \in N}y'_iX_i(\omega)\Big)\Big]\,Q(\omega, d(y, y'))$

(see, e.g., Kallenberg, 2002, Theorem 5.4).

Claim C.3. There exists a constant $B_v > 0$ depending only on $v$ such that for any $q > 0$,

$|h_q(\omega)| \le \frac{B_v}{\epsilon^3}\sum_{i \in N}\sum_{j \in B_i}\sum_{k \in B_i\cup B_j}\prod_{l \in \{i,j,k\}}(\chi_l(\omega))^{1/3}\|X_l(\omega)\|_\infty,$

where $\chi_i(\omega) := \int_{\mathbb{R}^n}|y_i|^3\,Q_Y(\omega, dy)$.

Proof. For $y \equiv \{y_i\}_{i \in N}$ let $\varphi(y) := g_q\big(\sum_{i \in N}y_iX_i(\omega)\big)$. Then the result follows from Theorem 3.4 in Röllin (2013) by observing that

$\|\varphi_{ijk}\|_\infty \le \frac{B'_v}{\epsilon^3}\prod_{l \in \{i,j,k\}}\|X_l(\omega)\|_\infty$

for some constant $B'_v > 0$ depending only on $v$, where $\varphi_{ijk}$ is the third-order partial derivative of $\varphi$ w.r.t. the coordinates $i$, $j$, and $k$. □

As in the proof of Theorem C.1, there exists a constant $C'_v > 0$ depending only on $v$ such that $\sup_{q \ge 0}P(Z \in S_{q,\epsilon} \mid \mathcal{G}) \le C'_v\epsilon/\sqrt\upsilon$ a.s. Therefore, noticing that $\chi_i = E[|Y_i|^3 \mid \mathcal{F}]$ a.s., the result follows by taking $\epsilon = (\sqrt\upsilon\,\beta)^{1/4}$. The second assertion for $v = 1$ follows similarly. □
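In practical terms, Lemma C.11 covers bootstrap statistics of the form $T = \sum_iY_iX_i$ with network-local Gaussian multipliers, which is the shape of the dependent wild bootstrap draws studied in this paper. The sketch below generates one such draw; the triangular covariance kernel and the eigenvalue clipping used to enforce positive semi-definiteness are illustrative choices of ours, not the paper's prescription.

```python
import numpy as np

def dependent_wild_draw(X, D, s_n, rng):
    """One draw of T* = sum_i Y_i X_i in the setting of Lemma C.11.

    The Gaussian multipliers {Y_i} are mean zero with Cov(Y_i, Y_j) = 0
    whenever d_n(i, j) > s_n; the banded triangular kernel below is one
    possible covariance choice, taken here purely for illustration.
    """
    n = X.shape[0]
    K = np.maximum(1.0 - D / (s_n + 1.0), 0.0)   # zero beyond distance s_n
    lam, Q = np.linalg.eigh(K)
    A = Q * np.sqrt(np.maximum(lam, 0.0))        # PSD square root (clips tiny negatives)
    Y = A @ rng.standard_normal(n)
    return (Y[:, None] * X).sum(axis=0)
```

Averaging $T^*T^{*\top}$ over many such draws is then one way to approximate the bootstrap variance whose consistency Proposition 4.3 establishes.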