The Bootstrap for Network Dependent Processes
Denis Kojevnikov
Tilburg University
Abstract.
This paper focuses on the bootstrap for network dependent processes under the conditional $\psi$-weak dependence. Such processes are distinct from other forms of random fields studied in the statistics and econometrics literature, so the existing bootstrap methods cannot be applied directly. We propose a block-based approach and a modification of the dependent wild bootstrap for constructing confidence sets for the mean of a network dependent process. In addition, we establish the consistency of these methods for the smooth function model and provide bootstrap alternatives to the network heteroskedasticity-autocorrelation consistent (HAC) variance estimator. We find that the modified dependent wild bootstrap and the corresponding variance estimator are consistent under weaker conditions relative to the block-based method, which makes the former approach preferable for practical implementation.

Keywords. Conditional bootstrap; Block bootstrap; Dependent wild bootstrap; Network dependent process; Random field; Conditional $\psi$-weak dependence.
1. Introduction
The aim of this paper is developing bootstrap approaches for the sample mean of network dependent processes studied in Kojevnikov, Marmer, and Song (2020, hereafter KMS). A network dependent process is a random field indexed by the set of nodes of a given undirected network. This network governs the stochastic dependence between the elements of the associated random field. Specifically, the latter is assumed to satisfy a conditional version of the $\psi$-weak dependence condition of Doukhan and Louhichi (1999) given a common shock of a general form. KMS (2020) show that the pointwise Law of Large Numbers and the Central Limit Theorem hold for a sequence of such processes under suitable assumptions on the networks' denseness and the strength of the stochastic dependence. In addition, they provide nonparametric HAC estimators of the variance-covariance matrix for the vector of sample moments, which is similar to the spatial HAC estimator developed in Kelejian and Prucha (2007).

These results provide an asymptotic approximation of the distribution of the sample mean which can be used for inference on the true mean of a network dependent process. However, this approximation relies on the network HAC estimator, which has two major drawbacks. First, unlike its spatial or time-series counterparts, it is not guaranteed to yield a positive semi-definite estimate. Second, these estimators are known to have poor finite sample properties (see, e.g., Matyas, 1999, Section 3.5). The aim of the current work is to provide an alternative nonparametric way to conduct inference in these settings.

The nonparametric bootstrap methods for the case of weakly dependent observations have been studied since the introduction of the non-overlapping block bootstrap in Carlstein (1986) and the moving block bootstrap in Künsch (1989) and Liu and Singh (1992) for stationary, mixing time series. Since then, a number of block-based methods have been considered in the statistics literature. They share the idea of resampling groups of consecutive observations to capture the stochastic dependence in the original series and include, among others, the circular block bootstrap (Politis and Romano, 1992), the stationary bootstrap (Politis and Romano, 1994), and the tapered block bootstrap (Paparoditis and Politis, 2001). A detailed exposition and comparison of some of these methods can be found in Lahiri (2003). The block-based bootstrap was also successfully applied to the case of weakly dependent random fields satisfying certain mixing conditions (see, e.g., Lahiri, 2003, Section 12 and references therein).

More recent developments in this area of research are discussed in Gonçalves and Politis (2011). In particular, the dependent wild bootstrap proposed in Bühlmann (1993) and Shao (2010) departs from other methods. Instead of using blocks, it tries to mimic the autocovariance structure of the original data by introducing auxiliary random variables and can be applied to irregularly spaced data. A related method, the dependent random weighting, was recently introduced in Sengupta, Shao, and Wang (2015) and has wider applicability; specifically, it can be directly applied to irregularly spaced spatial data. Another useful resampling technique developed for stationary and nonstationary time series and homogeneous random fields under mixing is subsampling. A comprehensive treatment of this method is given in Politis, Romano, and Wolf (1999).
Interestingly, in the time-series case subsampling is similar to the moving block bootstrap where a single block is resampled. Finally, it is worth mentioning the spatial smoothed bootstrap suggested in Garcia-Soidan, Menezes, and Rubinos (2014). In this instance, assuming homogeneity of the underlying data generating process, bootstrap pseudo-samples are drawn from the estimated joint distribution of a given sample.

Network dependent processes are closely related to random fields indexed by elements of a lattice in $\mathbb{R}^d$ (see, e.g., Comets and Janžura, 1998; Conley, 1999). However, they are not a special case of the latter, and so the existing bootstrap methods cannot be directly applied to our framework. The main reason for that is the irregularity of the structure of the underlying networks. In particular, subsampling and all types of the block bootstrap for time-series and spatial data rely on the existence of ordered blocks of closely located observations. The dependent wild bootstrap uses a well-known property of kernel functions that guarantees the positive semi-definiteness of certain weighting matrices. However, as argued in KMS (2020), this relation does not necessarily hold when applied to networks. Finally, the homogeneity assumption of the spatial smoothed bootstrap and the spatial subsampling, that is, the invariance of joint distributions under spatial shifts, is not suitable for our case.

We propose two bootstrap approaches for constructing asymptotically valid confidence sets for the mean of a network dependent process and establish the first-order consistency of these methods for smooth functions of means conditionally on the common shock. The first approach is a block-based method in which blocks are constructed from certain neighborhoods of each node in a network. The second is a modification of the dependent wild bootstrap that employs the topology of a given graph to generate random weights instead of using a fixed kernel function. In addition, we provide bootstrap variance estimators of the scaled sample mean which yield positive semi-definite estimates and can be used as an alternative to the network HAC estimator. We find that the consistency of the modified dependent wild bootstrap and the corresponding variance estimator holds under weaker conditions as compared with the block bootstrap. However, the bootstrap distribution corresponding to the former method may fail to match the higher-order cumulants of the underlying data generating process, thus preventing improvements over asymptotic approximations.

The rest of the paper is organized as follows. The next section describes a modification of network dependent processes allowing for weighted networks. This modification can be useful for handling dense graphs once varying intensity of links is assumed. Section 3 provides some general results regarding the conditional bootstrap. Specifically, we use the almost sure convergence of probability kernels to ensure that the bootstrap is valid for (almost) every realization of the common shock, which may also represent the stochastic network formation process. In Section 4 we present the above-mentioned bootstrap methods in detail and establish sufficient conditions for their conditional consistency. All the proofs and other technical details are presented in Appendices A-C.
2. The Setup
We consider a variation of network dependent processes characterized in KMS (2020). Namely, let $G \equiv (N, E)$ be an undirected graph (possibly infinite), where $N$ is the set of nodes and $E$ denotes the set of links (we identify $N$ with the integers $\{1, 2, \dots\}$). Each edge $e \in E$ is associated with a weight $W(e) \in \bar{\mathbb{R}}$. Also let the function $d : N \times N \to \bar{\mathbb{R}}_{\geq 0}$ be a distance on $G$; for example, the shortest path distance for an unweighted graph. An $\mathcal{X}$-valued network dependent process $\mathcal{Y} \equiv (Y, G)$ is a collection of $\mathcal{X}$-valued random elements defined on a common probability space indexed by $N$, i.e., $\{Y_i : i \in N\}$. The network $G$ governs the stochastic dependence between random elements. In this paper we consider $\mathcal{X} = \mathbb{R}^v$ with $v \geq 1$ and study a sequence $\{(Y_n, G_n)\}$ defined on a common probability space $(\Omega, \mathcal{H}, P)$, where each $G_n \equiv (N_n, E_n)$ is a finite graph of size $m_n \to \infty$ as $n \to \infty$; w.l.o.g. we set $m_n = n$. Here, the sequence $\{G_n\}$ can be a sequence of subgraphs of an infinite network $(N_\infty, E_\infty)$. In general, however, these graphs can be unrelated. In order to emphasize the dependence of the distance between two nodes on $n$, we denote it as $d_n(\cdot, \cdot)$. Additionally, since the distance function may implicitly depend on weights associated with the edges of a graph, we impose the following restriction in order to employ the results established in KMS (2020) with the least possible change.

Assumption 2.1. For all $n \geq 1$, $\min_{i,j \in N_n} d_n(i,j) \geq 1$ and $d_n(i,j) = \infty$ whenever $i, j \in N_n$ are disconnected (i.e., there is no path connecting $i$ and $j$).

For example, if $W(e) \in [0,1]$ for all $e \in E$, which can be interpreted as the intensity of links, then the shortest weighted distance associated with $1/W(\cdot)$ satisfies this assumption, where implicitly we set $1/0 \equiv \infty$. In this case an unweighted network $(N, E)$ is equivalent to a complete graph $(N, E')$, where for $e \in E'$, $W(e) = 1\{e \in E\}$. In a similar manner, the (at most countable) parameter space of a random field on a metric space $(\mathcal{Z}, \rho)$ can be modelled as a complete graph of suitable cardinality, where $W(x \leftrightarrow y)$ is a function of the distance $\rho$ between two points $x, y \in \mathcal{Z}$. Then Assumption 2.1 corresponds to the case of increasing domain asymptotics (see, e.g., Conley, 1999; Jenish and Prucha, 2009).

Let $\mathcal{C} \subset \mathcal{H}$ be a given sub-$\sigma$-field. We assume that the sequence of network dependent processes is conditionally weakly dependent given $\mathcal{C}$. Specifically, for $a, b \in \mathbb{N}$ and $s \geq 0$ define

$\mathcal{P}_n(a, b; s) := \{(A, B) : A, B \subset N_n,\ |A| = a,\ |B| = b,\ d_n(A, B) \geq s\}$

with $d_n(A, B) := \min_{i \in A, j \in B} d_n(i, j)$, and let $\mathcal{L}_v$ be the family of real-valued, bounded, Lipschitz functions, i.e.,

$\mathcal{L}_v := \bigcup_{a \geq 1} \mathcal{L}_{v,a}$, where $\mathcal{L}_{v,a} := \{f : \mathbb{R}^{v \times a} \to \mathbb{R} : \|f\|_\infty < \infty,\ \mathrm{Lip}(f) < \infty\}$.

The functions in $\mathcal{L}_{v,a}$ are Lipschitz with respect to the distance $\delta_a$ on $\mathbb{R}^{v \times a}$ given by

$\delta_a(x, y) := \sum_{l=1}^{a} \|x_l - y_l\|$,

where $\|\cdot\|$ is a norm on $\mathbb{R}^v$ and $x \equiv (x_1, \dots, x_a)$ and $y \equiv (y_1, \dots, y_a)$ are points in $\mathbb{R}^{v \times a}$. In addition, for a set of nodes $A \subset N_n$ we write $Y_{n,A} \equiv \{Y_{n,i} : i \in A\}$.
Definition 2.1. A sequence $\{(Y_n, G_n)\}$ is $(\mathcal{L}_v, \psi, \mathcal{C})$-weakly dependent if for each $n \geq 1$ there exist a $\mathcal{C}$-measurable sequence $\gamma_n \equiv \{\gamma_{n,s}\}_{s=1}^{\infty}$ and a collection of nonrandom functions $(\psi_{a,b})_{a,b \in \mathbb{N}}$, $\psi_{a,b} : \mathcal{L}_{v,a} \times \mathcal{L}_{v,b} \to \mathbb{R}_{\geq 0}$, such that for any $(A, B) \in \mathcal{P}_n(a, b; s)$ with $s > 0$, $f \in \mathcal{L}_{v,a}$, and $g \in \mathcal{L}_{v,b}$,

(2.1) $|\mathrm{Cov}(f(Y_{n,A}), g(Y_{n,B}) \mid \mathcal{C})| \leq \psi_{a,b}(f, g)\, \gamma_{n, \lfloor s \rfloor}$ a.s.

Remark. (a) When it is clear from the context, we denote such a sequence as $\{Y_n\}$, omitting the reference to the underlying networks. (b) $(\mathcal{L}_v, \psi) \equiv (\mathcal{L}_v, \psi, \{\emptyset, \Omega\})$. (c) The elements of $\{\gamma_{n,s}\}$ are called the weak-dependence coefficients associated with $\{Y_n\}$. (d) For convenience, we set $\gamma_{n,0} \equiv 1$.

Examples of processes that are $(\mathcal{L}_v, \psi, \mathcal{C})$-weakly dependent are given in KMS (2020). For instance, strong mixing processes correspond to $\psi_{a,b}(f, g) = 4\|f\|_\infty \|g\|_\infty$. Also, associated and Gaussian processes and certain of their derivatives are $(\mathcal{L}_v, \psi, \mathcal{C})$-weakly dependent with $\psi_{a,b}(f, g) = ab\,\mathrm{Lip}(f)\,\mathrm{Lip}(g)$. It is worth mentioning that the corresponding weak dependence coefficients may depend on the topology of the underlying networks.

Conditioning on a $\sigma$-field $\mathcal{C}$ can be useful in various cases. First, if the underlying graphs are realizations of a stochastic network formation process, then one can potentially condition on the $\sigma$-field generated by that process and treat the observed graphs as fixed. Second, fixing nodes with high degree centrality may help to obtain local stochastic dependence.
Consider a set of independent random variables $\{\varepsilon_i : i \in N\}$ and let $C \subset N$ denote a set of nodes with "high" degree centrality (for clarity, we omit the subscript $n$). Then the variables $u_{N \setminus C}$, where $u_i := \varepsilon_i + \sum_{j \in C} \beta_{ij} \varepsilon_j$ and $\beta_{ij} \in \mathbb{R}$, are conditionally independent given $\mathcal{C} = \sigma(\varepsilon_C)$. Moreover, for arbitrary measurable functions $\{\phi_i\}$ the process $\{Y_i := \phi_i(u_N)\}$ satisfies the covariance bound (2.1) with $\psi_{a,b}(f, g) = a\|g\|_\infty \mathrm{Lip}(f) + b\|f\|_\infty \mathrm{Lip}(g)$. In the context of social interaction models $\{u_i\}$ and $\{Y_i\}$ may represent idiosyncratic shocks and observable outcomes, respectively.

In order to facilitate the exposition, throughout the paper we consider a sequence of network dependent processes $\{Y_n\}$ satisfying the covariance bound (2.1) with a specific form of the function $\psi_{a,b}$ and bounded weak dependence coefficients. The restricted $\psi_{a,b}$ function is fairly general and covers many useful examples of weakly dependent processes.

Assumption 2.2. $\{(Y_n, G_n)\}$ is $(\mathcal{L}_v, \psi, \mathcal{C})$-weakly dependent and there exist constants $M \geq 1$ and $C > 0$ such that $\gamma_{n,s} \leq M$ a.s. for all $n, s \geq 1$, and

$\psi_{a,b}(f, g) = c_1 \|f\|_\infty \|g\|_\infty + c_2\, \mathrm{Lip}(f) \|g\|_\infty + c_3 \|f\|_\infty \mathrm{Lip}(g) + c_4\, \mathrm{Lip}(f)\, \mathrm{Lip}(g)$,

where $c_1, \dots, c_4 \leq Cab$.

It should be noted that processes satisfying Assumption 2.2 possess some hereditary properties. Specifically, if $\{Y_n\}$ is $(\mathcal{L}_v, \psi, \mathcal{C})$-weakly dependent with the weak dependence coefficients $\{\gamma_{n,s}\}$, then for any Lipschitz function $h : \mathbb{R}^v \to \mathbb{R}^w$ the sequence $\{h(Y_{n,i}) : i \in N_n\}$ is $(\mathcal{L}_w, \psi, \mathcal{C})$-weakly dependent with the same weak dependence coefficients. Moreover, this type of weak dependence is preserved under some locally Lipschitz functions, as shown in Proposition 2.1 below, which is an extension of Proposition 2.1 in Dedecker, Doukhan, and Lang (2007) to our settings.
Proposition 2.1. Suppose that $\{Y_n\}$ satisfies Assumption 2.2 and there exist $L < \infty$ and $p > 1$ such that $\sup_{n, i \in N_n} E[\|Y_{n,i}\|_\infty^p \mid \mathcal{C}] \leq L$ a.s. Let $h : \mathbb{R}^v \to \mathbb{R}^w$ be such that

(2.2) $\|h(x) - h(y)\| \leq \eta \|x - y\| \left( \|x\|^{\tau-1} + \|y\|^{\tau-1} \right)$

for some $\eta > 0$ and $\tau \in [1, p)$. Then $\{h(Y_{n,i}) : i \in N_n\}$ is $(\mathcal{L}_w, \psi, \mathcal{C})$-weakly dependent with the weak dependence coefficients $\gamma'_{n,s} = KM\gamma_{n,s}^r$, where $K$ is a constant depending on $\eta$, $v$, and $L$, and

$r = (p - \tau)/(p - 1)$ if $c_4 = 0$, and $r = (p - \tau)/(p + \tau - 2)$ otherwise.
Remark. The boundedness of the conditional moments of $\|Y_{n,i}\|_\infty$ is required in order to maintain Assumption 2.2. Once this condition is relaxed, it suffices to assume that these moments are a.s. finite.

Introducing weighted networks is useful in several scenarios. First, as we have already mentioned, it allows incorporating some additional random processes into the current framework. Second, assuming varying intensity of connections enables one to handle denser networks in the sense of the total number of links. Finally, some commonly used statistical models explicitly use weights and can be adapted to our framework, e.g., the spatial Cliff-Ord-type linear model in Kelejian and Prucha (2010).
Example 2.2.
For each $n \geq 1$, let $u_n$ be an $n \times 1$ random vector and $\tilde{W}_n$ be an $n \times n$ matrix which is a function of the weights associated with a given network. Consider a linear model with disturbances following the autoregressive process

$\varepsilon_n = \lambda \tilde{W}_n \varepsilon_n + u_n, \quad |\lambda| < 1$.

Typically the original weighting matrix is modified to ensure that the spectral radius of $\tilde{W}_n$ is bounded by 1. Under certain restrictions on the denseness of the underlying networks, the process $\{\varepsilon_n\}$ is weakly dependent with $\psi_{a,b}(f, g) = a\|g\|_\infty \mathrm{Lip}(f) + b\|f\|_\infty \mathrm{Lip}(g)$, so that the model can be accommodated within the current framework. Assume that $C_n := (I - \lambda \tilde{W}_n)^{-1}$ exists for each $n \geq 1$ and $\mu_1 := \sup_{n, i \in N_n} E|u_{n,i}| < \infty$. Then $\varepsilon_n = C_n u_n$, and the weak dependence coefficients can be bounded by considering the truncated approximations $\varepsilon_{n,i}^{(s)} := \sum_{j \in N_n : d_n(i,j) < s} [C_n]_{ij} u_{n,j}$, which depend only on the disturbances of nodes within distance $s$ of $i$.
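As a quick illustration, the following sketch (Python; the sparsity level, the row normalization, and all variable names are our illustrative choices, not taken from the paper) simulates such network-dependent disturbances:

```python
import numpy as np

# Example 2.2: eps_n = lambda * W_tilde @ eps_n + u_n, so that
# eps_n = (I - lambda * W_tilde)^{-1} @ u_n. Row normalization is one
# common way to keep the spectral radius of W_tilde at most 1.
rng = np.random.default_rng(0)
n, lam = 200, 0.4
A = rng.random((n, n)) < 0.02           # random adjacency, illustrative
A = np.triu(A, 1); A = A + A.T          # undirected, no self-links
W_tilde = A / np.maximum(A.sum(axis=1, keepdims=True), 1)
C_n = np.linalg.inv(np.eye(n) - lam * W_tilde)
u = rng.standard_normal(n)              # i.i.d. innovations u_n
eps = C_n @ u                           # network-dependent disturbances
```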
For $s \geq 0$ we define the open neighborhood of radius $s$ around a node $i \in N_n$, i.e., $N_n(i; s) := \{j \in N_n : d_n(i, j) < s\}$, and let $N_n^{\partial}(i; s) := N_n(i; s+1) \setminus N_n(i; s)$. (Note that this definition of the open neighborhood of a node differs from the one commonly used in graph theory.) In addition, we define the following aggregate measures of the network denseness:

(2.3) $\delta_n(s; k) := n^{-1} \sum_{i \in N_n} |N_n(i; s+1)|^k$, $\quad \delta_n^{\partial}(s; k) := n^{-1} \sum_{i \in N_n} |N_n^{\partial}(i; s)|^k$, $\quad D_n(s) := \max_{i \in N_n} |N_n(i; s+1)|$, and $D_n^{\partial}(s) := \max_{i \in N_n} |N_n^{\partial}(i; s)|$.

It is straightforward to see that under Assumption 2.1, which restricts the minimum distance between any two nodes of a network, the asymptotic results derived in KMS (2020) remain valid once we replace their measures of network denseness with those given in (2.3) and redefine $H_n(s, m)$ as follows:

(2.4) $H_n(s, m) := \big\{ (i, j, k, l) \in N_n^4 : j \in N_n(i; m+1),\ l \in N_n(k; m+1),\ \lfloor d_n(\{i,j\}, \{k,l\}) \rfloor = s \big\}$.

In the case of random networks, however, the measures of network denseness are also random. Therefore, one needs a conditional version of the Law of Large Numbers in order to be able to condition on the common shock $\mathcal{C}$. Note that the other results are stated in the conditional form and can be directly applied to this case if we assume certain measurability conditions. Let $D(G_n)$ denote the distance matrix associated with $G_n$, i.e., $[D(G_n)]_{ij} = d_n(i, j)$. If $D(G_n)$ is $\mathcal{C}$-measurable, then $|N_n(i; s)| = \sum_{j \in N_n} 1\{[D(G_n)]_{ij} < s\}$ is also $\mathcal{C}$-measurable, as are the quantities given in (2.3) and (2.4).
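To fix ideas, the following sketch (Python; all function names are ours, not from the paper) computes the distance matrix of a weighted graph via shortest paths on the reciprocal weights $1/W(e)$, the neighborhood sizes $|N_n(i;s)|$, and the denseness measures in (2.3):

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

def distance_matrix(weights):
    """Shortest weighted path distances with edge lengths 1/W(e).

    `weights` is a symmetric (n, n) array with W(e) in [0, 1] and 0 for
    missing links; 1/0 is treated as an infinite edge length, so
    disconnected nodes end up at distance infinity (Assumption 2.1).
    """
    with np.errstate(divide="ignore"):
        lengths = np.where(weights > 0, 1.0 / weights, np.inf)
    np.fill_diagonal(lengths, 0.0)
    return shortest_path(lengths, method="D")   # Dijkstra

def neighborhood_sizes(D, s):
    """|N_n(i; s)| = #{j : d_n(i, j) < s} for every node i."""
    return (D < s).sum(axis=1)

def denseness(D, s, k=1):
    """The aggregate measures delta_n(s; k), delta^d_n(s; k), D_n(s) of (2.3)."""
    inner = neighborhood_sizes(D, s + 1)
    boundary = inner - neighborhood_sizes(D, s)   # |N^d_n(i; s)|
    return (np.mean(inner ** k),      # delta_n(s; k)
            np.mean(boundary ** k),   # delta^d_n(s; k)
            inner.max())              # D_n(s)
```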
We make the following assumption:

Assumption 2.3. The distance matrix $D(G_n)$ is $\mathcal{C}$-measurable for all $n \geq 1$.
Definition 2.2. Let $\mathcal{F} \subset \mathcal{H}$ and let $f : \mathcal{Y} \times \Omega \to \mathbb{R}_{\geq 0}$ be such that $f(y, \cdot)$ is $\mathcal{F}$-measurable for all $y \in \mathcal{Y}$. A sequence of such functions $\{f_n\}$ is asymptotically negligible (a.n.) if for almost all $\omega \in \Omega$,

$\operatorname{ess\,inf}_{y \in \mathcal{Y}} \limsup_{n \to \infty} f_n(y, \omega) = 0$.

In particular, an array of random vectors $\{X_{n,i}\}$ is
– $\mathcal{F}$-asymptotically tight if $\max_i P(\|X_{n,i}\| > y \mid \mathcal{F})$ is a.n.;
– $\mathcal{F}$-asymptotically uniformly integrable (u.i.) if $\max_i E[\|X_{n,i}\|\, 1\{\|X_{n,i}\| > y\} \mid \mathcal{F}]$ is a.n.

Theorem 2.1 (Conditional Weak Law of Large Numbers). Let $\{(Y_n, G_n)\}$ be $(\mathcal{L}_v, \psi, \mathcal{C})$-weakly dependent satisfying Assumptions 2.1, 2.2, and 2.3. Suppose that $\{Y_n\}$ is $\mathcal{C}$-asymptotically u.i. and

$n^{-1} \sum_{s \geq 0} \delta_n^{\partial}(s; 1)\, \gamma_{n,s} \to 0$ a.s.

Then

$\big\| n^{-1} \sum_{i \in N_n} (Y_{n,i} - E[Y_{n,i} \mid \mathcal{C}]) \big\|_{\mathcal{C},1} \to 0$ a.s.

(The essential infimum of an arbitrary family of random variables $\{Z_\alpha : \alpha \in \mathcal{A}\}$ is a random variable $Z$ such that (a) $Z \leq Z_\alpha$ a.s. for all $\alpha \in \mathcal{A}$ and (b) $Z \geq Z'$ a.s. for any random variable $Z'$ satisfying $Z' \leq Z_\alpha$ a.s. for all $\alpha \in \mathcal{A}$. In particular, there exists a sequence $\{\alpha_n\}$ such that $Z = \inf_n Z_{\alpha_n}$ a.s. (see, e.g., Cohen and Elliott, 2015, Theorem 1.3.40). The essential supremum is defined similarly, and the following identity holds: $\operatorname{ess\,sup}_\alpha \{Z_\alpha\} = -\operatorname{ess\,inf}_\alpha \{-Z_\alpha\}$. For a random vector $X$ and $\mathcal{F} \subset \mathcal{H}$ we write $\|X\|_{\mathcal{F},p} \equiv E[\|X\|^p \mid \mathcal{F}]^{1/p}$.)
Remark. Similarly to the unconditional case, a sufficient condition for the $\mathcal{C}$-asymptotic uniform integrability of $\{Y_n\}$ is the a.s. finiteness of $\sup_{n, i \in N_n} E[\|Y_{n,i}\|^p \mid \mathcal{C}]$ for some $p > 1$.

Let $\bar{Y}_n := n^{-1} \sum_{i \in N_n} Y_{n,i}$ and $\Sigma_n := \mathrm{Var}(\sqrt{n}\, \bar{Y}_n \mid \mathcal{C})$. Then the network HAC estimator of $\Sigma_n$,

(2.5) $\hat{\Sigma}_n = \frac{1}{n} \sum_{i,j \in N_n} \kappa\!\left( \frac{d_n(i,j)}{b_n + 1} \right) (Y_{n,i} - \bar{Y}_n)(Y_{n,j} - \bar{Y}_n)^{\top}$,

where $\kappa : \bar{\mathbb{R}} \to [-1, 1]$ is a kernel function satisfying $\kappa(0) = 1$, $\kappa(z) = \kappa(-z)$, and $\kappa(z) = 0$ for $|z| > 1$, and $b_n$ is the lag truncation parameter, is consistent under the same set of assumptions. Unfortunately, due to the irregularity of a network's structure, this estimator is not guaranteed to be positive semi-definite. However, once the minimal eigenvalue of $\Sigma_n$ is a.s. bounded from below or it converges to an a.s. positive definite matrix, a simple way to fix this issue is available. The details are given in Appendix B.
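For concreteness, a direct implementation of (2.5) might look as follows (Python; the Parzen kernel is our illustrative choice, not prescribed by the paper):

```python
import numpy as np

def parzen(z):
    """Parzen kernel: kappa(0) = 1, symmetric, zero for |z| > 1."""
    z = np.minimum(np.abs(z), 1.0)   # also maps infinite distances to 1
    return np.where(z <= 0.5, 1 - 6 * z**2 + 6 * z**3, 2 * (1 - z) ** 3)

def network_hac(Y, D, b_n, kernel=parzen):
    """Network HAC estimator (2.5).

    Y : (n, v) array of observations, D : (n, n) distance matrix d_n(i, j),
    b_n : lag truncation parameter. The result need not be positive
    semi-definite for irregular networks.
    """
    n = Y.shape[0]
    Yc = Y - Y.mean(axis=0)              # Y_{n,i} - Ybar_n
    K = kernel(D / (b_n + 1.0))          # kappa(d_n(i,j) / (b_n + 1))
    return (Yc.T @ (K @ Yc)) / n
```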
3. Conditional Bootstrap
In this section we present some general results regarding the conditional bootstrap. The latter is useful for inference which is asymptotically valid for almost all $\omega \in \Omega$ (or almost all realizations of the common shock). These results do not depend on the underlying data generating process; however, we use the present framework for convenience.

Suppose that $\{(Y_n, G_n)\}$ is a sequence of network dependent processes. For a given $n \geq 1$, let $\theta_n$ be a $\mathcal{C}$-measurable parameter taking values in $\Theta \subseteq \mathbb{R}^w$ with $w \geq 1$, and let

$T_n(\theta_n) := T_n(Y_n, \theta_n; \vartheta_n)$,

where $T_n$ is a measurable, real-valued function and $\vartheta_n$ is a $\mathcal{C}$-measurable nuisance parameter, denote a statistic used to conduct inference on $\theta_n$ based on a realization of $(Y_n, G_n)$ conditionally on $\mathcal{C}$.

Let $F_n$ denote the conditional cdf of $T_n$ given $\mathcal{C}$. The goal of this section is to provide sufficient conditions for the conditional first-order consistency of resampling estimators of $F_n$. Specifically, let $\mathcal{G}_n := \mathcal{C} \vee \sigma(Y_n)$ and let $Y_n^*$ be a pseudo-sample drawn using a realization of $Y_n$. Then the bootstrap counterpart of $T_n$ is $T_n^* := T_m(Y_n^*, \theta_n^*)$, where $m$ is the size of $Y_n^*$ and $\theta_n^* \equiv \theta_n^*(Y_n)$ is an estimator of $\theta_n$. The conditional cdf $F_n^*$ of $T_n^*$ is used as an approximation of $F_n$. If the latter explicitly depends on the nuisance parameter $\vartheta_n$, then one needs to provide its consistent estimator based on both $Y_n$ and $Y_n^*$. (A (regular) conditional cdf $F_X^{\mathcal{F}}$ of $X \in \mathbb{R}$ given $\mathcal{F} \subset \mathcal{H}$ satisfies: (i) $\forall x \in \mathbb{R}$, $F_X^{\mathcal{F}}(\cdot, x)$ is a version of $P(X \leq x \mid \mathcal{F})$, and (ii) $\forall \omega \in \Omega$, $F_X^{\mathcal{F}}(\omega, \cdot)$ is a distribution function. We omit the subscript $X$ or the superscript $\mathcal{F}$ whenever clear from the context.)

A typical way of showing the consistency of the bootstrap estimators is bounding the Kolmogorov distance between the cdfs of $T_n$ and $T_n^*$ (see, e.g., Shao and Tu, 1995, Chapter 3). For random variables $X$ and $Y$ and sub-$\sigma$-fields $\mathcal{F} \subset \mathcal{G} \subset \mathcal{H}$, the conditional version of the latter is defined by

$d_K(X, Y \mid \mathcal{G}, \mathcal{F}) := \sup_{x \in \mathbb{R}} \big| F_X^{\mathcal{G}}(\cdot, x) - F_Y^{\mathcal{F}}(\cdot, x) \big|$,

where $F_X^{\mathcal{G}}$ and $F_Y^{\mathcal{F}}$ are the conditional cdfs of $X$ and $Y$, respectively (when $\mathcal{F} = \mathcal{G}$ we denote this measure by $d_K(X, Y \mid \mathcal{F})$). In addition, we define the conditional convergence in probability and the almost sure convergence of conditional distributions.

Definition 3.1. Let $\mathcal{F} \subset \mathcal{H}$ be a sub-$\sigma$-field and let $Z$ be an $\mathcal{F}$-measurable random vector in $\mathbb{R}^v$ with $v \geq 1$. A sequence of $\mathbb{R}^v$-valued random vectors $Z_n \xrightarrow{\mathcal{F}\text{-}p} Z$ a.s. if for any $\epsilon > 0$,

$P(\|Z_n - Z\| > \epsilon \mid \mathcal{F}) \to 0$ a.s.
Definition 3.2. Suppose that $\{X_n\}$ is a sequence of random vectors on $(\Omega, \mathcal{H}, P)$ and $\mathcal{F} \subset \mathcal{H}$. Let $Q_n$ be the conditional distribution of $X_n$ given $\mathcal{F}$. We say that $X_n$ converges $\mathcal{F}$-weakly to $X$ having the conditional distribution $Q$ if for almost all $\omega \in \Omega$ the sequence $\{Q_n(\omega, \cdot)\}$ converges weakly to $Q(\omega, \cdot)$.

Remark. (a) Equivalently, the $\mathcal{F}$-weak convergence can be defined using the notion of probability kernels, so the limiting random vector $X$ is an artificial construct which is used to describe the limiting kernel. (b) A more general notion of the almost sure convergence of conditional probability measures and some of its properties are presented in Berti, Pratelli, and Rigo (2006). (c) The notion of $\mathcal{F}$-weak convergence is stronger than $\mathcal{F}$-stable convergence and the usual weak convergence. In particular, if $X_n \to X$ $\mathcal{F}$-weakly, then for any real-valued, bounded, continuous function $f$, $E[f(X_n) \mid \mathcal{F}] \to E[f(X) \mid \mathcal{F}]$ a.s., which implies that $X_n$ converges to $X$ $\mathcal{F}$-stably and in distribution.

Assume for a moment that $\mathcal{C} = \{\emptyset, \Omega\}$. Then if there exists a sequence of random variables $\{S_n\}$ such that $d_K(T_n, S_n \mid \mathcal{C})$ converges to 0 as $n \to \infty$ and $d_K(T_n^*, S_n \mid \mathcal{G}_n, \mathcal{C})$ converges to 0 a.s. (in probability), then the bootstrap estimator is first-order strongly (weakly) consistent. Moreover, if $S_n$ converges weakly to a continuous limit, then the conditional quantiles of $F_n^*$ are a good approximation to those of $F_n$. This typically happens when the statistic $T_n$ is pivotal. However, in the case of a non-pivotal statistic, which is useful when a consistent estimator of $\vartheta_n$ is hard to obtain or the available estimators have poor finite sample properties, the cdfs of $\{T_n\}$ need not converge. In this case, the convergence of the Kolmogorov distance between $T_n^*$ and $T_n$ to zero does not necessarily imply that $F_n(c_n^*(\alpha)) \to \alpha$ as $n \to \infty$, where $c_n^*(\alpha)$ is the conditional $\alpha$-quantile of $F_n^*$. Nevertheless, as shown in the next result, a sufficient condition for the latter to happen is the continuity of the cdfs of $\{S_n\}$.

(Note that $d_K(\cdot, \cdot \mid \mathcal{G}, \mathcal{F})$ is $\mathcal{G}$-measurable because $\{Z_x\}$, where $Z_x := |F_X^{\mathcal{G}}(\cdot, x) - F_Y^{\mathcal{F}}(\cdot, x)|$, is a càdlàg stochastic process. A (regular) conditional distribution $Q_X^{\mathcal{F}}$ of $X \in \mathbb{R}^v$ given $\mathcal{F} \subset \mathcal{H}$ satisfies: (i) $\forall B \in \mathcal{B}(\mathbb{R}^v)$, $Q_X^{\mathcal{F}}(\cdot, B)$ is a version of $P(X \in B \mid \mathcal{F})$ and (ii) $\forall \omega \in \Omega$, $Q_X^{\mathcal{F}}(\omega, \cdot)$ is a probability measure on $(\mathbb{R}^v, \mathcal{B}(\mathbb{R}^v))$. Also note that the convergence in Definition 3.1 implies convergence in probability due to the dominated convergence theorem; in addition, for an a.s. positive $\mathcal{F}$-measurable random variable $\nu$, $P(\|Z_n - Z\| > \nu \mid \mathcal{F}) \to 0$ a.s.)
Theorem 3.1. Suppose that for all $n \geq 1$, $S_n$ is conditionally independent of $Y_n$ given $\mathcal{C}$ and the conditional cdf of $S_n$ given $\mathcal{C}$ is (a.s.) continuous. Then if (a) $d_K(T_n, S_n \mid \mathcal{C}) \to 0$ a.s. and (b) $d_K(T_n^*, S_n \mid \mathcal{G}_n, \mathcal{C}) \xrightarrow{\mathcal{C}\text{-}p} 0$ a.s., then

$d_K(T_n^*, T_n \mid \mathcal{G}_n, \mathcal{C}) \xrightarrow{\mathcal{C}\text{-}p} 0$ a.s. and $\operatorname{ess\,sup}_{\alpha \in (0,1)} |P(T_n \leq c_n^*(\alpha) \mid \mathcal{C}) - \alpha| \to 0$ a.s.

Remark. (a) Usually when $\mathcal{C} = \{\emptyset, \Omega\}$ and the statistic $T_n$ is pivotal, we have $S_n = S_\infty$, which is the weak limit of $T_n$. (b) A variant of this result can be found in Chernozhukov et al. (2013) in the context of the Gaussian multiplier bootstrap. (c) Theorem 3.1 also implies that the conditional quantiles $\{c_n^*(\alpha) : \alpha \in (0,1)\}$ approximate the unconditional quantiles of $T_n$ because, by the dominated convergence theorem,

$\sup_{\alpha \in (0,1)} |P(T_n \leq c_n^*(\alpha)) - \alpha| \leq E \operatorname{ess\,sup}_{\alpha \in (0,1)} |P(T_n \leq c_n^*(\alpha) \mid \mathcal{C}) - \alpha| \to 0$.
Definition 3.3. We say that $F_n^*$ is conditionally $d_K$-consistent given $\mathcal{C}$ if the conclusion of Theorem 3.1 holds.

Typically it is not hard to show that condition (a) of Theorem 3.1 holds (for example, when the elements of $Y_n^*$ are conditionally i.i.d. given $\mathcal{G}_n$). On the contrary, establishing (b) may be a difficult task, especially when $T_n$ is a nonlinear transformation of $Y_n$ in the presence of stochastic dependence between its elements, as in the current framework. However, in the case when the statistic $T_n$ converges $\mathcal{C}$-weakly to $S$ and the limiting kernel (i.e., the regular conditional cdf of $S$ given $\mathcal{C}$) is continuous, Lemma C.4 implies that this convergence is equivalent to one with respect to the conditional Kolmogorov distance. In addition, by Lemma C.3 the almost sure convergence of conditional distributions enjoys a number of useful properties associated with the usual weak convergence, such as the continuous mapping theorem, the converging together lemma, and the Cramér–Wold device. In this situation we have the following simple corollary. (Consider, for example, the case of the linearized statistic $T_n'$ given in (3.1): it does not have a nondegenerate weak limit when the sequence of parameters $\{\theta_n\}$ is not convergent.)
Corollary 3.1. Suppose that $S$ is conditionally independent of $\{Y_n\}$ given $\mathcal{C}$ and the conditional cdf of $S$ given $\mathcal{C}$ is (a.s.) continuous. Then if (a) $T_n \to S$ $\mathcal{C}$-weakly and (b) $d_K(T_n^*, S \mid \mathcal{G}_n, \mathcal{C}) \xrightarrow{\mathcal{C}\text{-}p} 0$ a.s., $F_n^*$ is conditionally $d_K$-consistent given $\mathcal{C}$.

Next, we consider the case in which the statistic $T_n$ takes the following form:

$T_n(\theta_n) = \tau_n \big( \phi(\hat{\theta}_n) - \phi(\theta_n) \big)$,

where $\phi : \Theta \to \mathbb{R}$ is a continuously differentiable function, $\hat{\theta}_n$ is a consistent estimator of $\theta_n$ (in the sense of Definition 3.1), and $\tau_n$ is a normalizing coefficient. In particular, the smooth function model (see, e.g., Lahiri, 2003, Section 4.2 and Hall, 1992, Section 2.4) falls into this case. The resampling version of the statistic $T_n$ is

$T_n^* = \tau_n^* \big( \phi(\hat{\theta}_n^*) - \phi(\theta_n^*) \big)$,

where $\theta_n^*$ is a consistent estimator of $\theta_n$, which may differ from $\hat{\theta}_n$, and $\tau_n^*$ is the bootstrap counterpart of $\tau_n$. Let $\xi_n := \tau_n(\hat{\theta}_n - \theta_n)$ and $\xi_n^* := \tau_n^*(\hat{\theta}_n^* - \theta_n^*)$. Consider the linearized statistics

(3.1) $T_n' := \nabla\phi(\theta_n)^{\top} \xi_n$ and $T_n'^* := \nabla\phi(\theta_n^*)^{\top} \xi_n^*$.

The following result shows that it suffices to find a "smooth" approximation $S_n'$ of the linearized statistics in order to apply Theorem 3.1 to this setup. In particular, the result largely depends on the asymptotic behavior of the conditional Lévy concentration function of $S_n'$. For a random variable $X$, $\epsilon > 0$, and a sub-$\sigma$-field $\mathcal{F} \subset \mathcal{H}$, the latter is given by

$Q(\epsilon, X \mid \mathcal{F}) := \sup_{x \in \mathbb{R}} \big( F_X^{\mathcal{F}}(\cdot, x + \epsilon) - F_X^{\mathcal{F}}(\cdot, x-) \big)$.
Lemma 3.1. Suppose that $\hat{\theta}_n^* - \theta_n^* \xrightarrow{\mathcal{C}\text{-}p} 0$ a.s., $\xi_n^*$ and $\xi_n$ are $\mathcal{C}$-asymptotically tight, and $\sup_n \|\theta_n\| < \infty$ a.s. Furthermore, assume that (a) $d_K(T_n', S_n' \mid \mathcal{C}) \to 0$ a.s., (b) $d_K(T_n'^*, S_n' \mid \mathcal{G}_n, \mathcal{C}) \xrightarrow{\mathcal{C}\text{-}p} 0$ a.s., and (c) $Q(\epsilon, S_n' \mid \mathcal{C})$ is a.n. Then w.p.1,

$d_K(T_n^*, S_n' \mid \mathcal{G}_n, \mathcal{C}) \xrightarrow{\mathcal{C}\text{-}p} 0$ and $d_K(T_n, S_n' \mid \mathcal{C}) \to 0$.

Consequently, the continuity of the conditional cdfs of $\{S_n'\}$ ensures the bootstrap consistency in the sense of Definition 3.3.
Theorem 3.2. Suppose that the conditions of Lemma 3.1 hold and, in addition, $\{S_n'\}$ satisfy the independence and continuity conditions of Theorem 3.1. Then $F_n^*$ is conditionally $d_K$-consistent given $\mathcal{C}$.

Similarly to the general case, when $\xi_n$ converges $\mathcal{C}$-weakly to some random vector $\xi$ and the sequence of parameters $\{\theta_n\}$ converges a.s. to a $\mathcal{C}$-measurable random variable $\theta$, Lemma C.3 implies that $T_n'$ converges $\mathcal{C}$-weakly to $\nabla\phi(\theta)^{\top}\xi$. In addition, if the conditional cdf of the latter is (a.s.) continuous, it satisfies assumption (c) of Lemma 3.1.
Corollary 3.2. Suppose that $\hat{\theta}_n^* - \theta_n^* \xrightarrow{\mathcal{C}\text{-}p} 0$ a.s., $\xi_n^*$ is $\mathcal{C}$-asymptotically tight, and $S' := \nabla\phi(\theta)^{\top}\xi$ satisfies the independence and continuity conditions of Corollary 3.1. Then if (a) $\xi_n \to \xi$ $\mathcal{C}$-weakly, (b) $\theta_n \to \theta$ a.s., and (c) $d_K(T_n'^*, S' \mid \mathcal{G}_n, \mathcal{C}) \xrightarrow{\mathcal{C}\text{-}p} 0$ a.s., $F_n^*$ is conditionally $d_K$-consistent given $\mathcal{C}$.
Remark. The assumption regarding convergence of the sequence of parameters $\{\theta_n\}$ can be relaxed. In the unconditional case it suffices to assume that $\sup_n \|\theta_n\| < \infty$. Then one needs to provide a uniform bound on $|P(\xi_n \in A) - P(\xi \in A)|$, where $A$ ranges over the class of half-spaces, for a network dependent process, similar to the bound established in Bentkus (2003). The conditioning on $\mathcal{C}$ complicates the problem even more, so it falls outside the scope of this paper.
4. Bootstrap of the Mean
Consider a sequence of network dependent processes $\{(Y_n, G_n)\}$ satisfying Assumptions 2.1, 2.2, and 2.3. As an application of the results given in the preceding section, we consider the mean of $Y_n$, $\mu_n \equiv E[Y_{n,i} \mid \mathcal{C}]$, which may vary with $n$ but not across $i \in N_n$. The parameter of interest $\mu_n$ is estimated using the sample mean $\bar{Y}_n$, which is a consistent estimator of $\mu_n$ under the assumptions of Theorem 2.1. In this section we provide a number of resampling-based methods for constructing asymptotically valid confidence sets for $\mu_n$. In addition, we establish consistency under a restricted version of the smooth function model in which we are interested in $\phi(\mu_n)$ for a continuously differentiable function $\phi : \mathbb{R}^v \to \mathbb{R}$. When the elements of $Y_n$ have the same marginal conditional distributions given $\mathcal{C}$, we may consider $\phi(E[f(Y_{n,1}) \mid \mathcal{C}])$, where $f : \mathbb{R}^v \to \mathbb{R}^w$ is a locally Lipschitz function satisfying (2.2) and the domain of $\phi$ is $\mathbb{R}^w$ in this case. Since the process $\{f(Y_{n,i}) : i \in N_n\}$ is $(\mathcal{L}_w, \psi, \mathcal{C})$-weakly dependent by Proposition 2.1, without loss of generality we examine the first version. In addition, we provide consistent positive semi-definite estimators of $\Sigma_n$. (It is possible to extend the results of this paper to the case of heterogeneous means as in Gonçalves and White (2002); however, such a setup makes it difficult to isolate the effect of the structure of the underlying networks on the consistency of the proposed bootstrap methods.)

The corresponding test statistics are given by

(4.1) $T_{1,n}(\mu_n) = \sqrt{n}\, \|\bar{Y}_n - \mu_n\|$ and $T_{2,n}(\mu_n) = \sqrt{n}\, \big( \phi(\bar{Y}_n) - \phi(\mu_n) \big)$,

where $\|\cdot\|$ is the Euclidean norm on $\mathbb{R}^v$. Their conditional distributions given $\mathcal{C}$ are denoted by $F_{1,n}$ and $F_{2,n}$, respectively, and the bootstrap approximations of these distributions are denoted by $F_{1,n}^*$ and $F_{2,n}^*$. The confidence sets for $\mu_n$ are obtained by test inversion, i.e.,

$CS_{n, 1-\alpha} := \{\mu \in \mathbb{R}^v : T_{1,n}(\mu) \leq c_{n, 1-\alpha}^*\}$,

where $c_{n,\alpha}^*(\omega) := \inf\{x : F_{1,n}^*(\omega, x) \geq \alpha\}$ is the conditional $\alpha$-quantile of $F_{1,n}^*$. In practice, if the exact distribution of $T_{j,n}^*$, $j = 1, 2$, is not available, it can be approximated by Monte Carlo simulation.

First, we suggest a variant of the block bootstrap, which is extensively studied in the time-series and spatial literature. Specifically, we choose the maximal block radius $s_n > 0$ and construct $n$ overlapping blocks $\{B_{n,1}, \dots, B_{n,n}\}$ with $B_{n,k} := N_n(k; s_n + 1)$. That is, $B_{n,k}$ is an $(s_n + 1)$-neighborhood of the node $k$. Then we randomly select $K_n := \lfloor n / \delta_n(s_n) \rfloor$ blocks $\{B_{n,1}^*, \dots, B_{n,K_n}^*\}$ with replacement (note that $\delta_n(s_n) \equiv \delta_n(s_n; 1)$ is the average block size), which yields a bootstrap sample

$Y_n^* = \big\{ Y_{n, B_{n,k}^*} : 1 \leq k \leq K_n \big\}$.

Formally, let $\{u_1, \dots, u_{K_n}\}$ be i.i.d. $U\{1, n\}$ random variables defined on $(\Omega, \mathcal{H}, P)$ and independent of $\mathcal{G}_n$. Then the $k$-th resampled block is defined as $B_{n,k}^* = B_{n, u_k}$ and, therefore, for $1 \leq k \leq K_n$ and $1 \leq l \leq n$, $P(B_{n,k}^* = B_{n,l} \mid \mathcal{G}_n) = n^{-1}$ a.s. For ease of exposition we assume that $n / \delta_n(s_n)$ is an integer.

The size of the bootstrap sample $L_n := \sum_{k=1}^{K_n} |B_{n,k}^*|$ is random conditional on the data and depends on the distribution of $|N_n(\cdot\,; s_n)|$ given the network $G_n$. However, on average it is expected to be close to $n$ (in fact, the conditional expectation of $L_n$ given $\mathcal{C}$ is exactly $n$). A possible implementation of this resampling scheme is sketched below.
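The following minimal sketch (Python; helper names are ours) implements the resampling scheme, using the block sums $Z_{n,i}$, the quasi-average $\tilde{Y}_n^*$, and the recentering parameter $\mu_n^* = E[\tilde{Y}_n^* \mid \mathcal{G}_n]$ defined in the next paragraph:

```python
import numpy as np

def block_bootstrap_mean(Y, D, s_n, n_boot=999, rng=None):
    """Neighborhood block bootstrap: draws of sqrt(n) * (Ytilde* - mu*).

    Y : (n, v) observations, D : (n, n) distance matrix, s_n : block radius.
    """
    rng = np.random.default_rng(rng)
    n = Y.shape[0]
    members = D < s_n + 1                    # row i indicates B_{n,i}
    Z = members.astype(float) @ Y            # block sums Z_{n,i}
    delta = members.sum(axis=1).mean()       # average block size delta_n(s_n)
    K_n = int(np.floor(n / delta))           # number of resampled blocks
    mu_star = K_n * Z.mean(axis=0) / n       # mu* = E[Ytilde* | data]
    draws = np.empty((n_boot, Y.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=K_n)   # blocks drawn with replacement
        draws[b] = np.sqrt(n) * (Z[idx].sum(axis=0) / n - mu_star)
    return draws
```

Draws of $T_{1,n}^*$ are then obtained as `np.linalg.norm(draws, axis=1)`, and their empirical $(1-\alpha)$-quantile approximates $c_{n,1-\alpha}^*$.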
Also, in the time-series case this approach reduces to a variant of the moving block bootstrap with unequally sized blocks, such that blocks located near the endpoints have smaller size.

Let $Z_{n,k}^* := \sum_{j \in B_{n,k}^*} Y_{n,j}$ and let $\tilde{Y}_n^* := n^{-1} \sum_{k=1}^{K_n} Z_{n,k}^*$ be the quasi-average of the bootstrap sample $Y_n^*$, which replaces the sample average in the bootstrap versions of $T_{1,n}$ and $T_{2,n}$. We could also consider the true average of a pseudo-sample, i.e., $\bar{Y}_n^* := L_n^{-1} \sum_{k=1}^{K_n} Z_{n,k}^*$. However, $L_n$ is not independent of the block sums and, as mentioned before, its distribution depends on the underlying network topology. As a result, it is relatively difficult to find a "smooth" approximation of the distribution of $\sqrt{L_n}\, \bar{Y}_n^*$ which guarantees the first-order consistency of the bootstrap (in particular, the suggested resampling scheme may not be appropriate in this case). In addition, since the conditional expectation of $\tilde{Y}_n^*$ given $\mathcal{G}_n$ differs from the sample average, we replace the true parameter $\mu_n$ with $\mu_n^* := E[\tilde{Y}_n^* \mid \mathcal{G}_n]$. As indicated in Lahiri (1992) in the time-series context, replacing $\mu_n$ with $\bar{Y}_n$ introduces an additional bias which does not allow for second-order improvements over the normal approximation (see also Lahiri, 2003, Section 2.7.1). The BB counterparts of the test statistics in (4.1) are given by

$T_{1,n}^* = \sqrt{n}\, \|\tilde{Y}_n^* - \mu_n^*\|$ and $T_{2,n}^* = \sqrt{n}\, \big( \phi(\tilde{Y}_n^*) - \phi(\mu_n^*) \big)$.

The conditional variance of the scaled sample mean $\Sigma_n$ can be estimated using the bootstrap version $\Sigma_n^* \equiv \mathrm{Var}(\sqrt{n}\, \tilde{Y}_n^* \mid \mathcal{G}_n)$. Since $\{Z_{n,1}^*, \dots, Z_{n,K_n}^*\}$ are conditionally independent given $\mathcal{G}_n$,

$\Sigma_n^* = \frac{1}{\delta_n(s_n)} \left( \frac{1}{n} \sum_{i \in N_n} Z_{n,i} Z_{n,i}^{\top} - \bar{Z}_n \bar{Z}_n^{\top} \right)$ a.s.,

where $Z_{n,i} := \sum_{j \in B_{n,i}} Y_{n,j}$ and $\bar{Z}_n := n^{-1} \sum_{i \in N_n} Z_{n,i}$. By construction the matrix $\Sigma_n^*$ is positive semi-definite and its form is similar to the network HAC estimator (2.5). To see this, let

(4.2) $\omega_n(i, j) := \frac{|N_n(i; s_n + 1) \cap N_n(j; s_n + 1)|}{\delta_n(s_n)}$

(when $i = j$ we denote this quantity by $\omega_n(i)$). Then

$\Sigma_n^* = \frac{1}{n} \sum_{i,j \in N_n} \omega_n(i, j) (Y_{n,i} - \mu_n)(Y_{n,j} - \mu_n)^{\top} + R_n$ a.s.,

where $E[\|R_n\|_F \mid \mathcal{C}] \to 0$; in particular, when $\mu_n = 0$ a.s., the remainder term $R_n = 0$ a.s. Unlike a typical kernel, the weighting function $\omega_n(\cdot, \cdot)$ depends on the network topology and is not bounded by 1. However, for fixed $i \in N_n$ it is decreasing in the distance between $i$ and $j$. Let $\tilde{\omega} := \sup_n \max_{i \neq j} \omega_n(i, j)$, $\tilde{\mu}_p := \sup_{n, i \in N_n} \|Y_{n,i}\|_{\mathcal{C},p}$ for $p > 0$, and

(4.3) $\Delta_n(s; k) := \frac{1}{n} \sum_{i \in N_n} \big| |N_n(i; s+1)| - \delta_n(s) \big|^k$,

which is the $k$-th absolute central moment of the sizes of the $(s+1)$-neighborhoods. A sketch of the computation of $\Sigma_n^*$ via the weights (4.2) is given below.
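A minimal sketch of the block-bootstrap variance estimator in its weight form (Python; helper names are ours, and the observations are centered at the sample mean rather than $\mu_n$, corresponding to the feasible version of the estimator):

```python
import numpy as np

def bb_variance(Y, D, s_n):
    """Block-bootstrap variance estimator Sigma*_n in the weight form.

    Uses omega_n(i, j) = |N_n(i; s_n+1) /\ N_n(j; s_n+1)| / delta_n(s_n),
    so the result is positive semi-definite by construction.
    """
    n = Y.shape[0]
    members = (D < s_n + 1).astype(float)   # row i indicates N_n(i; s_n+1)
    delta = members.sum(axis=1).mean()      # delta_n(s_n)
    omega = (members @ members.T) / delta   # omega_n(i, j)
    Yc = Y - Y.mean(axis=0)
    return (Yc.T @ (omega @ Yc)) / n
```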
The following assumptions provide sufficient conditions for the consistency of $\Sigma_n^*$.

Assumption 4.1. The sequence $\{(G_n, s_n)\}$ is such that w.p.1 $\tilde{\omega} < \infty$ and
(a) $\Delta_n(s_n; 2)/\delta_n^2(s_n) + D_n(s_n)/\sqrt{\delta_n(s_n)\, n} \to 0$;
(b) $\max_{i \in N_n} \big| \sum_{j \in B_{n,i}} (\omega_n(j) - 1) \big| / \sqrt{n} \to 0$;
(c) $n^{-1} \sum_{i \in N_n} \sum_{j \in N_n^{\partial}(i; s)} |\omega_n(i, j) - 1|\, \gamma_{n,s} \to 0$ for all $s \geq 0$.

The consistency of $\Sigma_n^*$ requires a certain degree of homogeneity of the resampled blocks, which is characterized by various moments of the weights $\{\omega_n(i,j) : i, j \in N_n\}$. For example, condition (a) requires that the sample variance of $\{|B_{n,i}|\}$ increases at a lower rate than the average block size. It also guarantees that $\mu_n^*$ is a consistent estimator of the mean $\mu_n$ and that for large samples the size of a pseudo-sample $L_n$ is close to $n$. In fact, $E[|L_n/n - 1| \mid \mathcal{C}] \to 0$ since

$E[|L_n/n - 1| \mid \mathcal{C}] = \frac{1}{n} E\Big[ \Big| \sum_{k=1}^{K_n} \big( |B_{n,k}^*| - \delta_n(s_n) \big) \Big| \,\Big|\, \mathcal{C} \Big] \leq \frac{1}{n} \sum_{k=1}^{K_n} \Delta_n(s_n; 1) \leq \frac{\sqrt{\Delta_n(s_n; 2)}}{\delta_n(s_n)}$ a.s.

This condition is clearly satisfied in the time-series context when $s_n = o(\sqrt{n})$ (although it has been shown that the consistency of the moving block bootstrap in this case holds for $s_n = o(n)$; see, e.g., Calhoun, 2018). However, it does not hold for unweighted "star" networks and $s_n \equiv 1$, since $\Delta_n(1; 2) \geq [\Delta_n(1; 1)]^2$ and $\Delta_n(1; 2)$ diverges while $\delta_n(1)$ remains bounded as $n \to \infty$. In practice, one can compute $\Delta_n(s_n; 2)$ for a given graph to see whether this quantity is small relative to the average block size, as in the sketch below.

Condition (c) ensures that all the non-zero autocovariances are estimated consistently. It is similar to an assumption on kernel functions used in HAC estimation, namely that in the limit the value of the kernel at each $s$ must converge to 1. In addition, if $\tilde{\gamma}_s > 0$ for all $s \geq 1$, then the parameter $s_n$ must go to infinity for this condition to hold.
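For instance, assuming a distance matrix `D` and radius `s_n` as in the earlier sketches, the diagnostic for condition (a) is a few lines:

```python
# Diagnostic for Assumption 4.1(a): the centered second moment of block
# sizes, Delta_n(s_n; 2), should be small relative to delta_n(s_n)^2.
sizes = (D < s_n + 1).sum(axis=1)            # |B_{n,i}| = |N_n(i; s_n + 1)|
delta = sizes.mean()
Delta2 = np.mean((sizes - delta) ** 2)
print(f"Delta_n(s_n;2)/delta_n(s_n)^2 = {Delta2 / delta**2:.3f}")
```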
Assumption 4.2. There exists $r > 2$ such that $\tilde{\mu}_r < \infty$ and
(a) $\limsup_{n \to \infty} \sum_{s \geq 0} \delta_n^{\partial}(s)\, \gamma_{n,s}^{1 - 2/r} < \infty$;
(b) $n^{-1} \sum_{s \geq 0} |H_n(s, s_n + 1)|\, \gamma_{n,s}^{1 - 2/r} \to 0$.

These conditions restrict the denseness of the networks in relation to the bandwidth parameter $s_n$ (see KMS, 2020, Section 4.1). Also, condition (a) implies that the elements of the true variance $\Sigma_n$ do not diverge to $\pm\infty$. To see this, note that for $1 \leq k, l \leq v$ and some constant $C > 0$,

$|[\Sigma_n]_{kl}| \leq C (\tilde{\mu}_r \vee 1)^2 \sum_{s \geq 0} \delta_n^{\partial}(s)\, \gamma_{n,s}^{1 - 2/r}$ a.s.

Therefore, $\limsup_{n \to \infty} |[\Sigma_n]_{kl}| < \infty$ a.s.
Proposition 4.1. Suppose that Assumptions 4.1 and 4.2 hold. Then $E[\|\Sigma_n^* - \Sigma_n\|_F \mid \mathcal{C}] \to 0$ a.s.

The result of Proposition 4.1 implies that $\Sigma_n^*$ is a consistent estimator of $\Sigma_n$. Therefore, assuming that $\Sigma_n \to \Sigma$ a.s. and $\sqrt{n}(\bar{Y}_n - \mu_n)$ converges $\mathcal{C}$-weakly to a conditionally normal random vector with variance $\Sigma$, we may use Corollaries 3.1 and 3.2 to establish the consistency of the bootstrap distributions. For example, one may employ Theorem 3.2 in KMS (2020) together with the Cramér–Wold device and Lemma C.4.

Assumption 4.3. $\Sigma_n$ converges a.s. to a $\mathcal{C}$-measurable, positive definite matrix $\Sigma$, and $\sqrt{n}(\bar{Y}_n - \mu_n) \to \Sigma^{1/2}\eta$ $\mathcal{C}$-weakly, where $\eta \sim N(0, I_v)$ is independent of $\mathcal{C}$.

In addition, we introduce local versions of some measures of the network denseness. Specifically, for $s, m \geq 0$ let

$\delta_{loc,n}^{\partial}(s, m) := \max_{i \in N_n} \frac{\sum_{j \in N_n(i;m)} |N_n^{\partial}(j; s) \cap N_n(i; m)|}{|N_n(i; m)|}$ and $h_{loc,n}(s, m) := \max_{i \in N_n} \frac{|H_n(s, \infty) \cap N_n(i; m)^4|}{|N_n(i; m)|^3}$.

These measures are constructed in a way such that for any $m \geq 1$, $\delta_{loc,n}^{\partial}(0, m) = h_{loc,n}(0, m) = 1$. Also note that $h_{loc,n}(s, m) \leq \delta_{loc,n}^{\partial}(s, m)$.
Assumption 4.4. There exists $p > 4$ such that $\tilde{\mu}_p < \infty$ and

$\left( \frac{\delta_n(s_n)}{n} \right)^{1/2} \sum_{s \geq 0} \delta_{loc,n}^{\partial}(s, s_n)\, \gamma_{n,s}^{1 - 2/p} + \left( \frac{\delta_n^3(s_n)}{n} \right)^{1/2} \sum_{s \geq 0} h_{loc,n}(s, s_n)\, \gamma_{n,s}^{1 - 2/p} \to 0$.

(Assumption 4.3 is made merely for ease of exposition. In view of Theorem 3.1 it can be omitted at the expense of establishing additional Berry–Esseen type bounds.) When the following summability condition holds:

$\limsup_{n \to \infty} \sum_{s \geq 0} \delta_{loc,n}^{\partial}(s, s_n)\, \gamma_{n,s}^{1 - 2/p} < \infty$ a.s.,

Assumption 4.4 reduces to $\delta_n^3(s_n)/n \to 0$. In particular, if for all $n \geq 1$ the blocks $\{B_{n,k}\}$ have the same size $l_n < n$, it suffices to assume that the weak dependence coefficients raised to the power $1 - 2/p$ are a.s. summable and $l_n = o(n^{1/3})$. Note that this assumption also explicitly requires $K_n \to \infty$ as $n \to \infty$.
Proposition 4.2. Suppose that Assumptions 4.1-4.4 hold. Then $F_{1,n}^*$ is conditionally $d_K$-consistent given $\mathcal{C}$. If, in addition, $\mu_n$ converges a.s. to a $\mathcal{C}$-measurable random vector $\mu$ and $\nabla\phi(\mu) \neq 0$ a.s., then $F_{2,n}^*$ is conditionally $d_K$-consistent given $\mathcal{C}$.

The dependent wild bootstrap (DWB) for time series was introduced in Shao (2010). This method approximates the finite-sample distribution of $T_n$ by mimicking the autocovariance structure of the underlying sample. In particular, adapting to our framework, assume that $\mathcal{C} = \{\emptyset, \Omega\}$ and let $G_n$ be an unweighted "line" network. Consider an $n$-dimensional, zero-mean random vector $W_n$ defined on $(\Omega, \mathcal{H}, P)$ and independent of $Y_n$ such that $\mathrm{Var}(W_{n,i}) = 1$ and $\mathrm{Cov}(W_{n,i}, W_{n,j}) = \kappa(d_n(i,j)/(s_n+1))$, where $\kappa(\cdot)$ is a positive-definite kernel function and $s_n$ is a bandwidth parameter. The DWB pseudo-sample $Y_n^*$ is defined as follows:

$Y_{n,i}^* = \bar{Y}_n + (Y_{n,i} - \bar{Y}_n) W_{n,i}, \quad i \in N_n$.

Let $\bar{Y}_n^* := n^{-1} \sum_{i \in N_n} Y_{n,i}^*$. By construction, $E[\bar{Y}_n^* \mid \mathcal{G}_n] = \bar{Y}_n$, so that in contrast to the block bootstrap, the statistic $\sqrt{n}(\bar{Y}_n^* - \bar{Y}_n)$ is unbiased given $\mathcal{G}_n$. In addition, noticing that $\kappa(0) = 1$, the conditional variance of the scaled bootstrap mean given $\mathcal{G}_n$ is

$\Sigma_n^* = \frac{1}{n} \sum_{i,j \in N_n} \mathrm{Cov}(W_{n,i}, W_{n,j}) (Y_{n,i} - \bar{Y}_n)(Y_{n,j} - \bar{Y}_n)^{\top} = \frac{1}{n} \sum_{i,j \in N_n} \kappa\!\left( \frac{d_n(i,j)}{s_n+1} \right) (Y_{n,i} - \bar{Y}_n)(Y_{n,j} - \bar{Y}_n)^{\top}$,

which is a version of the network HAC estimator (2.5). Then under certain regularity conditions the DWB is first-order consistent for smooth functions of the mean.

For general graphs, however, positive definiteness of the kernel function $\kappa$ does not imply that the matrix $[\kappa(d_n(i,j)/(s_n+1))]_{i,j \in N_n}$ is positive semi-definite (see KMS, 2020, Section 4.1). Therefore, in general, we cannot guarantee the existence of a random vector with the required covariance structure. A simple way to overcome this issue is to rely on the topology of a given network. Consider the matrix $\Omega_n = [\omega_n(i,j)]_{i,j \in N_n}$, where $\omega_n$ is defined in (4.2).

Claim 4.1. $\Omega_n$ is positive semi-definite.

Proof. Let $c \in \mathbb{R}^n$ and $\xi_i := \sum_{j \in N_n(i; s_n+1)} c_j$. Then, since $j, k \in N_n(i; s_n+1)$ if and only if $i \in N_n(j; s_n+1) \cap N_n(k; s_n+1)$,

$\sum_{i \in N_n} \xi_i^2 = \sum_{i \in N_n} \sum_{j,k \in N_n(i; s_n+1)} c_j c_k = \sum_{i,j \in N_n} c_i c_j\, \omega_n(i,j)\, \delta_n(s_n)$.

Therefore,

$c^{\top} \Omega_n c = \sum_{i,j \in N_n} c_i c_j\, \omega_n(i,j) = \sum_{i \in N_n} \xi_i^2 / \delta_n(s_n) \geq 0$. $\square$

Consequently, we consider a random vector $W_n$ satisfying the following assumption.

Assumption 4.5. $W_n$ is conditionally independent of $Y_n$ given $\mathcal{C}$ with $E[W_n \mid \mathcal{C}] = 0$ a.s. and $E[W_n W_n^{\top} \mid \mathcal{C}] = \Omega_n$ a.s.

Under Assumption 4.5 the bootstrap variance estimator given by

(4.4) $\Sigma_n^* = \frac{1}{n} \sum_{i,j \in N_n} \omega_n(i,j) (Y_{n,i} - \bar{Y}_n)(Y_{n,j} - \bar{Y}_n)^{\top}$

is positive semi-definite a.s. We impose the next conditions on the sequence of networks, which in combination with Assumption 4.2 ensure the consistency of $\Sigma_n^*$.
Assumption 4.6. The sequence $\{(G_n, s_n)\}$ is such that w.p.1 $\tilde{\omega} < \infty$ and
(a) $\Delta_n(s_n; 1)/\delta_n(s_n) + D_n(s_n)/n \to 0$;
(b) $n^{-1} \sum_{i \in N_n} \sum_{j \in N_n^{\partial}(i; s)} |\omega_n(i,j) - 1|\, \gamma_{n,s} \to 0$ for all $s \geq 0$.

These conditions are weaker than those of Assumption 4.1, since $\Delta_n(s_n; 1) \leq \sqrt{\Delta_n(s_n; 2)}$ and $\delta_n(s_n) \leq n$. Therefore, the DWB estimator (4.4) is likely to be consistent for a wider class of networks. As in the case of the block bootstrap, we assume that the true variance $\Sigma_n$ converges a.s. to a $\mathcal{C}$-measurable matrix $\Sigma$.
Proposition 4.3. Suppose that Assumptions 4.6 and 4.2 hold. Then $E[\|\Sigma_n^* - \Sigma_n\|_F \mid \mathcal{C}] \to 0$ a.s.

First, we consider the Gaussian case. That is, we take $W_n = \Omega_n^{1/2} \zeta_n$, where $\zeta_n$ is the standard normal random vector in $\mathbb{R}^n$ independent of $\mathcal{G}_n$. From the practical perspective it is a convenient choice, especially when $n$ is large, because a sample from a multivariate normal distribution can be easily generated. Moreover, efficient algorithms for finding the square root of positive semi-definite matrices are available; we refer to Higham (2008) for details. As noted in Shao (2010), although the DWB sample with Gaussian weights may not match non-zero higher-order cumulants of the original process, it is difficult to choose the joint distribution of $W_n$ that fits those cumulants, and the performance of the DWB primarily depends on the choice of the truncation parameter $s_n$.

In this case, conditionally on $\mathcal{G}_n$, the statistic

$\sqrt{n}(\bar{Y}_n^* - \bar{Y}_n) = \frac{1}{\sqrt{n}} \sum_{i \in N_n} W_{n,i} (Y_{n,i} - \bar{Y}_n)$

is also normal with zero mean and variance given in (4.4). Therefore, the conditional distribution of the DWB counterpart of the test statistic $T_{1,n}$,

$T_{1,n}^* = \sqrt{n}\, \|\bar{Y}_n^* - \bar{Y}_n\|$,

given $\mathcal{G}_n$ is known and is the same as the conditional distribution of the asymptotic Gaussian approximation $\|\Sigma_n^{*1/2} \eta\|$, where $\eta$ is a $v$-dimensional standard normal random vector independent of $\mathcal{G}_n$, and the latter converges $\mathcal{C}$-weakly to $\|\Sigma^{1/2}\eta\|$ by Lemma C.3. A more interesting case, however, arises when considering the second test statistic $T_{2,n}$, because for nonlinear transformations the conditional distribution of its bootstrap analog,

$T_{2,n}^* = \sqrt{n}\, \big( \phi(\bar{Y}_n^*) - \phi(\bar{Y}_n) \big)$,

is typically unavailable. Then in the Gaussian case the DWB is consistent without any further restriction on the topology of the sequence of networks $\{G_n\}$. We only need to assume that $\sqrt{n}(\bar{Y}_n - \mu_n)$ converges $\mathcal{C}$-weakly to a conditionally normal random vector and the asymptotic variance of $T_{2,n}$ is a.s. positive.
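A minimal sketch of the Gaussian network DWB for $T_{2,n}^*$ (Python; `scipy` is assumed available, and the Monte Carlo layout and names are our illustrative choices):

```python
import numpy as np
from scipy.linalg import sqrtm

def gaussian_dwb(Y, D, s_n, phi, n_boot=999, rng=None):
    """Gaussian dependent wild bootstrap draws of T*_{2,n}.

    Weights W_n = Omega_n^{1/2} zeta_n with zeta_n ~ N(0, I_n), where
    Omega_n = [omega_n(i, j)] is positive semi-definite by Claim 4.1.
    `phi` is a smooth map from R^v to R.
    """
    rng = np.random.default_rng(rng)
    n = Y.shape[0]
    members = (D < s_n + 1).astype(float)
    omega = (members @ members.T) / members.sum(axis=1).mean()
    root = np.real(sqrtm(omega))     # Omega_n^{1/2}; real part guards
                                     # against tiny imaginary round-off
    Ybar = Y.mean(axis=0)
    Yc = Y - Ybar
    T2_star = np.empty(n_boot)
    for b in range(n_boot):
        W = root @ rng.standard_normal(n)            # one draw of W_n
        Ybar_star = Ybar + (Yc * W[:, None]).mean(axis=0)
        T2_star[b] = np.sqrt(n) * (phi(Ybar_star) - phi(Ybar))
    return T2_star
```

The empirical $(1-\alpha)$-quantile of the returned draws then approximates the bootstrap critical value for $T_{2,n}$.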
Proposition 4.4. Suppose that $W_n$ is Gaussian and Assumptions 4.5, 4.6, 4.2, and 4.3 hold. Then $F_{1,n}^*$ is conditionally $d_K$-consistent given $\mathcal{C}$. If, in addition, $\mu_n$ converges a.s. to a $\mathcal{C}$-measurable random vector $\mu$ and $\nabla\phi(\mu) \neq 0$ a.s., then $F_{2,n}^*$ is conditionally $d_K$-consistent given $\mathcal{C}$.

Given another choice of $W_n$, the process $\{\xi_{n,i} := W_{n,i}(Y_{n,i} - \bar{Y}_n) : i \in N_n\}$ is $s_n$-dependent conditionally on $\mathcal{G}_n$, i.e., $\xi_{n,i}$ and $\xi_{n,j}$ are conditionally independent given $\mathcal{G}_n$ whenever $j \notin B_{n,i} := N_n(i; s_n+1)$. Consequently, in addition to the assumptions of Proposition 4.4, we need to control the behavior of the third conditional moments of $W_n$ and the neighborhoods $\{B_{n,i}\}$ such that the bootstrap distributions $F_{1,n}^*$ and $F_{2,n}^*$ in this case approach the ones under the Gaussian weights as $n \to \infty$.
Proposition 4.5. Suppose that Assumptions 4.5, 4.6, 4.2, and 4.3 hold, and

(4.5) $\frac{1}{n^{3/2}} \sum_{i \in N_n} \sum_{j \in B_{n,i}} \sum_{k \in B_{n,i} \cup B_{n,j}} \prod_{l \in \{i,j,k\}} \|W_{n,l}\|_{\mathcal{C},3} \to 0$ a.s.

Then $F_{1,n}^*$ is conditionally $d_K$-consistent given $\mathcal{C}$. If, in addition, $\mu_n$ converges a.s. to a $\mathcal{C}$-measurable random vector $\mu$ and $\nabla\phi(\mu) \neq 0$ a.s., then $F_{2,n}^*$ is conditionally $d_K$-consistent given $\mathcal{C}$.
Remark. The convergence condition in (4.5) is quite strong. In particular, in a simple case when the neighborhoods $\{B_{n,i}\}$ have the same size $l_n < n$ for all $n \geq 1$ and $\sup_{n, i \in N_n} \|W_{n,i}\|_{\mathcal{C},3} < \infty$ a.s., it requires $l_n = o(n^{1/4})$. Therefore, it is of high interest to find a better way to handle network dependent processes under $m$-dependence.
5. Conclusion
Nonparametric bootstrapping for time series and spatial processes has been extensively studied in the past decades. Thus, various resampling methods are now available for statistical analysis of dependent data in these cases. However, the lack of regular structure in networks renders the use of these techniques for bootstrap-based inference in the case of network dependent processes impracticable. In this paper we proposed a block-based method and a variant of the dependent wild bootstrap suitable for the latter processes satisfying the conditional version of Doukhan and Louhichi (1999)'s $\psi$-weak dependence condition. We established the first-order validity of these methods for constructing confidence sets for the mean of a network dependent process. In addition, we showed their consistency under the smooth function model conditionally on a common shock of a general form. Finally, the corresponding bootstrap variance estimators can be used for asymptotic inference instead of the network HAC estimator, which is not necessarily positive semi-definite.

As for future directions, having a continuity theorem and other related results similar to the ones established in Belyaev and Sjöstedt-de Luna (2000), but under convergence in conditional probability, would significantly weaken the bootstrap consistency conditions derived in this paper. In addition, an extension of these methods to bootstrapping $M$-estimators and empirical processes is of great importance for applied research.

References
Athreya, K. B., Lahiri, S. N., 2006. Measure Theory and Probability Theory. Springer Texts in Statistics. Springer-Verlag, Berlin, Heidelberg.
Belyaev, Y., Sjöstedt-de Luna, S., 2000. Weakly approaching sequences of random distributions. Journal of Applied Probability 37 (3), 807–822.
Bentkus, V., 2003. On the dependence of the Berry-Esseen bound on dimension. Journal of Statistical Planning and Inference 113, 385–402.
Berti, P., Pratelli, L., Rigo, P., 2006. Almost sure weak convergence of random probability measures. Stochastics 78 (2), 91–97.
Bühlmann, P. L., 1993. The blockwise bootstrap in time series and empirical processes. Ph.D. thesis, Swiss Federal Institute of Technology Zürich.
Calhoun, G., 2018. Block bootstrap consistency under weak assumptions. Econometric Theory.
Carlstein, E., 1986. The use of subseries values for estimating the variance of a general statistic from a stationary sequence. The Annals of Statistics 14 (3), 1171–1179.
Chernozhukov, V., Chetverikov, D., Kato, K., 2013. Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors. Annals of Statistics 41 (6), 2786–2819.
Cohen, S., Elliott, R. J., 2015. Stochastic Calculus and Applications, 2nd Edition. Probability and Its Applications. Birkhäuser Basel.
Comets, F., Janžura, M., 1998. A central limit theorem for conditionally centered random fields with an application to Markov fields. Journal of Applied Probability 35, 608–621.
Conley, T. G., 1999. GMM estimation with cross-sectional dependence. Journal of Econometrics 92, 1–45.
Crimaldi, I., 2009. An almost sure conditional convergence result and an application to a generalized Pólya urn. International Mathematical Forum 4 (23), 1139–1156.
Dedecker, J., Doukhan, P., Lang, G., 2007. Weak Dependence: With Examples and Applications. Lecture Notes in Statistics. Springer, New York.
Doukhan, P., Louhichi, S., 1999. A new weak dependence condition and applications to moment inequalities. Stochastic Processes and their Applications 84 (2), 313–342.
Durrett, R., 2010. Probability: Theory and Examples, 4th Edition. Cambridge University Press.
Embrechts, P., Hofert, M., 2013. A note on generalized inverses. Mathematical Methods of Operations Research 77 (3), 423–432.
Garcia-Soidan, P., Menezes, R., Rubinos, O., 2014. Bootstrap approaches for spatial data. Stochastic Environmental Research and Risk Assessment 28 (5), 1207–1219.
Gonçalves, S., Politis, D., 2011. Discussion: Bootstrap methods for dependent data: A review. Journal of the Korean Statistical Society 40 (4), 383–386.
Gonçalves, S., White, H., 2002. The bootstrap of the mean for dependent heterogeneous arrays. Econometric Theory 18 (6), 1367–1384.
Hall, P., 1992. The Bootstrap and Edgeworth Expansion. Springer Series in Statistics. Springer-Verlag, New York, Berlin, Paris.
Higham, N., 1988. Computing a nearest symmetric positive semidefinite matrix. Linear Algebra and its Applications 103, 103–118.
Higham, N., 2002. Computing the nearest correlation matrix - a problem from finance. IMA Journal of Numerical Analysis 22 (3), 329–343.
Higham, N. J., 2008. Functions of Matrices: Theory and Computation. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA.
Jenish, N., Prucha, I. R., 2009. Central limit theorems and uniform laws of large numbers for arrays of random fields. Journal of Econometrics 150 (1), 86–98.
Kallenberg, O., 2002. Foundations of Modern Probability, 2nd Edition. Probability and its Applications. Springer-Verlag, New York.
Kelejian, H. H., Prucha, I. R., 2007. HAC estimation in a spatial framework. Journal of Econometrics 140 (1), 131–154.
Kelejian, H. H., Prucha, I. R., 2010. Specification and estimation of spatial autoregressive models with autoregressive and heteroskedastic disturbances. Journal of Econometrics 157 (1), 53–67.
Kojevnikov, D., Marmer, V., Song, K., 2020. Limit theorems for network dependent random variables. Journal of Econometrics. URL https://doi.org/10.1016/j.jeconom.2020.05.019
Künsch, H. R., 1989. The jackknife and the bootstrap for general stationary observations. The Annals of Statistics 17, 1217–1241.
Lahiri, S. N., 1992. Edgeworth correction by 'moving block' bootstrap for stationary and nonstationary data. In: LePage, R., Billard, L. (Eds.), Exploring the Limits of Bootstrap. John Wiley & Sons, New York; Chichester, pp. 183–214.
Lahiri, S. N., 2003. Resampling Methods for Dependent Data. Springer Series in Statistics. Springer, New York.
Liu, R. Y., Singh, K., 1992. Moving blocks jackknife and bootstrap capture weak dependence. In: LePage, R., Billard, L. (Eds.), Exploring the Limits of Bootstrap. John Wiley & Sons, New York; Chichester, pp. 225–248.
Matyas, L., 1999. Generalized Method of Moments Estimation. Cambridge University Press, Cambridge.
Paparoditis, E., Politis, D. N., 2001. Tapered block bootstrap. Biometrika 88 (4), 1105–1119.
Politis, D. N., Romano, J. P., 1992. A circular block-resampling procedure for stationary data. In: LePage, R., Billard, L. (Eds.), Exploring the Limits of Bootstrap. John Wiley & Sons, New York; Chichester, pp. 263–270.
Politis, D. N., Romano, J. P., 1994. The stationary bootstrap. Journal of the American Statistical Association 89 (428), 1303–1313.
Politis, D. N., Romano, J. P., Wolf, M., 1999. Subsampling. Springer.
Rhee, W., Talagrand, M., 1986. Uniform bound in the central limit theorem for Banach space valued dependent random variables. Journal of Multivariate Analysis 20 (2), 303–320.
Röllin, A., 2013. Stein's method in high dimensions with applications. Annales de l'Institut Henri Poincaré, Probabilités et Statistiques 49 (2), 529–549.
Sengupta, S., Shao, X., Wang, Y., 2015. The dependent random weighting. Journal of Time Series Analysis 36 (3), 315–326.
Shao, J., Tu, D., 1995. The Jackknife and Bootstrap. Springer-Verlag, Berlin; New York.
Shao, X., 2010. The dependent wild bootstrap. Journal of the American Statistical Association 105 (489), 218–235.
Shiryaev, A. N., 2016. Probability-1, 3rd Edition. Graduate Texts in Mathematics. Springer-Verlag, New York.
Talagrand, M., 2011. Mean Field Models for Spin Glasses, Volume I: Basic Examples. Vol. 54 of A Series of Modern Surveys in Mathematics. Springer-Verlag.
Zhang, F., 2011. Matrix Theory: Basic Results and Techniques, 2nd Edition. Universitext. Springer-Verlag, New York.
In the following let $\varphi_K$ with $K \in \mathbb{R}_+$ denote the element-wise censoring function, i.e., for an indexed family of real numbers $x \equiv (x_i)_{i \in I}$,

$[\varphi_K(x)]_i := (-K) \vee (K \wedge x_i), \quad i \in I.$

Proof of Proposition 2.1. Fix $\kappa \ge 1$ and let $\xi := (f \circ h)(Z_{n,A})$ and $\zeta := (g \circ h)(Z_{n,B})$, where $f, g \in \mathcal{L}_w$ and $(A, B) \in \mathcal{P}_n(a, b; s)$. Define the censored versions

$\xi_\kappa := (f \circ h \circ \varphi_\kappa)(Z_{n,A}) \quad\text{and}\quad \zeta_\kappa := (g \circ h \circ \varphi_\kappa)(Z_{n,B}).$

Then

$|\mathrm{Cov}(\xi, \zeta \mid \mathcal{C})| \le |\mathrm{Cov}(\xi, \zeta - \zeta_\kappa \mid \mathcal{C})| + |\mathrm{Cov}(\xi - \xi_\kappa, \zeta_\kappa \mid \mathcal{C})| + |\mathrm{Cov}(\xi_\kappa, \zeta_\kappa \mid \mathcal{C})| \le 2\|f\|_\infty E[|\zeta - \zeta_\kappa| \mid \mathcal{C}] + 2\|g\|_\infty E[|\xi - \xi_\kappa| \mid \mathcal{C}] + |\mathrm{Cov}(\xi_\kappa, \zeta_\kappa \mid \mathcal{C})|$ a.s.

First, $\mathrm{Lip}(f \circ h \circ \varphi_\kappa) \le \eta\kappa^{\tau-1}\mathrm{Lip}(f)$ and $\mathrm{Lip}(g \circ h \circ \varphi_\kappa) \le \eta\kappa^{\tau-1}\mathrm{Lip}(g)$. Therefore,

$|\mathrm{Cov}(\xi_\kappa, \zeta_\kappa \mid \mathcal{C})| \le \big(c_0\|f\|_\infty\|g\|_\infty + 2\eta\kappa^{\tau-1}\{c_1\,\mathrm{Lip}(f)\|g\|_\infty + c_2\|f\|_\infty\,\mathrm{Lip}(g)\} + (2\eta\kappa^{\tau-1})^2 c_3\,\mathrm{Lip}(f)\,\mathrm{Lip}(g)\big)\gamma_{n,s}$ a.s. (A.1)

Second,

$E[|\xi - \xi_\kappa| \mid \mathcal{C}] \le \mathrm{Lip}(f)\sum_{i \in A}E[\|h(Z_{n,i}) - (h \circ \varphi_\kappa)(Z_{n,i})\| \mid \mathcal{C}] \le C_v\,\mathrm{Lip}(f)\sum_{i \in A}E[\|Z_{n,i}\|_\infty^\tau\mathbf{1}\{\|Z_{n,i}\|_\infty > \kappa\} \mid \mathcal{C}] \le C_v\,\mathrm{Lip}(f)\,aL\kappa^{\tau-p}$ a.s., (A.2)

where $C_v$ depends only on $v$ and $\eta$. Similarly,

$E[|\zeta - \zeta_\kappa| \mid \mathcal{C}] \le C_v\,\mathrm{Lip}(g)\,bL\kappa^{\tau-p}$ a.s. (A.3)

Since inequalities (A.1)-(A.3) hold for all $\kappa \ge \kappa_0$ on $\{\kappa_0 \in [1,\infty)\}$, the result follows by setting $\kappa_0 = (\gamma_{n,s} \wedge 1)^{1/(1-p)}$ if $c_3 = 0$ and $\kappa_0 = (\gamma_{n,s} \wedge 1)^{1/(2-p-\tau)}$ otherwise, and noticing that $\mathrm{Cov}(\xi, \zeta \mid \mathcal{C}) = 0$ a.s. on $\{\gamma_{n,s} = 0\}$. □

Proof of Theorem 2.1. First, it suffices to show that $\|c^\top(\bar Y_n - E[\bar Y_n \mid \mathcal{C}])\|_{\mathcal{C},1} \to 0$ a.s. for any $c \in \mathbb{R}^v$ with $\|c\| = 1$. Then the proof is similar to the one given in the unconditional case. Specifically, for $k > 0$, let $\xi^{(k)}_{n,i} := \varphi_k(c^\top Y_{n,i})$ and $\zeta^{(k)}_{n,i} := c^\top Y_{n,i} - \xi^{(k)}_{n,i}$, so that

$\big\|c^\top(\bar Y_n - E[\bar Y_n \mid \mathcal{C}])\big\|_{\mathcal{C},1} \le \frac{1}{n}\sum_{i \in N_n}\big\|\zeta^{(k)}_{n,i}\big\|_{\mathcal{C},1} + \Big\|n^{-1}\sum_{i \in N_n}\big(\xi^{(k)}_{n,i} - E[\xi^{(k)}_{n,i} \mid \mathcal{C}]\big)\Big\|_{\mathcal{C},1}$ a.s.

The result then follows from the definition of the essential infimum and the following inequalities:

$\big\|\zeta^{(k)}_{n,i}\big\|_{\mathcal{C},1} \le 2E[\|Y_{n,i}\|\mathbf{1}\{\|Y_{n,i}\| > k\} \mid \mathcal{C}]$ a.s.

and, since $\psi_{1,1}(\varphi_k, \varphi_k) \le Ck^2$,

$\Big\|\sum_{i \in N_n}\big(\xi^{(k)}_{n,i} - E[\xi^{(k)}_{n,i} \mid \mathcal{C}]\big)\Big\|_{\mathcal{C},2} \le \sqrt{n}\,k\Big(C\sum_{s \ge 0}\delta^\partial_n(s;1)\theta_{n,s}\Big)^{1/2}$ a.s. □

Proof of Theorem 3.1. The first assertion follows trivially from the triangle inequality. Consider the second assertion. First, note that for a sub-$\sigma$-algebra $\mathcal{F} \subset \mathcal{H}$, random variables $X, Y$, and an $\mathcal{F}$-measurable random variable $Z$, $P(X \le Z \mid \mathcal{F}) = F^{\mathcal{F}}_X(\cdot, Z)$ a.s. and $P(Y \le Z \mid \mathcal{F}) = F^{\mathcal{F}}_Y(\cdot, Z)$ a.s. (see, e.g., Kallenberg, 2002, Theorem 5.4). Therefore,

$|P(X \le Z \mid \mathcal{F}) - P(Y \le Z \mid \mathcal{F})| \le d_K(X, Y \mid \mathcal{F})$ a.s.

In addition, if $\mathcal{F} = \sigma(\mathcal{A} \cup \mathcal{B})$, where $\mathcal{A}$ and $\mathcal{B}$ are sub-$\sigma$-algebras of $\mathcal{H}$, and $Y$ is conditionally independent of $\mathcal{A}$ given $\mathcal{B}$, then $d_K(X, Y \mid \mathcal{F}) = d_K(X, Y \mid \mathcal{F}, \mathcal{B})$ a.s.

Let $c_n(\alpha)$ denote the conditional $\alpha$-quantile of $S_n$ given $\mathcal{C}$. Fix $\eta > 0$ such that $\alpha - \eta, \alpha + 2\eta \in (0, 1)$ and let $\Delta_n \equiv d_K(T^*_n, S_n \mid \mathcal{G}_n, \mathcal{C})$. Then, using the properties of generalized inverses (see, e.g., Embrechts and Hofert, 2013, Proposition 1) and the conditional independence of $S_n$ and $\mathcal{Y}_n$ given $\mathcal{C}$, we get

$P(S_n \le c^*_n(\alpha + \eta) \mid \mathcal{G}_n) \ge P(T^*_n \le c^*_n(\alpha + \eta) \mid \mathcal{G}_n) - \eta \ge \alpha = P(S_n \le c_n(\alpha) \mid \mathcal{G}_n)$ a.s. on $\{\Delta_n \le \eta\}$

and

$P(T^*_n \le c_n(\alpha + 2\eta) \mid \mathcal{G}_n) \ge P(S_n \le c_n(\alpha + 2\eta) \mid \mathcal{G}_n) - \eta = \alpha + \eta \ge P(T^*_n \le c^*_n(\alpha) \mid \mathcal{G}_n)$ a.s. on $\{\Delta_n \le \eta\}$.

Therefore,

$P(c^*_n(\alpha) \ge c_n(\alpha - \eta) \mid \mathcal{C}) \ge P(c^*_n(\alpha) \ge c_n(\alpha - \eta), \Delta_n \le \eta \mid \mathcal{C}) = P(\Delta_n \le \eta \mid \mathcal{C})$ a.s.

and

$P(c^*_n(\alpha) \le c_n(\alpha + 2\eta) \mid \mathcal{C}) \ge P(c^*_n(\alpha) \le c_n(\alpha + 2\eta), \Delta_n \le \eta \mid \mathcal{C}) = P(\Delta_n \le \eta \mid \mathcal{C})$ a.s.

Using the last two inequalities we find that

$P(c_n(\alpha) \wedge c^*_n(\alpha) < T_n \le c_n(\alpha) \vee c^*_n(\alpha) \mid \mathcal{C}) \le P(c_n(\alpha - \eta) < T_n \le c_n(\alpha + 2\eta) \mid \mathcal{C}) + P(c_n(\alpha - \eta) > c^*_n(\alpha) \mid \mathcal{C}) + P(c_n(\alpha + 2\eta) < c^*_n(\alpha) \mid \mathcal{C}) \le P(c_n(\alpha - \eta) < S_n \le c_n(\alpha + 2\eta) \mid \mathcal{C}) + 2P(\Delta_n > \eta \mid \mathcal{C}) + 2d_K(T_n, S_n \mid \mathcal{C}) = 3\eta + 2P(\Delta_n > \eta \mid \mathcal{C}) + 2d_K(T_n, S_n \mid \mathcal{C})$ a.s.

and

$A_{n,\alpha} := |P(T_n \le c^*_n(\alpha) \mid \mathcal{C}) - \alpha| \le 3\eta + 2P(\Delta_n > \eta \mid \mathcal{C}) + 3d_K(T_n, S_n \mid \mathcal{C})$ a.s. (A.4)

Finally, there exists a sequence $\{\alpha_k\}$ such that $\operatorname{ess\,sup}_{\alpha \in (0,1)} A_{n,\alpha} = \sup_k A_{n,\alpha_k}$ a.s., and the latter is a.s. bounded by the RHS of (A.4). Therefore,

$\limsup_{n \to \infty}\big(\operatorname{ess\,sup}_{\alpha \in (0,1)} A_{n,\alpha}\big) \le 3\eta$ a.s.,

and the result follows by considering a sequence $\eta_m \searrow 0$. □

Proof of Lemma 3.1. By the mean value theorem, we may write

$T_n = \nabla\phi(\tilde\theta_n)^\top\tau_n(\hat\theta_n - \theta_n) \quad\text{and}\quad T^*_n = \nabla\phi(\tilde\theta^*_n)^\top\tau^*_n(\hat\theta^*_n - \theta^*_n),$ (A.5)

where $\tilde\theta_n$ and $\tilde\theta^*_n$ are such that $\|\tilde\theta_n - \theta_n\| \le \|\hat\theta_n - \theta_n\|$ and $\|\tilde\theta^*_n - \theta^*_n\| \le \|\hat\theta^*_n - \theta^*_n\|$. Then for any $r \in \mathbb{R}$ and $\epsilon > 0$,

$|P(T^*_n \le r \mid \mathcal{G}_n) - P(T'^*_n \le r \mid \mathcal{G}_n)| \le P(T'^*_n \le r + R^*_n \mid \mathcal{G}_n) - P(T'^*_n \le r - R^*_n \mid \mathcal{G}_n) \le 2d_K(T'^*_n, S'_n \mid \mathcal{G}_n, \mathcal{C}) + Q(S'_n, 2\epsilon \mid \mathcal{C}) + P(R^*_n > \epsilon \mid \mathcal{G}_n)$ a.s.,

where $R^*_n \equiv \big|(\nabla\phi(\tilde\theta^*_n) - \nabla\phi(\theta^*_n))^\top\tau^*_n(\hat\theta^*_n - \theta^*_n)\big|$. Similarly, for any $r \in \mathbb{R}$ and $\epsilon > 0$,

$|P(T_n \le r \mid \mathcal{C}) - P(T'_n \le r \mid \mathcal{C})| \le 2d_K(T'_n, S'_n \mid \mathcal{C}) + Q(S'_n, 2\epsilon \mid \mathcal{C}) + P(R_n > \epsilon \mid \mathcal{C})$ a.s.,

where $R_n \equiv \big|(\nabla\phi(\tilde\theta_n) - \nabla\phi(\theta_n))^\top\tau_n(\hat\theta_n - \theta_n)\big|$.

By Lemma C.6 the sequence $\{\tilde\theta^*_n\}$ is $\mathcal{C}$-asymptotically tight. Therefore, using Lemma C.5 together with the $\mathcal{C}$-asymptotic tightness of $\tau^*_n(\hat\theta^*_n - \theta^*_n)$ and $\tau_n(\hat\theta_n - \theta_n)$, it follows that $P(R^*_n > \epsilon \mid \mathcal{C}) \to 0$ and $P(R_n > \epsilon \mid \mathcal{C}) \to 0$ a.s. Hence, for any $\nu > 0$,

$\limsup_{n \to \infty} P(d_K(T^*_n, T'^*_n \mid \mathcal{G}_n) > \nu \mid \mathcal{C}) \le \nu^{-1}\operatorname{ess\,inf}_{\epsilon > 0}\limsup_{n \to \infty} Q(S'_n, 2\epsilon \mid \mathcal{C}) = 0$ a.s.

and

$\limsup_{n \to \infty} d_K(T_n, T'_n \mid \mathcal{C}) \le \operatorname{ess\,inf}_{\epsilon > 0}\limsup_{n \to \infty} Q(S'_n, 2\epsilon \mid \mathcal{C}) = 0$ a.s.

The result then follows from the triangle inequality. □

Proof of Theorem 3.2. Follows immediately from Lemma 3.1 and Theorem 3.1. □
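In applied work, Theorems 3.1 and 3.2 license the standard Monte Carlo implementation of the bootstrap critical value $c^*_n(\alpha)$: simulate bootstrap statistics and take an empirical quantile. The following is a minimal Python sketch; `draw_bootstrap_stat` is a hypothetical placeholder for either resampling scheme and is not part of the paper.

```python
import numpy as np

def bootstrap_critical_value(draw_bootstrap_stat, alpha=0.95, n_draws=999, rng=None):
    """Empirical alpha-quantile of bootstrap draws of T*_n.

    `draw_bootstrap_stat` is a user-supplied callable returning one
    realization of the bootstrap statistic T*_n (e.g., from the
    block-based or the dependent wild bootstrap). Theorem 3.1 then
    justifies using the returned quantile as a critical value for T_n.
    """
    rng = np.random.default_rng() if rng is None else rng
    draws = np.array([draw_bootstrap_stat(rng) for _ in range(n_draws)])
    # c*_n(alpha): generalized inverse of the empirical bootstrap cdf
    return np.quantile(draws, alpha)
```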
Proof of Corollary 3.2. Consider Equation (A.5) in the proof of Lemma 3.1. By Lemma C.3, $T_n$ converges $\mathcal{C}$-weakly to $S'$ (because $\tilde\theta_n \xrightarrow{\mathcal{C}\text{-}p} \theta$ a.s. and $x \mapsto \nabla\phi(x)$ is continuous). Hence, $d_K(T_n, S' \mid \mathcal{C}) \to 0$ a.s., and $d_K(T^*_n, S' \mid \mathcal{G}_n, \mathcal{C}) \to 0$ a.s. follows from arguments similar to those given in the proof of Lemma 3.1. Finally, the result holds by Theorem 3.1. □

Proof of Proposition 4.1. Let $\zeta_{n,i} := \sum_{j \in B_{n,i}} Y'_{n,j}$, where $Y'_{n,i} := Y_{n,i} - \mu_{n,i}$, and let $\zeta^*_{n,i}$ be its resampled version. Then, using the conditional independence of the elements of $\{\zeta^*_{n,i}\}$ given $\mathcal{G}_n$, we find that

$\tilde\Sigma_n := \mathrm{Var}\Big(\frac{1}{\sqrt n}\sum_{k=1}^{K_n}\zeta^*_{n,k} \,\Big|\, \mathcal{G}_n\Big) = \frac{1}{n}\sum_{k=1}^{K_n}\mathrm{Var}(\zeta^*_{n,k} \mid \mathcal{G}_n) = \frac{1}{\delta_n(s_n)}\Big(\frac{1}{n}\sum_{i \in N_n}\zeta_{n,i}\zeta_{n,i}^\top - \bar\zeta_n\bar\zeta_n^\top\Big) = \Sigma^*_n - \frac{1}{n}\sum_{i \in N_n}(\omega_n(i) - 1)\big(\zeta_{n,i}\mu_n^\top + \mu_n\zeta_{n,i}^\top\big) - \frac{\Delta_n(s_n; 2)}{\delta_n(s_n)}\mu_n\mu_n^\top \equiv \Sigma^*_n - A_{n,1} - A_{n,2},$ (A.6)

where $\bar\zeta_n := n^{-1}\sum_{i \in N_n}\zeta_{n,i}$. On the other hand, using the second line of (A.6),

$\tilde\Sigma_n = \frac{1}{n}\sum_{i,j \in N_n}\omega_n(i,j)Y'_{n,i}Y'^\top_{n,j} - \frac{1}{\delta_n(s_n)}\Big(\frac{1}{n}\sum_{i \in N_n}\omega_n(i)Y'_{n,i}\Big)\Big(\frac{1}{n}\sum_{i \in N_n}\omega_n(i)Y'_{n,i}\Big)^\top \equiv B_{n,1} + B_{n,2}.$ (A.7)

Let $y'_{n,i} := c^\top Y'_{n,i}$ and $\mu'_n := c^\top\mu_n$, where $c \in \mathbb{R}^v$ with $\|c\| = 1$, and note that by Lemma C.1 it suffices to show that $E[|c^\top(\Sigma^*_n - \Sigma_n)c| \mid \mathcal{C}] \to 0$ a.s. Note that $\{y'_{n,i}\}$ is $(\mathcal{L}, \psi, \mathcal{C})$-weakly dependent with the weak dependence coefficients $\{\gamma_n\}$. In the following let

$\Xi_n := \sum_{s \ge 1}\delta^\partial_n(s)\gamma_{n,s}^{1-2/r}.$

Claim A.1. $E[|c^\top(\tilde\Sigma_n - \Sigma_n)c| \mid \mathcal{C}] \to 0$ a.s.

Proof. Consider the first term on the last line of (A.7). Write

$c^\top(B_{n,1} - \Sigma_n)c = \frac{1}{n}\sum_{i \in N_n}\big(y'^2_{n,i} - E[y'^2_{n,i} \mid \mathcal{C}]\big) + \frac{1}{n}\sum_{i \in N_n}(\omega_n(i) - 1)y'^2_{n,i} + \frac{1}{n}\sum_{i \in N_n}\sum_{j \in N_n\setminus\{i\}}\omega_n(i,j)\big(y'_{n,i}y'_{n,j} - E[y'_{n,i}y'_{n,j} \mid \mathcal{C}]\big) + \frac{1}{n}\sum_{i \in N_n}\sum_{j \in N_n\setminus\{i\}}(\omega_n(i,j) - 1)E[y'_{n,i}y'_{n,j} \mid \mathcal{C}] \equiv R_{n,1} + R_{n,2} + R_{n,3} + R_{n,4}.$

Using the covariance inequalities established in KMS (2020),

$|R_{n,4}| \le C\sum_{s \ge 1}\gamma_{n,s}^{1-2/r}\cdot\frac{1}{n}\sum_{i \in N_n}\sum_{j \in N^\partial_n(i;s)}|\omega_n(i,j) - 1|$ a.s., (A.8)

where $C = C_0(\mu_r \vee 1)$ for some constant $C_0 \ge 1$. Since $\tilde\omega < \infty$ a.s., the RHS of (A.8) is bounded by $C(\tilde\omega + 1)\Xi_n < \infty$ a.s. Therefore, by the dominated convergence theorem, $|R_{n,4}| \to 0$ a.s. Next, letting $w_{n,i,j} := y'_{n,i}y'_{n,j} - E[y'_{n,i}y'_{n,j} \mid \mathcal{C}]$, we find that

$E[R_{n,3}^2 \mid \mathcal{C}] \le \frac{\bar\omega_n^2}{n^2}\sum_{\substack{i,j \in N_n \\ 1 \le d_n(i,j) < s_n+1}}\;\sum_{\substack{k,l \in N_n \\ 1 \le d_n(k,l) < s_n+1}}|E[w_{n,i,j}w_{n,k,l} \mid \mathcal{C}]| \le \frac{C\bar\omega_n^2}{n^2}\sum_{s \ge 0}|H_n(s, s_n + 1)|\,\gamma_{n,s}^{1-4/r} \to 0$ a.s.,

where $C = C_0(\mu_r \vee 1)$ for some constant $C_0 \ge 1$. Finally,

$E[|R_{n,2}| \mid \mathcal{C}] \le \frac{\mu_r}{n}\sum_{i \in N_n}|\omega_n(i) - 1| \to 0$ a.s.

and

$E[R_{n,1}^2 \mid \mathcal{C}] \le \frac{1}{n^2}\sum_{i,j \in N_n}\big|\mathrm{Cov}(y'^2_{n,i}, y'^2_{n,j} \mid \mathcal{C})\big| \le \frac{C}{n}(1 + \Xi_n) \to 0$ a.s.,

where $C = C_0(\mu_r \vee 1)$ for some constant $C_0 \ge 1$. Consider the second term on the last line of (A.7). Since $-c^\top B_{n,2}c \ge 0$,

$E[-c^\top B_{n,2}c \mid \mathcal{C}] \le \frac{(D_n(s_n))^2}{\delta_n(s_n)\,n^2}\sum_{i,j \in N_n}\big|E[y'_{n,i}y'_{n,j} \mid \mathcal{C}]\big| \le \frac{(D_n(s_n))^2\,C}{\delta_n(s_n)\,n}(1 + \Xi_n) \to 0$ a.s. □

Consider the last two terms on the last line of equation (A.6). Let $\alpha_{n,i} := \sum_{j \in B_{n,i}}(\omega_n(j) - 1)$ and $\bar\alpha_n := \max_{i \in N_n}|\alpha_{n,i}|$. Then, since

$\sum_{i \in N_n}(\omega_n(i) - 1)\zeta_{n,i} = \sum_{i \in N_n}\alpha_{n,i}Y'_{n,i},$

we have $c^\top A_{n,1}c = 2n^{-1}\sum_{i \in N_n}\alpha_{n,i}y'_{n,i}\mu'_n$ and, therefore,

$E[(c^\top A_{n,1}c)^2 \mid \mathcal{C}] \le \frac{C\bar\alpha_n^2}{n}(1 + \Xi_n) \to 0$ a.s.

Finally, $\|A_{n,2}\|_F \le \mu_r^2 \times \Delta_n(s_n; 2)/\delta_n(s_n) \to 0$ a.s. □

Proof of Proposition 4.3. We use the notation from the proof of Proposition 4.1. For a vector $c \in \mathbb{R}^v$ with $\|c\| = 1$ we have

$c^\top(\Sigma^*_n - B_{n,1})c = (\bar y'_n)^2 \times \frac{\Delta_n(s_n; 2)}{\delta_n(s_n)} - \frac{\bar y'_n}{n}\sum_{i \in N_n}y'_{n,i}\sum_{j \in N_n}\omega_n(i,j) \equiv Q_{n,1} + Q_{n,2},$ (A.9)

where $\bar y'_n := n^{-1}\sum_{i \in N_n}y'_{n,i}$. Consider the second term in the last line of (A.9). Letting $\tau_{n,i} := \sum_{j \in N_n}\omega_n(i,j)$ and noticing that

$\max_{i \in N_n}\tau_{n,i} \le \max_{i \in N_n}\big(\omega_n(i) + \tilde\omega|N_n(i; s_n + 1)|\big) \le (\tilde\omega + 1)D_n(s_n) \equiv \bar\tau_n,$

it follows that

$|Q_{n,2}| \le \big|\sqrt{\bar\tau_n}\,\bar y'_n\big| \times \frac{1}{n\sqrt{\bar\tau_n}}\Big|\sum_{i \in N_n}\tau_{n,i}y'_{n,i}\Big| \equiv \big|\sqrt{\bar\tau_n}\,\bar y'_n\big| \times Q_{n,3}.$

Similarly to the proof of Proposition 4.1, $E[Q_{n,3}^2 \mid \mathcal{C}] \le \frac{C\bar\tau_n}{n}(1 + \Xi_n) \to 0$ a.s. Moreover, $E[\bar\tau_n(\bar y'_n)^2 \mid \mathcal{C}]$ is bounded by the same quantity and, since $\Delta_n(s_n; 2)/\delta_n(s_n) \le \bar\tau_n$,

$E[|Q_{n,1}| \mid \mathcal{C}] \le E[\bar\tau_n(\bar y'_n)^2 \mid \mathcal{C}] \to 0$ a.s. □

Proof of Proposition 4.2. Consider $T^*_{1,n}$ first. Let $\tilde Y_{n,i} := Y_{n,i} - \mu_n$, $\tilde Z_{n,i} := \sum_{k \in B_{n,i}}\tilde Y_{n,k}$, and let $\tilde Z^*_{n,i}$ be the bootstrap version of the latter. Also define $W^*_{n,i} := \tilde Z^*_{n,i} - E[\tilde Z^*_{n,i} \mid \mathcal{G}_n]$. Conditionally on $\mathcal{G}_n$, $\{W^*_{n,i}\}$ are row-wise i.i.d. random vectors with $E[W^*_{n,1} \mid \mathcal{G}_n] = 0$ and $\mathrm{Var}(W^*_{n,1} \mid \mathcal{G}_n) = \delta_n(s_n)\Sigma^*_n$ a.s. Write

$\sqrt n(\bar Y^*_n - \mu^*_n) = \frac{1}{\sqrt{K_n}}\sum_{k=1}^{K_n}(\delta_n(s_n))^{-1/2}W^*_{n,k}.$

Then, letting $\lambda_1(A)$ denote the minimal eigenvalue of a square matrix $A$, by Corollary C.1,

$d_K\big(T^*_{1,n}, S^*_{1,n} \mid \mathcal{G}_n\big) \le C_v(\lambda_1(\Sigma)/2)^{-3/8}\Big(\frac{E[\|W^*_{n,1}\|^3 \mid \mathcal{G}_n]}{\sqrt n\,\delta_n(s_n)^{3/2}}\Big)^{1/4}$ a.s. on $\{\lambda_1(\Sigma^*_n) \ge \lambda_1(\Sigma)/2\}$,

where $S^*_{1,n} = \|Q_n\|$ and $Q_n$ is conditionally normal given $\mathcal{G}_n$ with zero mean and variance $\Sigma^*_n$.

Claim A.2. $E[\|W^*_{n,1}\|^3 \mid \mathcal{C}]/(\sqrt n\,\delta_n(s_n)^{3/2}) \to 0$ a.s.

Proof. It suffices to show that $E[|c^\top W^*_{n,1}|^3 \mid \mathcal{C}]/(\sqrt n\,\delta_n(s_n)^{3/2}) \to 0$ a.s. for any $c \in \mathbb{R}^v$ such that $\|c\| = 1$. By the $c_r$-inequality,

$E[|c^\top W^*_{n,1}|^3 \mid \mathcal{G}_n] \le 8E[|c^\top\tilde Z^*_{n,1}|^3 \mid \mathcal{G}_n] = \frac{8}{n}\sum_{i \in N_n}|c^\top\tilde Z_{n,i}|^3$ a.s.

Let $\tilde y_{n,i} := c^\top\tilde Y_{n,i}$. Then

$E[|c^\top\tilde Z_{n,i}|^4 \mid \mathcal{C}] \le \sum_{j_1,j_2,j_3,j_4 \in B_{n,i}}\big|\mathrm{Cov}(\tilde y_{n,j_1}\tilde y_{n,j_2}, \tilde y_{n,j_3}\tilde y_{n,j_4} \mid \mathcal{C})\big| + 3\Big(\sum_{j_1,j_2 \in B_{n,i}}\big|\mathrm{Cov}(\tilde y_{n,j_1}, \tilde y_{n,j_2} \mid \mathcal{C})\big|\Big)^2 \equiv A_{n,i} + C_{n,i}$ a.s.

Similarly to the proof of Proposition 4.1, we find that w.p.1,

$A_{n,i} \le C_1(\tilde\mu_p \vee 1)\,|B_{n,i}|\sum_{s \ge 0}h_{loc,n}(s, s_n)\gamma_{n,s}^{1-4/p} \quad\text{and}\quad C_{n,i} \le C_2(\tilde\mu_p \vee 1)\,|B_{n,i}|\sum_{s \ge 0}\delta^\partial_{loc,n}(s, s_n)\gamma_{n,s}^{1-2/p},$

where $C_1$ and $C_2$ are some positive constants. The result then follows by noticing that $E[|c^\top\tilde Z_{n,i}|^3 \mid \mathcal{C}] \le (E[|c^\top\tilde Z_{n,i}|^4 \mid \mathcal{C}])^{3/4}$ a.s. and the fact that $(A_{n,i} + C_{n,i})^{3/4} \le A_{n,i}^{3/4} + C_{n,i}^{3/4}$. □

Using Jensen's inequality, we find that for any $\epsilon > 0$,

$P\big(d_K\big(T^*_{1,n}, S^*_{1,n} \mid \mathcal{G}_n\big) > \epsilon \mid \mathcal{C}\big) \le \frac{C_v(\lambda_1(\Sigma)/2)^{-3/8}}{\epsilon}\Big(\frac{E[\|W^*_{n,1}\|^3 \mid \mathcal{C}]}{\sqrt n\,\delta_n(s_n)^{3/2}}\Big)^{1/4} + P(\lambda_1(\Sigma^*_n) < \lambda_1(\Sigma)/2 \mid \mathcal{C}) \to 0$ a.s.,

where the convergence of $P(\lambda_1(\Sigma^*_n) < \lambda_1(\Sigma)/2 \mid \mathcal{C})$ follows from the fact that the eigenvalues of a matrix depend continuously on its entries (see, e.g., Zhang, 2011, Theorem 2.11), so that $\lambda_j(\Sigma^*_n) \xrightarrow{\mathcal{C}\text{-}p} \lambda_j(\Sigma)$ a.s. for all $1 \le j \le v$.

Since the eigenvalues of $\Sigma^*_n$ converge to the eigenvalues of $\Sigma$ and the latter are a.s. positive, it follows from Lemma C.9 that $d_K\big(S^*_{1,n}, \|\Sigma^{1/2}\eta\| \mid \mathcal{G}_n, \mathcal{C}\big) \xrightarrow{\mathcal{C}\text{-}p} 0$ a.s. Moreover, $T_{1,n} \to \|\Sigma^{1/2}\eta\|$ $\mathcal{C}$-weakly and, hence, the result follows from Corollary 3.1. (Throughout, we use the conditional versions of all the inequalities involved.)

Consider the second assertion. First, for any $c \in \mathbb{R}^v$ such that $\|c\| = 1$ and $\epsilon > 0$, we get

$P\big(|c^\top(\bar Y^*_n - \mu^*_n)| > \epsilon \mid \mathcal{C}\big) \le \frac{1}{(K_n\epsilon)^2}E\Big[\Big(\sum_{k=1}^{K_n}c^\top\tilde Z^*_{n,k}\Big)^2 \,\Big|\, \mathcal{C}\Big] = \frac{1}{K_n\epsilon^2}E[c^\top\Sigma^*_nc \mid \mathcal{C}] \to 0,$

where the convergence follows from the consistency of $\Sigma^*_n$. Therefore, $\bar Y^*_n - \mu^*_n \xrightarrow{\mathcal{C}\text{-}p} 0$ a.s. Since the $\mathcal{C}$-asymptotic tightness of a vector follows from that of its elements, $\sqrt n(\bar Y^*_n - \mu^*_n)$ is $\mathcal{C}$-asymptotically tight for the same reason (i.e., the convergence of $\Sigma^*_n$). Write

$\nabla\phi(\mu^*_n)^\top\sqrt n(\bar Y^*_n - \mu^*_n) = \frac{1}{\sqrt{K_n}}\sum_{k=1}^{K_n}(\delta_n(s_n))^{-1/2}\nabla\phi(\mu^*_n)^\top W^*_{n,k}.$

Then by Lemma C.8, letting $T'^*_{2,n} = \nabla\phi(\mu^*_n)^\top\sqrt n(\bar Y^*_n - \mu^*_n)$ and $S'^*_{2,n} = \nabla\phi(\mu^*_n)^\top Q_n$, we have

$d_K\big(T'^*_{2,n}, S'^*_{2,n} \mid \mathcal{G}_n\big) \le C(\sigma^2/2)^{-3/2} \times \frac{\|\nabla\phi(\mu^*_n)\|^3 E[\|W^*_{n,1}\|^3 \mid \mathcal{G}_n]}{\sqrt n\,\delta_n(s_n)^{3/2}}$ a.s. on $\{\sigma^{*2}_n \ge \sigma^2/2\}$,

where $\sigma^{*2}_n = \nabla\phi(\mu^*_n)^\top\Sigma^*_n\nabla\phi(\mu^*_n)$ and $\sigma^2 = \nabla\phi(\mu)^\top\Sigma\nabla\phi(\mu)$. Since $x \mapsto \nabla\phi(x)$ is continuous and $\mu^*_n$ is a consistent estimator of $\mu$, $\nabla\phi(\mu^*_n) \xrightarrow{\mathcal{C}\text{-}p} \nabla\phi(\mu)$ and $\sigma^{*2}_n \xrightarrow{\mathcal{C}\text{-}p} \sigma^2$ a.s. Consequently, as in the previous case, $d_K\big(T'^*_{2,n}, S'^*_{2,n} \mid \mathcal{G}_n\big) \xrightarrow{\mathcal{C}\text{-}p} 0$ and $d_K\big(S'^*_{2,n}, \nabla\phi(\mu)^\top\Sigma^{1/2}\eta \mid \mathcal{G}_n, \mathcal{C}\big) \xrightarrow{\mathcal{C}\text{-}p} 0$ a.s. □

Proof of Proposition 4.4. The proof is similar to the one for Proposition 4.2, and so is omitted. □
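The quantities driving the preceding proofs are straightforward to compute. The sketch below forms the neighborhood block sums $\zeta_{n,i}$ and the middle expression of (A.6) for $\Sigma^*_n$; the normalization $\delta_n(s_n)$, taken here as the average block size, is a best-effort reading of the notation rather than code supplied with the paper.

```python
import numpy as np

def block_bootstrap_variance(Y, D, s_n):
    """Sketch of the block-sum variance Sigma*_n from Proposition 4.1.

    Y   : (n, v) array of demeaned observations Y'_{n,i};
    D   : (n, n) array of network distances d_n(i, j);
    s_n : block radius, so B_{n,i} = {j : d_n(i, j) <= s_n}.
    The delta_n(s_n) normalization is an assumption of this sketch.
    """
    n, v = Y.shape
    blocks = (D <= s_n)                     # boolean membership of B_{n,i}
    zeta = blocks.astype(float) @ Y         # zeta_{n,i} = sum of Y' over B_{n,i}
    delta = blocks.sum() / n                # average neighborhood size delta_n(s_n)
    zeta_bar = zeta.mean(axis=0)
    # variance of a uniformly resampled block sum, scaled as in (A.6)
    return (zeta.T @ zeta / n - np.outer(zeta_bar, zeta_bar)) / delta
```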
Proof of Proposition 4.5. By Lemma C.11, $d_K\big(\|T^*_{1,n}\|, S^*_{1,n} \mid \mathcal{G}_n\big) \xrightarrow{\mathcal{C}\text{-}p} 0$ and $d_K\big(T'^*_{2,n}, S'^*_{2,n} \mid \mathcal{G}_n\big) \xrightarrow{\mathcal{C}\text{-}p} 0$ a.s., where $S^*_{1,n} = \|Q_n\|$, $S'^*_{2,n} = \nabla\phi(\bar Y_n)^\top Q_n$, and $Q_n$ is conditionally normal given $\mathcal{G}_n$ with zero mean and variance $\Sigma^*_n$. The rest is similar to the proof of Proposition 4.2. □
Appendix B. Network HAC Estimator

Although the HAC estimator (2.5) is consistent in the sense that $\hat\Sigma_n - \Sigma_n \xrightarrow{\mathcal{C}\text{-}p} 0$ a.s., it is not guaranteed to be positive semi-definite in finite samples. Let $Q_n\Lambda_nQ_n^\top$ be the eigendecomposition of $\hat\Sigma_n$ (since $\hat\Sigma_n$ is symmetric, all its eigenvalues are real). Also let $\lambda_1(A)$ denote the smallest eigenvalue of $A$, e.g., $\lambda_1(\hat\Sigma_n) = \min_{1 \le k \le v}[\Lambda_n]_{kk}$. Consider a sequence of small positive real numbers $c_n \searrow 0$. We approximate $\hat\Sigma_n$ by

$\hat\Sigma^+_n := Q_n(\Lambda_n \vee c_nI_v)Q_n^\top,$

where the maximum is taken element-wise. By construction, the matrix $\hat\Sigma^+_n$ is positive definite. Moreover, when the smallest eigenvalue of $\Sigma_n$ is bounded from below by a positive constant, $\hat\Sigma^+_n$ is also a consistent estimator of the true variance, as follows from the next result.
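In computational terms the adjustment is a single eigenvalue clip. A minimal numpy sketch of the construction above (an illustration, not code supplied with the paper):

```python
import numpy as np

def psd_adjust(Sigma_hat, c_n):
    """Eigenvalue-clipped estimator Sigma_hat_plus = Q (Lambda v c_n I) Q'.

    Floors every eigenvalue of the symmetric matrix Sigma_hat at c_n > 0,
    which makes the result positive definite while leaving it
    asymptotically equivalent to Sigma_hat as c_n -> 0.
    """
    lam, Q = np.linalg.eigh(Sigma_hat)      # real spectrum: Sigma_hat is symmetric
    return (Q * np.maximum(lam, c_n)) @ Q.T
```

Propositions B.1 and B.2 below show that letting the clipping floor $c_n$ shrink to zero preserves consistency.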
Proposition B.1. Suppose that $\hat\Sigma_n - \Sigma_n \xrightarrow{\mathcal{C}\text{-}p} 0$ a.s. and there exists a constant $c > 0$ such that $P(\lambda_1(\Sigma_n) \ge c \text{ ev.}) = 1$. Then $\hat\Sigma^+_n - \Sigma_n \xrightarrow{\mathcal{C}\text{-}p} 0$ a.s.

Proof. Fix $\epsilon > 0$. Then

$P\big(\|\hat\Sigma^+_n - \Sigma_n\| > \epsilon \mid \mathcal{C}\big) \le P\big(\|\hat\Sigma_n - \Sigma_n\| > \epsilon \mid \mathcal{C}\big) + P\big(\lambda_1(\hat\Sigma_n) < c_n \mid \mathcal{C}\big)$ a.s. (B.1)

The first term on the RHS of (B.1) trivially converges to 0 a.s. As for the second term, using the properties of the Rayleigh quotient,

$\lambda_1(\hat\Sigma_n) = \min_{x:\|x\|=1}x^\top\hat\Sigma_nx \ge \lambda_1(\Sigma_n) + \lambda_1(\hat\Sigma_n - \Sigma_n).$

Therefore, noticing that $|\lambda_1(A)| \le \|A\|$,

$P\big(\lambda_1(\hat\Sigma_n) < c_n \mid \mathcal{C}\big) \le P\big(\lambda_1(\hat\Sigma_n - \Sigma_n) < c_n - c \mid \mathcal{C}\big) + \mathbf{1}\{\lambda_1(\Sigma_n) < c\} \to 0$ a.s. □

If $\Sigma_n$ converges a.s. to a positive definite matrix $\Sigma$, then we may relax the assumptions of the preceding result.

Proposition B.2. Suppose that $\hat\Sigma_n - \Sigma \xrightarrow{\mathcal{C}\text{-}p} 0$ a.s., where $\Sigma$ is positive definite. Then $\hat\Sigma^+_n - \Sigma \xrightarrow{\mathcal{C}\text{-}p} 0$ a.s.

Proof. As in the proof of Proposition B.1, for any $\epsilon > 0$,

$\limsup_{n \to \infty}P\big(\|\hat\Sigma^+_n - \Sigma\| > \epsilon \mid \mathcal{C}\big) \le \operatorname{ess\,inf}_{c > 0}\mathbf{1}\{\lambda_1(\Sigma) < c\} = 0$ a.s. □

Appendix C. Auxiliary Results
In the following we assume that all random elements are defined on a common probability space $(\Omega, \mathcal{H}, P)$. Also, for a vector $x \in \mathbb{R}^v$, let $\|x\|$ denote the Euclidean norm of $x$, and let $\|\cdot\|_{e,p}$ be the element-wise $p$-norm on $\mathbb{R}^{a \times b}$, i.e., $\|A\|_{e,p} := \|\mathrm{vec}(A)\|_p$.

Lemma C.1. Let $\{A_n\}$ be a sequence of random symmetric matrices in $\mathbb{R}^{v \times v}$ and $\mathcal{F} \subset \mathcal{H}$. Then the following are equivalent: (a) $E[\|A_n\|_{e,1} \mid \mathcal{F}] \to 0$ a.s.; (b) $E[\|A_n\|_F \mid \mathcal{F}] \to 0$ a.s.; (c) $E[|c^\top A_nc| \mid \mathcal{F}] \to 0$ a.s. for any $c \in \mathbb{R}^v$ such that $\|c\| = 1$.

Proof. (a) is equivalent to (b) because $\|A_n\|_F \le \|A_n\|_{e,1} \le v\|A_n\|_F$. The equivalence of (a) and (c) follows from the next inequalities:

$|c^\top A_nc| \le \|c\|_\infty^2\|A_n\|_{e,1}$

and, letting $z^+_{ij} = (e_i + e_j)/\sqrt 2$ and $z^-_{ij} = (e_i - e_j)/\sqrt 2$, where $\{e_1, \dots, e_v\}$ is the standard basis for $\mathbb{R}^v$,

$\|A_n\|_{e,1} \le \frac{1}{2}\sum_{i,j=1}^v\big(|z^{+\top}_{ij}A_nz^+_{ij}| + |z^{-\top}_{ij}A_nz^-_{ij}|\big).$ □

The following is a simple extension of Lemma A.3 in Crimaldi (2009) to the multidimensional case. For a random vector $X \in \mathbb{R}^v$ and $\mathcal{F} \subset \mathcal{H}$, let $Q^{\mathcal{F}}_X$ denote the regular conditional distribution of $X$ given $\mathcal{F}$ and let $\hat\varphi_X$ be the corresponding characteristic function, i.e., for $t \in \mathbb{R}^v$,

$\hat\varphi_X(\omega, t) = \int\exp(it^\top x)\,Q^{\mathcal{F}}_X(\omega, dx).$

Also, the conditional characteristic function of $X$ given $\mathcal{F}$ is given by $\varphi_X(t \mid \mathcal{F}) := E[\exp(it^\top X) \mid \mathcal{F}]$, and for a fixed $t \in \mathbb{R}^v$ and almost all $\omega \in \Omega$, $\hat\varphi_X(\omega, t) = \varphi_X(t \mid \mathcal{F})(\omega)$.

Lemma C.2. Let $\{X_n\}$ be a sequence of random vectors in $\mathbb{R}^v$ and $\mathcal{F} \subset \mathcal{H}$. Then $X_n \to X$ $\mathcal{F}$-weakly, i.e., for almost all $\omega \in \Omega$, $Q^{\mathcal{F}}_{X_n}(\omega, \cdot) \to Q^{\mathcal{F}}_X(\omega, \cdot)$ weakly, iff for every $t \in \mathbb{R}^v$, $\hat\varphi_{X_n}(\cdot, t) \to \hat\varphi_X(\cdot, t)$ a.s.

The next lemma provides a number of useful properties of the almost sure conditional convergence which are typical of the usual weak convergence.
Lemma C.3. Let $\{X_n\}$ and $\{Y_n\}$ be sequences of random vectors in $\mathbb{R}^v$ and $\mathbb{R}^w$, respectively, and $\mathcal{F} \subset \mathcal{H}$. Then: (a) If $X_n \to X$ $\mathcal{F}$-weakly and $g: \mathbb{R}^v \to \mathbb{R}^d$ is continuous, then $g(X_n) \to g(X)$ $\mathcal{F}$-weakly. (b) $X_n \to X$ $\mathcal{F}$-weakly iff $s^\top X_n \to s^\top X$ $\mathcal{F}$-weakly for all $s \in \mathbb{R}^v$. (c) If $Y_n \xrightarrow{\mathcal{F}\text{-}p} Y$ a.s., where $Y$ is $\mathcal{F}$-measurable, then $Y_n \to Y$ $\mathcal{F}$-weakly. (d) Let $v = w$. If $X_n \to X$ $\mathcal{F}$-weakly and $X_n - Y_n \xrightarrow{\mathcal{F}\text{-}p} 0$ a.s., then $Y_n \to X$ $\mathcal{F}$-weakly. (e) If $X_n \to X$ $\mathcal{F}$-weakly and $Y_n \xrightarrow{\mathcal{F}\text{-}p} Y$ a.s., where $Y$ is $\mathcal{F}$-measurable, then $(X_n^\top, Y_n^\top) \to (X^\top, Y^\top)$ $\mathcal{F}$-weakly.

Proof. (a) This follows from Lemma C.2 and the fact that $x \mapsto \exp(it^\top g(x))$ is a bounded, continuous function.

(b) The sufficiency follows from part (a) because $x \mapsto s^\top x$ is continuous. For the necessity, suppose that all linear combinations converge $\mathcal{F}$-weakly. Then

$\varphi_{X_n}(t \mid \mathcal{F}) = \varphi_{t^\top X_n}(1 \mid \mathcal{F}) \to \varphi_{t^\top X}(1 \mid \mathcal{F}) = \varphi_X(t \mid \mathcal{F})$ a.s.,

and the result follows from Lemma C.2.

(c) Since for any $t \in \mathbb{R}^w$ and $\epsilon > 0$, $|e^{it^\top(Y_n - Y)} - 1| \le \epsilon$ on $\{|t^\top(Y_n - Y)| \le \epsilon\}$, we have

$|\varphi_{Y_n}(t \mid \mathcal{F}) - \varphi_Y(t \mid \mathcal{F})| \le E\big[\big|e^{it^\top(Y_n - Y)} - 1\big| \mid \mathcal{F}\big] \le \epsilon + 2P\big(|t^\top(Y_n - Y)| > \epsilon \mid \mathcal{F}\big)$ a.s.

Therefore, $\limsup_{n \to \infty}|\varphi_{Y_n}(t \mid \mathcal{F}) - \varphi_Y(t \mid \mathcal{F})| \le \epsilon$ a.s. The result follows by considering a sequence $\epsilon_m \searrow 0$.

(d) For any $t \in \mathbb{R}^v$, $|\varphi_{Y_n}(t \mid \mathcal{F}) - \varphi_{X_n}(t \mid \mathcal{F})| \le E\big[\big|e^{it^\top(Y_n - X_n)} - 1\big| \mid \mathcal{F}\big] \to 0$ a.s. by the argument of part (c), and the result follows from Lemma C.2.

(e) Since $(X_n^\top, Y^\top) \to (X^\top, Y^\top)$ $\mathcal{F}$-weakly (by part (b)) and $(X_n^\top, Y_n^\top) - (X_n^\top, Y^\top) \xrightarrow{\mathcal{F}\text{-}p} 0$ a.s., the result follows from part (d). □

Lemma C.4. Let $\{X_n\}$ be a sequence of random variables, $\mathcal{F} \subset \mathcal{H}$, and let $X$ be a random variable with (a.s.) continuous conditional cdf given $\mathcal{F}$ (i.e., the map $t \mapsto F^{\mathcal{F}}_X(\omega, t)$ is continuous for (almost) all $\omega \in \Omega$). Then $X_n \to X$ $\mathcal{F}$-weakly iff $d_K(X_n, X \mid \mathcal{F}) \to 0$ a.s.

Proof. The necessity holds by Theorem 3.1.2 in Shiryaev (2016) because $d_K(X_n, X \mid \mathcal{F}) \to 0$ a.s. implies that for almost all $\omega \in \Omega$ the regular conditional cdfs converge; the sufficiency follows from the $\omega$-wise application of Pólya's theorem (e.g., Athreya and Lahiri, 2006, Theorem 9.1.4). □

Lemma C.5. Suppose that $f: \mathbb{R}^v \to \mathbb{R}^w$ is continuous and $\{X_n\}$ and $\{Y_n\}$ are sequences of random vectors in $\mathbb{R}^v$ such that $Y_n - X_n \xrightarrow{\mathcal{F}\text{-}p} 0$ a.s. for some $\mathcal{F} \subset \mathcal{H}$ and $\{X_n\}$ is $\mathcal{F}$-asymptotically tight. Then $f(Y_n) - f(X_n) \xrightarrow{\mathcal{F}\text{-}p} 0$ a.s.

Proof. For any $z > 0$, the restriction $f|_{B(0,z)}$ is uniformly continuous, i.e., for every $\epsilon > 0$ there exists $\delta_\epsilon > 0$ such that for all $x, y \in B(0, z)$, $\|f(x) - f(y)\| < \epsilon$ whenever $\|x - y\| < \delta_\epsilon$. Fix $\epsilon > 0$. Then

$P(\|f(Y_n) - f(X_n)\| > \epsilon \mid \mathcal{F}) \le P(\|Y_n - X_n\| > \delta_\epsilon \mid \mathcal{F}) + P(\|Y_n\| > z \mid \mathcal{F}) + P(\|X_n\| > z \mid \mathcal{F}) \le P(\|Y_n - X_n\| > \delta_\epsilon \wedge z/2 \mid \mathcal{F}) + 2P(\|X_n\| > z/2 \mid \mathcal{F})$ a.s.

Therefore,

$\limsup_{n \to \infty}P(\|f(Y_n) - f(X_n)\| > \epsilon \mid \mathcal{F}) \le 2\operatorname{ess\,inf}_{z > 0}\limsup_{n \to \infty}P(\|X_n\| > z \mid \mathcal{F}) = 0$ a.s. □

Lemma C.6. Suppose that $\{X_n\}$ and $\{Y_n\}$ are sequences of random vectors in $\mathbb{R}^v$ such that $X_n$ is $\mathcal{F}$-measurable for all $n \ge 1$ and some $\mathcal{F} \subset \mathcal{H}$, $\sup_n\|X_n\| < \infty$ a.s., and $Y_n - X_n \xrightarrow{\mathcal{F}\text{-}p} 0$ a.s. Then $\{Y_n\}$ is $\mathcal{F}$-asymptotically tight.

Proof. For any $y > 0$,

$P(\|Y_n\| > y \mid \mathcal{F}) \le P(\|Y_n - X_n\| > y/2 \mid \mathcal{F}) + \mathbf{1}\big\{\sup_n\|X_n\| > y/2\big\}$ a.s.

Therefore,

$\operatorname{ess\,inf}_{y > 0}\limsup_{n \to \infty}P(\|Y_n\| > y \mid \mathcal{F}) \le \operatorname{ess\,inf}_{y > 0}\mathbf{1}\big\{\sup_n\|X_n\| > y\big\} = 0$ a.s. □

In the following, for $r, \epsilon \ge 0$, let $S_{r,\epsilon} := \{x \in \mathbb{R}^v : r \le \|x\| \le r + \epsilon\}$.

Lemma C.7. Suppose that $Z$ is a standard normal random vector in $\mathbb{R}^v$ with $v \ge 2$ and $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_v > 0$ are constants. Let $\Lambda := \mathrm{diag}(\lambda_1, \dots, \lambda_v)$ and $N = \Lambda^{1/2}Z$. Then for all $\epsilon \ge 0$,

$Q(\epsilon, \|N\|) = \sup_{r \ge 0}P\big(N \in S_{r,\epsilon}\big) \le \frac{C_v\,\epsilon}{\sqrt{\lambda_1}},$

where $C_v \equiv \sqrt{v - 1}$.

Proof. Let $X := N_1^2 + N_2^2$, $Y := \sum_{i=3}^v N_i^2$, and note that $\|N\| = \sqrt{X + Y}$. Then, letting $f_X$ denote the density of $X$, we have

$f_X(x) = \frac{1}{2\pi\sqrt{\lambda_1\lambda_2}}\int_0^x e^{-\frac{z}{2\lambda_1} - \frac{x - z}{2\lambda_2}}(z(x - z))^{-1/2}\,dz \le \frac{1}{2\pi\sqrt{\lambda_1\lambda_2}}B(1/2, 1/2)\,e^{-\frac{x}{2\lambda_1}}.$ (C.1)

For $y \ge 0$, the density of $\sqrt{X + y}$ is zero on $(-\infty, \sqrt y)$ and, using (C.1), it can be bounded on $[\sqrt y, \infty)$ by

$f_{\sqrt{X + y}}(x) = 2xf_X(x^2 - y) \le \frac{\sqrt{y + \lambda_2}}{\sqrt{\lambda_1\lambda_2}},$

so that for all $r \ge 0$,

$g(y) := P\big(r \le \sqrt{X + y} \le r + \epsilon\big) \le \frac{\sqrt{y + \lambda_2}}{\sqrt{\lambda_1\lambda_2}}\,\epsilon.$

Hence, for $v \ge 3$, noticing that $X$ is independent of $Y$, we find that

$P\big(r \le \sqrt{X + Y} \le r + \epsilon\big) = E[g(Y)] \le \frac{\epsilon}{\sqrt{\lambda_1\lambda_2}}E[Y + \lambda_2]^{1/2} \le \frac{\epsilon}{\sqrt{\lambda_1}}\Big(\sum_{i=3}^v\frac{\lambda_i}{\lambda_2} + 1\Big)^{1/2},$

which proves the result. □

Let $\phi(w) := \|w\|$. This function is thrice continuously differentiable on $\mathbb{R}^v\setminus\{0\}$ and the following bounds on the derivatives of $\phi$ hold:

$|\phi'(w)(x)| \le \|x\|, \quad |\phi''(w)(x, y)| \le 2\|w\|^{-1}\|x\|\|y\|, \quad |\phi'''(w)(x, y, z)| \le 6\|w\|^{-2}\|x\|\|y\|\|z\|.$ (C.2)

For a real symmetric matrix $B$ we denote the $j$-th order statistic of its eigenvalues by $\lambda_{(j)}(B)$. Finally, we say that a random vector $X$ is conditionally normal given $\mathcal{F} \subset \mathcal{H}$ with zero mean and conditional covariance matrix $V$, denoted by $X \mid \mathcal{F} \sim N(0, V)$, if $V$ is $\mathcal{F}$-measurable, a.s. finite and positive semi-definite, and the conditional characteristic function of $X$ is given by

$E[e^{it^\top X} \mid \mathcal{F}] = \exp\big(-\tfrac{1}{2}t^\top Vt\big)$ a.s.
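Lemma C.7 can be sanity-checked by simulation. In the sketch below, the constant $\sqrt{v-1}$ follows the reconstruction above and should be read as indicative only; the grid and simulation sizes are arbitrary choices.

```python
import numpy as np

def shell_prob_check(lams, eps, n_sim=200_000, seed=0):
    """Monte Carlo check of Lemma C.7: sup_r P(r <= ||N|| <= r + eps),
    with N = Lambda^{1/2} Z, against the C_v * eps / sqrt(lambda_1) envelope."""
    rng = np.random.default_rng(seed)
    lams = np.sort(np.asarray(lams, dtype=float))[::-1]   # lambda_1 >= ... >= lambda_v
    norms = np.sqrt((lams * rng.standard_normal((n_sim, lams.size)) ** 2).sum(axis=1))
    r_grid = np.linspace(0.0, norms.max(), 200)
    sup_p = max(((r <= norms) & (norms <= r + eps)).mean() for r in r_grid)
    bound = np.sqrt(lams.size - 1) * eps / np.sqrt(lams[0])  # C_v = sqrt(v-1), as reconstructed
    return sup_p, bound
```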
Theorem C.1. Let $X_1, \dots, X_n$ be random vectors in $\mathbb{R}^v$ that are conditionally independent given $\mathcal{F} \subset \mathcal{H}$ with $E[X_i \mid \mathcal{F}] = 0$ and $E[\|X_i\|^3 \mid \mathcal{F}] < \infty$ a.s. Let $T := \sum_{i=1}^nX_i$ and let $N$ be a random vector in $\mathbb{R}^v$ such that $N \mid \mathcal{F} \sim N(0, V)$, where $V = E[TT^\top \mid \mathcal{F}]$ a.s. Then, assuming that $\upsilon \equiv \lambda_{((v\vee 2) - 1)}(V) > 0$ a.s.,

$d_K(\|T\|, \|N\| \mid \mathcal{F}) \le C_v\Big(\upsilon^{-3/2}\sum_{i=1}^nE[\|X_i\|^3 \mid \mathcal{F}]\Big)^{1/4}$ a.s.,

where $C_v > 0$ is a constant depending only on $v$.

Proof. Let $f$ be a thrice continuously differentiable function such that $f(x) = 1$ if $x \le 0$, $f(x) = 0$ if $x \ge \epsilon$ for some $\epsilon > 0$, and $|f^{(j)}(x)| \le D\epsilon^{-j}\mathbf{1}_{(0,\epsilon)}(x)$ for some constant $D > 0$ and $1 \le j \le 3$. For $r \ge 0$ define $g_r(s) := f(\|s\| - r)$. First,

$P(\|T\| \le r \mid \mathcal{F}) \le E[g_r(T) \mid \mathcal{F}] \le P(\|N\| \le r + \epsilon \mid \mathcal{F}) + E[g_r(T) - g_r(N) \mid \mathcal{F}]$

and

$P(\|T\| > r \mid \mathcal{F}) \le 1 - E[g_{r-\epsilon}(T) \mid \mathcal{F}] \le P(\|N\| > r - \epsilon \mid \mathcal{F}) + E[g_{r-\epsilon}(N) - g_{r-\epsilon}(T) \mid \mathcal{F}]$ a.s.

for all $r \ge \epsilon > 0$. Therefore, w.p.1,

$d_K(\|T\|, \|N\| \mid \mathcal{F}) = \sup_{q \in \mathbb{Q}_{\ge 0}}|P(\|T\| \le q \mid \mathcal{F}) - P(\|N\| \le q \mid \mathcal{F})| \le \sup_{q \in \mathbb{Q}_{> 0}}|E[g_q(T) - g_q(N) \mid \mathcal{F}]| + \sup_{q \in \mathbb{Q}_{\ge 0}}P(N \in S_{q,\epsilon} \mid \mathcal{F}).$ (C.3)

Consider the first term on the last line of (C.3).

Claim C.1. There exists a constant $B > 0$ such that for any $q > 0$,

$|E[g_q(T) - g_q(N) \mid \mathcal{F}]| \le \frac{B}{\epsilon^3}\sum_{i=1}^nE[\|X_i\|^3 \mid \mathcal{F}]$ a.s.

Proof. Let $Z_1, \dots, Z_n$ be i.i.d. standard normal random vectors in $\mathbb{R}^v$ independent of $X_1, \dots, X_n$ and $\mathcal{F}$, and let $Y_i := V_i^{1/2}Z_i$, where $V_i$ is a version of $E[X_iX_i^\top \mid \mathcal{F}]$. Define

$U_i := \sum_{k=1}^{i-1}X_k + \sum_{k=i+1}^nY_k \quad\text{and}\quad W_i := g_q(U_i + X_i) - g_q(U_i + Y_i).$

Then $g_q(T) - g_q(N) = \sum_{i=1}^nW_i$ and

$|E[g_q(T) - g_q(N) \mid \mathcal{F}]| \le \sum_{i=1}^n|E[W_i \mid \mathcal{F}]|$ a.s.

Let $\mathcal{G}_i := \mathcal{F}\vee\sigma(X_1, \dots, X_{i-1}, Z_{i+1}, \dots, Z_n)$ and let $h_{1i}(\lambda) := g_q(U_i + \lambda X_i)$ and $h_{2i}(\lambda) := g_q(U_i + \lambda Y_i)$. Using Taylor expansion up to the third order, we find that

$W_i = \sum_{j=0}^2\frac{1}{j!}\big(h^{(j)}_{1i}(0) - h^{(j)}_{2i}(0)\big) + \frac{1}{3!}\big(h^{(3)}_{1i}(\lambda_1) - h^{(3)}_{2i}(\lambda_2)\big),$

where $|\lambda_1|, |\lambda_2| < 1$. The tower property of conditional expectations and the fact that $X_i$ and $Y_i$ are conditionally independent of $\mathcal{G}_i$ given $\mathcal{F}$ imply that

$E[h^{(j)}_{1i}(0) - h^{(j)}_{2i}(0) \mid \mathcal{F}] = 0$ a.s.

for $j = 1, 2$. Finally, using the bounds in (C.2) and noticing that $|f^{(j)}(x - q)| \le D\epsilon^{-3}x^{3-j}\mathbf{1}_{(q,q+\epsilon)}(x)$ for $1 \le j \le 3$, we get

$|E[h^{(3)}_{1i} - h^{(3)}_{2i} \mid \mathcal{F}]| \le \frac{B_0}{\epsilon^3}\big(E[\|X_i\|^3 \mid \mathcal{F}] + E[\|Y_i\|^3 \mid \mathcal{F}]\big)$ a.s.

for some constant $B_0 > 0$. The result then follows from Lemma 4 in Rhee and Talagrand (1986), i.e., there is a constant $M > 0$ such that $E[\|Y_i\|^3 \mid \mathcal{F}] \le ME[\|X_i\|^3 \mid \mathcal{F}]$ a.s. □

Using Lemma C.7 when $v \ge 2$, we find that

$d_K(\|T\|, \|N\| \mid \mathcal{F}) \le \frac{B}{\epsilon^3}\sum_{i=1}^nE[\|X_i\|^3 \mid \mathcal{F}] + \frac{C'_v\,\epsilon}{\sqrt\upsilon}$ a.s. (C.4)

For $v = 1$ we have $P(N \in S_{q,\epsilon} \mid \mathcal{F}) \le \epsilon/\sqrt{2\pi\upsilon}$ and the same bound holds. Finally, since (C.4) holds for any $\epsilon > 0$, it holds with $\epsilon$ replaced by an $\mathcal{F}$-measurable random variable a.s. on $\{\epsilon \in (0, \infty)\}$. The result then follows by taking $\epsilon = \big(\sqrt\upsilon\sum_{i=1}^nE[\|X_i\|^3 \mid \mathcal{F}]\big)^{1/4}$. □

Corollary C.1.
Let $X_1, \dots, X_n$ be conditionally i.i.d. given $\mathcal{F} \subset \mathcal{H}$ with $E[X_1 \mid \mathcal{F}] = 0$ and $E[\|X_1\|^3 \mid \mathcal{F}] < \infty$ a.s. Let $T := n^{-1/2}\sum_{i=1}^nX_i$ and let $N \mid \mathcal{F} \sim N(0, V)$, where $V = E[X_1X_1^\top \mid \mathcal{F}]$ a.s. Then, assuming that $\upsilon \equiv \lambda_{((v\vee 2) - 1)}(V) > 0$ a.s.,

$d_K(\|T\|, \|N\| \mid \mathcal{F}) \le C_v\Big(\frac{E[\|X_1\|^3 \mid \mathcal{F}]}{\upsilon^{3/2}\sqrt n}\Big)^{1/4}$ a.s.,

where $C_v > 0$ is a constant depending only on $v$.

Lemma C.8. Let $X_1, \dots, X_n$ be random variables that are conditionally i.i.d. given $\mathcal{F} \subset \mathcal{H}$ with $E[X_1 \mid \mathcal{F}] = 0$ and $E[|X_1|^3 \mid \mathcal{F}] < \infty$ a.s. Let $T := n^{-1/2}\sum_{i=1}^nX_i$ and $N \mid \mathcal{F} \sim N(0, \sigma^2)$, where $\sigma^2 = \mathrm{Var}(X_1 \mid \mathcal{F})$ a.s. Then, assuming that $\sigma^2 > 0$ a.s.,

$d_K(T, N \mid \mathcal{F}) \le \frac{CE[|X_1|^3 \mid \mathcal{F}]}{\sigma^3\sqrt n}$ a.s.,

where $C > 0$ is a constant.

Proof. The proof is similar to the proof of Theorem 11.4.1 in Athreya and Lahiri (2006) (for the unconditional case) and so is omitted. □
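The $1/\sqrt n$ rate in Lemma C.8 is easy to visualize numerically. A minimal sketch, using centered exponential summands as an arbitrary test case (any distribution with three conditional moments would do):

```python
import numpy as np
from math import erf

def kolmogorov_rate_demo(n, n_sim=20_000, seed=0):
    """Monte Carlo illustration of the 1/sqrt(n) decay in Lemma C.8."""
    rng = np.random.default_rng(seed)
    T = (rng.exponential(size=(n_sim, n)) - 1.0).sum(axis=1) / np.sqrt(n)
    T.sort()
    grid = np.linspace(-4.0, 4.0, 801)
    ecdf = np.searchsorted(T, grid, side="right") / T.size
    Phi = np.array([0.5 * (1.0 + erf(g / np.sqrt(2.0))) for g in grid])
    return np.abs(ecdf - Phi).max()     # approximate d_K(T, N)
```

Doubling $n$ should shrink the returned distance by roughly $\sqrt 2$, in line with the $\sigma^3\sqrt n$ denominator.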
Lemma C.9. Suppose that $\mathcal{G}$ and $\mathcal{F}$ are $\sigma$-fields such that $\mathcal{F} \subset \mathcal{G} \subset \mathcal{H}$, and $X$ and $Y$ are random vectors in $\mathbb{R}^v$ such that $X \mid \mathcal{G} \sim N(0, \Sigma_X)$ and $Y \mid \mathcal{F} \sim N(0, \Sigma_Y)$. Then, assuming that $\upsilon \equiv \lambda_{((v\vee 2) - 1)}(\Sigma_Y) > 0$ a.s.,

$d_K(\|X\|, \|Y\| \mid \mathcal{G}, \mathcal{F}) \le C_v\big(\upsilon^{-1}\|\Lambda_X - \Lambda_Y\|_{e,\infty}\big)^{1/3}$ a.s., (C.5)

where $C_v$ is a constant depending only on $v$ and $\Lambda_{(\cdot)}$ is the matrix of eigenvalues corresponding to $\Sigma_{(\cdot)}$.

Proof. Let $f$ be a twice continuously differentiable function such that $f(x) = 1$ if $x \le 0$ and $f(x) = 0$ if $x \ge \epsilon$ for some $\epsilon > 0$, with $|f^{(j)}(x)| \le D\epsilon^{-j}\mathbf{1}_{(0,\epsilon)}(x)$ for some constant $D > 0$ and $1 \le j \le 2$. For $r \ge 0$ define $g_r(s) := f(\|s\| - r)$. As in the proof of Theorem C.1, for any $\epsilon > 0$,

$d_K(\|X\|, \|Y\| \mid \mathcal{G}, \mathcal{F}) \le \sup_{q \in \mathbb{Q}_{> 0}}|E[g_q(X) \mid \mathcal{G}] - E[g_q(Y) \mid \mathcal{F}]| + \sup_{q \in \mathbb{Q}_{\ge 0}}P(Y \in S_{q,\epsilon} \mid \mathcal{F}).$

Let $Z_1$ and $Z_2$ be independent standard normal random vectors in $\mathbb{R}^v$ that are independent of $\mathcal{G}$ and $\mathcal{F}$, respectively. Then

$E[g_q(X) \mid \mathcal{G}] - E[g_q(Y) \mid \mathcal{F}] = E[g_q(\Lambda_X^{1/2}Z_1) \mid \mathcal{G}] - E[g_q(\Lambda_Y^{1/2}Z_2) \mid \mathcal{F}] = h_{q,1}(\Lambda_X) - h_{q,2}(\Lambda_Y)$ a.s.,

where $h_{q,1}(\lambda) := Eg_q(\lambda^{1/2}Z_1)$ and $h_{q,2}(\lambda) := Eg_q(\lambda^{1/2}Z_2)$ (see, e.g., Durrett, 2010, Lemma 6.2.1).

Claim C.2. There exists a constant $B_v$ depending only on $v$ such that for any $q > 0$,

$|h_{q,1}(\lambda_X) - h_{q,2}(\lambda_Y)| \le \frac{B_v}{\epsilon^2}\|\lambda_X - \lambda_Y\|_{e,\infty}.$

Proof. For $t \in [0, 1]$ let $Z(t) := \sqrt t\,\lambda_X^{1/2}Z_1 + \sqrt{1 - t}\,\lambda_Y^{1/2}Z_2$ and $\varphi(t) := Eg_q(Z(t))$. Then

$h_{q,1}(\lambda_X) - h_{q,2}(\lambda_Y) = \varphi(1) - \varphi(0) = \int_0^1\varphi'(t)\,dt.$

Using the integration by parts formula (see Equation A.17 in Talagrand, 2011, Section A.6), for $t \in (0, 1)$,

$\varphi'(t) = \frac{1}{2}E\Big[\Big(\frac{\lambda_X^{1/2}Z_1}{\sqrt t} - \frac{\lambda_Y^{1/2}Z_2}{\sqrt{1 - t}}\Big)^\top\nabla g_q(Z(t))\Big] = \frac{1}{2}E\big[\mathbf{i}^\top\big((\lambda_X - \lambda_Y)\circ\nabla^2g_q(Z(t))\big)\mathbf{i}\big],$

where $\mathbf{i}$ is the vector of ones. Therefore,

$\Big|\int_0^1\varphi'(t)\,dt\Big| \le \|\lambda_X - \lambda_Y\|_{e,\infty}\int_0^1E\big|\mathbf{i}^\top\nabla^2g_q(Z(t))\mathbf{i}\big|\,dt.$

Since $|f^{(j)}(x - q)| \le D\epsilon^{-2}x^{2-j}\mathbf{1}_{(q,q+\epsilon)}(x)$ for $1 \le j \le 2$, the $(k, l)$-th element of the Hessian of $g_q$ is bounded by $D'\epsilon^{-2}$ for some constant $D' > 0$. Therefore,

$|h_{q,1}(\lambda_X) - h_{q,2}(\lambda_Y)| \le \frac{D'v^2}{\epsilon^2}\|\lambda_X - \lambda_Y\|_{e,\infty}.$ □

Using Lemma C.7 when $v \ge 2$, we find that

$d_K(\|X\|, \|Y\| \mid \mathcal{G}, \mathcal{F}) \le \frac{B_v}{\epsilon^2}\|\Lambda_X - \Lambda_Y\|_{e,\infty} + \frac{C'_v\,\epsilon}{\sqrt\upsilon}$ a.s. (C.6)

For $v = 1$, $P(Y \in S_{q,\epsilon} \mid \mathcal{F}) \le \epsilon/\sqrt{2\pi\upsilon}$ a.s., so that the same bound holds. Finally, since (C.6) holds for any $\epsilon > 0$, it holds with $\epsilon$ replaced by an $\mathcal{F}$-measurable random variable a.s. on $\{\epsilon \in (0, \infty)\}$. Consequently, the result follows by taking $\epsilon = (\sqrt\upsilon\|\Lambda_X - \Lambda_Y\|_{e,\infty})^{1/3}$ and noticing that (C.5) holds trivially on $\{\|\Lambda_X - \Lambda_Y\|_{e,\infty} = 0\}$. □

Lemma C.10.
Suppose that $\mathcal{G}$ and $\mathcal{F}$ are $\sigma$-fields such that $\mathcal{F} \subset \mathcal{G} \subset \mathcal{H}$, and let $X \mid \mathcal{G} \sim N(0, \sigma^2_X)$ and $Y \mid \mathcal{F} \sim N(0, \sigma^2_Y)$. Then, assuming that $\sigma^2_Y > 0$ a.s.,

$d_K(X, Y \mid \mathcal{G}, \mathcal{F}) \le C\big|\sigma^2_X/\sigma^2_Y - 1\big|^{1/3}$ a.s.,

where $C > 0$ is a constant.

Proof. The proof is similar to the one for Lemma C.9, and so is omitted. □

Lemma C.11. Let $(G, (Y, X))$ be a network dependent process in $\mathbb{R}\times\mathbb{R}^v$ and let $\mathcal{F}$ be a sub-$\sigma$-field of $\mathcal{H}$ such that: (a) $Y$ is conditionally independent of $X$ given $\mathcal{F}$; (b) $Y_i$ and $Y_j$ are conditionally independent given $\mathcal{F}$ if $j \notin B_i := N(i; s)$ for some $s > 0$; (c) $D(G)$ is $\mathcal{F}$-measurable. Let $\mathcal{G} := \sigma(\mathcal{F}\cup\sigma(X))$, $T := \sum_{i \in N}Y_iX_i$, and $Z \mid \mathcal{G} \sim N(0, V)$, where $V = E[TT^\top \mid \mathcal{G}]$ a.s. Then, assuming that $\upsilon \equiv \lambda_{((v\vee 2) - 1)}(V) > 0$ a.s.,

$d_K(\|T\|, \|Z\| \mid \mathcal{G}) \le C_v\big(\upsilon^{-3/2}\beta\big)^{1/4}$ a.s.,

where $C_v > 0$ is a constant depending only on $v$ and

$\beta := \sum_{i \in N}\sum_{j \in B_i}\sum_{k \in B_i\cup B_j}\prod_{l \in \{i,j,k\}}\|Y_l\|_{\mathcal{F},3}\,\|X_l\|_\infty.$

In addition, when $v = 1$,

$d_K(T, Z \mid \mathcal{G}) \le C\big(\upsilon^{-3/2}\beta\big)^{1/4}$ a.s.

Proof. We use the notation from the proof of Theorem C.1. First, for any $\epsilon > 0$,

$d_K(\|T\|, \|Z\| \mid \mathcal{G}) \le \sup_{q \in \mathbb{Q}_{> 0}}|E[g_q(T) - g_q(Z) \mid \mathcal{G}]| + \sup_{q \ge 0}P(Z \in S_{q,\epsilon} \mid \mathcal{G})$ a.s.

Let $Y' \mid \mathcal{F} \sim N(0, \Sigma)$ be conditionally independent of $(Y, X)$ given $\mathcal{F}$, where $\Sigma = \mathrm{Var}(Y \mid \mathcal{F})$ a.s., and let $Z' := \sum_{i \in N}Y'_iX_i$. Note that $E[g_q(Z) \mid \mathcal{G}] = E[g_q(Z') \mid \mathcal{G}]$ a.s. Also let $Q_Y$ and $Q_{Y'}$ be the regular conditional distributions of $Y$ and $Y'$ given $\mathcal{F}$ and $Q := Q_Y\otimes Q_{Y'}$. Since $X$ is $\mathcal{G}$-measurable, for almost all $\omega \in \Omega$,

$E[g_q(T) - g_q(Z) \mid \mathcal{G}](\omega) = h_q(\omega),$

where

$h_q(\omega) := \int_{\mathbb{R}^n\times\mathbb{R}^n}\Big[g_q\Big(\sum_{i \in N}y_iX_i(\omega)\Big) - g_q\Big(\sum_{i \in N}y'_iX_i(\omega)\Big)\Big]\,Q(\omega, d(y, y'))$

(see, e.g., Kallenberg, 2002, Theorem 5.4).

Claim C.3. There exists a constant $B_v > 0$ depending only on $v$ such that for any $q > 0$,

$|h_q(\omega)| \le \frac{B_v}{\epsilon^3}\sum_{i \in N}\sum_{j \in B_i}\sum_{k \in B_i\cup B_j}\prod_{l \in \{i,j,k\}}(\chi_l(\omega))^{1/3}\|X_l(\omega)\|_\infty,$

where $\chi_i(\omega) := \int_{\mathbb{R}^n}|y_i|^3\,Q_Y(\omega, dy)$.

Proof. For $y \equiv \{y_i\}_{i \in N}$ let $\varphi(y) := g_q\big(\sum_{i \in N}y_iX_i(\omega)\big)$. Then the result follows from Theorem 3.4 in Röllin (2013) by observing that

$\|\varphi_{ijk}\|_\infty \le \frac{B'_v}{\epsilon^3}\prod_{l \in \{i,j,k\}}\|X_l(\omega)\|_\infty$

for some constant $B'_v > 0$ depending only on $v$, where $\varphi_{ijk}$ is the third-order partial derivative of $\varphi$ w.r.t. the coordinates $i$, $j$, and $k$. □

As in the proof of Theorem C.1, there exists a constant $C'_v > 0$ depending only on $v$ such that $\sup_{q \ge 0}P(Z \in S_{q,\epsilon} \mid \mathcal{G}) \le C'_v\epsilon/\sqrt\upsilon$ a.s. Therefore, noticing that $\chi_i = E[|Y_i|^3 \mid \mathcal{F}]$ a.s., the result follows by taking $\epsilon = (\sqrt\upsilon\,\beta)^{1/4}$. The second assertion for $v = 1$ follows similarly. □
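In practical terms, Lemma C.11 covers bootstrap statistics of the form $T = \sum_iY_iX_i$ with network-local Gaussian multipliers, which is the shape of the dependent wild bootstrap draws studied in this paper. The sketch below generates one such draw; the triangular covariance kernel and the eigenvalue clipping used to enforce positive semi-definiteness are illustrative choices of ours, not the paper's prescription.

```python
import numpy as np

def dependent_wild_draw(X, D, s_n, rng):
    """One draw of T* = sum_i Y_i X_i in the setting of Lemma C.11.

    The Gaussian multipliers {Y_i} are mean zero with Cov(Y_i, Y_j) = 0
    whenever d_n(i, j) > s_n; the banded triangular kernel below is one
    possible covariance choice, taken here purely for illustration.
    """
    n = X.shape[0]
    K = np.maximum(1.0 - D / (s_n + 1.0), 0.0)   # zero beyond distance s_n
    lam, Q = np.linalg.eigh(K)
    A = Q * np.sqrt(np.maximum(lam, 0.0))        # PSD square root (clips tiny negatives)
    Y = A @ rng.standard_normal(n)
    return (Y[:, None] * X).sum(axis=0)
```

Averaging $T^*T^{*\top}$ over many such draws is then one way to approximate the bootstrap variance whose consistency Proposition 4.3 establishes.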