Bootstrapping High Dimensional Time Series
By Xianyang Zhang∗ and Guang Cheng†
University of Missouri-Columbia and Purdue University
This article studies bootstrap inference for high dimensional weakly dependent time series in a general framework of approximately linear statistics. The following high dimensional applications are covered: (i) uniform confidence band for the mean vector; (ii) specification testing on the second order property of time series, such as white noise testing and bandedness testing of the covariance matrix; (iii) specification testing on the spectral property of time series. In theory, we first derive a Gaussian approximation result for the maximum of a sum of weakly dependent vectors, where the dimension of the vectors is allowed to be exponentially larger than the sample size. In particular, we illustrate an interesting interplay between dependence and dimensionality, and also discuss one type of "dimension free" dependence structure. We further propose a blockwise multiplier (wild) bootstrap that works for time series with unknown autocovariance structure. The resulting distributional approximation errors, which are valid in finite samples, decrease polynomially in the sample size. A non-overlapping block bootstrap is also studied as a more flexible alternative. The above results are established under the general physical/functional dependence framework proposed in Wu (2005). Our work can be viewed as a substantive extension of Chernozhukov et al. (2013) to time series, based on a variant of Stein's method developed therein.
1. Introduction.
High-dimensional data are increasingly encountered in many applications of statistics, such as bioinformatics, information technology, medical imaging, astronomy and financial studies. In recent years, there has been a growing body of literature concerning inference on the first and second order properties of high dimensional data; see [7-9, 13, 14, 26, 33] among others. The validity of these procedures is generally established under independence amongst the data vectors, which can be quite restrictive for situations that involve temporally observed data. Examples include spatial-temporal modeling [39] and the financial study of a large number of asset returns [37].

∗Assistant Professor, Department of Statistics, University of Missouri-Columbia, Columbia, MO 65211. E-mail: [email protected]. Tel: +1 (573) 882-4455. Fax: +1 (573) 884-5524.
†Corresponding Author. Associate Professor, Department of Statistics, Purdue University, West Lafayette, IN 47906. E-mail: [email protected]. Tel: +1 (765) 496-9549. Fax: +1 (765) 494-0558. Research sponsored by NSF CAREER Award DMS-1151692, DMS-1418042, and Simons Foundation 305266.
AMS 2000 subject classifications: Primary 62M10; secondary 62E17, 62F40.
Keywords and phrases: Blockwise bootstrap, Gaussian approximation, high dimensionality, physical dependence measure, Slepian interpolation, Stein's method, time series.
Although high dimensional statistics has witnessed unprecedented development, statistical inference for high dimensional time series remains largely untouched so far. In the conventional low dimensional setting, inference for time series data typically involves the direct estimation of the asymptotic covariance matrix, which is known to be difficult in the presence of heteroscedasticity and autocorrelation of unknown forms [1]. In the high dimensional setting, where the dimension is comparable to or even larger than the sample size, the classical inferential procedures designed for the low dimensional case are no longer applicable; e.g., the estimated asymptotic covariance matrix is singular. Along a different line, alternative nonparametric procedures including the block bootstrap, subsampling and blockwise empirical likelihood [10, 23, 24, 28, 32] have been proposed to avoid the direct estimation of covariance matrices. However, the extension of these procedures (coupled with suitable testing procedures) to the high dimensional setting remains unclear. One relevant high dimensional work we are aware of ([12]) concerns the estimation rates of the covariance/precision matrices of time series.

In this paper, we establish a general framework for conducting bootstrap inference for high dimensional stationary time series under weak dependence. We start from three motivating examples that are mainly concerned with the first or second order property of time series: (i) uniform confidence band for the mean vector; (ii) testing for serial correlation; (iii) testing the bandedness of the covariance matrix. The proposed bootstrap procedures are rather simple to implement and are supported by simulation results. We want to emphasize that neither a Gaussian assumption nor strong restrictions on the covariance structure are imposed in these applications. An important by-product of Examples (ii) and (iii) is a covariance structure test for high dimensional time series that does not even rely on the existence of a null limit distribution. This new result is in sharp contrast with the existing literature for i.i.d. data such as [8, 13, 14]. We also remark that the maximum-type testing procedure considered in these examples is expected to be particularly powerful for detecting sparse alternatives (see [8]). A comprehensive investigation along this line is left as a future topic.

The underlying theory supporting these high dimensional applications is a general Gaussian approximation theory and its bootstrap version. The Gaussian approximation theory quantifies the Kolmogorov distance between the largest element of a sum of weakly dependent vectors and its Gaussian analog that shares the same autocovariance structure. We develop our theory in the general framework of dependency graphs, which leads to delicate bounds on the Kolmogorov distance for various types of time series. The approximation error, which is valid in finite samples, decreases polynomially in the sample size even when the data dimension is exponentially high. Moreover, we study two important dependence structures in more detail: $M$-dependent time series and weakly dependent time series. Although the sharpness of the Kolmogorov distance is not established in this paper, our theoretical results (also see Figure 1) strongly indicate an interesting interplay between dependence and dimensionality: the less dependent the data vectors are, the faster the dimension is allowed to diverge while maintaining an accurate Gaussian approximation.
We also propose an interesting "dimension free" dependence structure that allows the dimension to diverge at the same rate as if the data were independent. However, in practice, the intrinsic dependence structure of a time series is usually unknown. This motivates us to develop a bootstrap version of the Gaussian approximation theory that does not require such knowledge. Specifically, we propose a blockwise multiplier bootstrap that is able to capture the dependence amongst and within the data vectors. Moreover, it inherits the high quality approximation without relying on the autocovariance information. We also introduce a non-overlapping block bootstrap as a more flexible alternative. The above theoretical results are major building blocks of a general framework for conducting bootstrap inference for high dimensional time series. This general framework assumes that the quantity of interest admits an approximately linear expansion, and thus covers the three examples mentioned above. The quantity of interest can be expressed as a functional of the distribution of the time series with finite or infinite length. Hence, our result is also useful in making inference on the spectrum of a time series.

Our general Gaussian approximation theory and its block bootstrap version substantially relax the independence assumption in [2, 16], and are established using several techniques including the Slepian interpolation [35], a leave-one-block-out argument (a modification of Stein's leave-one-out argument [36]), self-normalization [17], the weak dependence measure of [40], and $M$-dependent approximation [29]. It is worth pointing out that our results are established under the physical/functional dependence measure proposed in [40]. This framework (or its variants) is known to be very general and easy to verify for linear and nonlinear data-generating mechanisms, and it also provides a convenient way to establish large-sample theories for stationary causal processes [12, 40, 41]. In particular, our work is largely inspired by a recent breakthrough in Gaussian approximation for i.i.d. data ([16]), which obtained an astounding improvement over the previous results in [3] by allowing the dimension of the data vectors to be exponentially larger than the sample size.

The rest of the paper is organized as follows. In Section 2, we describe the three concrete bootstrap inference procedures mentioned above in detail. Section 3 gives the Gaussian approximation result that works even when the dimension is exponentially larger than the sample size, and Section 4 proposes the blockwise multiplier (wild) bootstrap and also the non-overlapping block bootstrap, neither of which depends on the autocovariance structure of the time series. Building on the results in Sections 3 and 4, a general framework for conducting bootstrap inference based on approximately linear statistics is established in Section 5. The three examples considered in Section 2 and one spectral testing example are covered by this framework. All the proofs are gathered in the supplementary material.
2. High Dimensional Inference.
To motivate our general theory, we consider three concrete bootstrap inference procedures for high dimensional time series: uniform confidence band; white noise testing; and bandedness testing for the covariance matrix. These procedures are rather straightforward to implement. The main focus of this section is on the methodological side, and the general theoretical results are deferred to Section 5. An ad-hoc way of choosing the block size in the bootstrap is discussed in Section 2.1.
2.1. Uniform confidence band.
Consider $n$ observations from a sequence of weakly dependent $p$-dimensional time series $\{x_i\}$ with $x_i = (x_{i1}, \dots, x_{ip})'$. We are interested in constructing a $100(1-\alpha)\%$ uniform confidence band for the mean vector $\mu = (\mu_1, \mu_2, \dots, \mu_p)'$ of the form

(1)  $\big\{ \mu = (\mu_1, \dots, \mu_p)' \in \mathbb{R}^p : \sqrt{n}\, \max_{1 \le j \le p} |\mu_j - \bar{x}_{nj}| \le c(\alpha) \big\},$

where $\bar{x}_n = (\bar{x}_{n1}, \dots, \bar{x}_{np})' = \sum_{i=1}^n x_i/n$. In the traditional low dimensional regime, a confidence region for the mean of a multivariate time series is typically constructed by inverting a suitable test. A common choice is the Wald type test, which is of the form $n(\bar{x}_n - \mu)' \hat{\Sigma}^{-1} (\bar{x}_n - \mu)$, where $\mu = (\mu_1, \dots, \mu_p)'$ and $\hat{\Sigma}$ is a consistent estimator of the so-called long run variance matrix. However, obtaining a consistent $\hat{\Sigma}$ could be difficult in practice due to the unknown dependence structure. To avoid this hassle, several appealing nonparametric alternatives, e.g., the moving block bootstrap [10, 24, 28], the subsampling approach [32] and blockwise empirical likelihood [23], have been proposed. In the high dimensional regime, where the dimension of the time series is comparable with or even much larger than the sample size, inverting the Wald type test is no longer applicable because the long run variance estimator $\hat{\Sigma}$ is singular for $p > n$. Moreover, the direct applicability of the nonparametric approaches described above in the high dimensional setting is as yet unclear.

In this subsection, we propose a bootstrap-assisted method to obtain the critical value $c(\alpha)$ in (1), whose theoretical validity will be justified in Section 5.1. Specifically, we introduce the following blockwise multiplier (wild) bootstrap. For simplicity, suppose $n = b_n l_n$ with $b_n, l_n \in \mathbb{Z}$. Define the non-overlapping block sums

$\hat{A}_{ij} = \sum_{l=(i-1)b_n+1}^{i b_n} (x_{lj} - \bar{x}_{nj}), \quad i = 1, 2, \dots, l_n,$

and the bootstrap statistic

$T_{\hat{A}} = \max_{1 \le j \le p} \frac{1}{\sqrt{n}} \Big| \sum_{i=1}^{l_n} \hat{A}_{ij} e_i \Big|,$

where $\{e_i\}$ is a sequence of i.i.d. $N(0,1)$ random variables independent of $\{x_i\}$. The bootstrap critical value is defined as $c(\alpha) := \inf\{t \in \mathbb{R} : P(T_{\hat{A}} \le t \mid \{x_i\}_{i=1}^n) \ge 1 - \alpha\}$.
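The procedure above is easy to vectorize. The following is a minimal NumPy sketch (not the authors' code; the function name and defaults are ours) that computes the bootstrap critical value $c(\alpha)$ from the non-overlapping block sums, assuming $n = b_n l_n$:

import numpy as np

def multiplier_bootstrap_cv(x, b_n, alpha=0.05, n_boot=499, rng=None):
    """Blockwise multiplier bootstrap critical value c(alpha) for the uniform
    confidence band (1); x is an (n, p) array with n = b_n * l_n."""
    rng = np.random.default_rng(rng)
    n, p = x.shape
    l_n = n // b_n
    # Non-overlapping block sums of the centered series: the A-hat_{ij} above.
    centered = x - x.mean(axis=0)
    A = centered[: b_n * l_n].reshape(l_n, b_n, p).sum(axis=1)   # (l_n, p)
    # T_A = max_j n^{-1/2} |sum_i A_{ij} e_i| with e_i i.i.d. N(0, 1).
    e = rng.standard_normal((n_boot, l_n))
    T = np.abs(e @ A).max(axis=1) / np.sqrt(n)
    return np.quantile(T, 1 - alpha)

# Uniform band: [xbar_j - c/sqrt(n), xbar_j + c/sqrt(n)] simultaneously over j.

The band is then $\bar{x}_{nj} \pm c(\alpha)/\sqrt{n}$ for each coordinate $j$, so a single critical value controls all $p$ coordinates simultaneously.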
We next conduct a small simulation study to assess the finite sample coverage probability of the uniform confidence band. Consider a $p$-dimensional VAR(1) (vector autoregressive) process,

(2)  $x_t = \rho x_{t-1} + \sqrt{1-\rho^2}\, \epsilon_t,$

where $\epsilon_t = (\epsilon_{t1}, \dots, \epsilon_{tp})'$. For the error process $\{\epsilon_t\}$, we consider three cases: (i) $\epsilon_{tj} = (\varepsilon_{tj} + \varepsilon_{t0})/\sqrt{2}$, where $(\varepsilon_{t0}, \varepsilon_{t1}, \dots, \varepsilon_{tp})'$ are i.i.d. $N(0, I_{p+1})$; (ii) $\epsilon_{tj} = \rho_1 \zeta_{tj} + \rho_2 \zeta_{t(j+1)} + \cdots + \rho_p \zeta_{t(j+p-1)}$, where $\{\rho_j\}_{j=1}^p$ are generated independently from Unif(2,3) (the uniform distribution on [2,3]) and $\{\zeta_{tj}\}$ are i.i.d. $N(0,1)$ random variables; (iii) $\epsilon_{tj}$ is generated from the moving average model in (ii) with $\{\zeta_{tj}\}$ being i.i.d. centralized Gamma(4,1) random variables. Set $n = 120$, $p = 500$ or $1000$, $\rho = 0.2$ or $0.5$, and $b_n = 4, 6, 8, 10, 12, 15, 20$.

Table 1 reports the coverage probabilities at the 90% and 95% nominal levels based on 5000 simulations and 499 bootstrap resamples. We note that the coverage probabilities appear to be low for relatively small block sizes. When $\rho$ increases, a larger block size is generally required to capture the dependence. Although the coverage probability is generally sensitive to the choice of the block size, with a proper block size the coverage probability can be reasonably close to the nominal level. For univariate time series, there are two major approaches for selecting the optimal block size: the nonparametric plug-in method (e.g., [6]) and the empirical criteria-based method [19]. However, these selection procedures are derived from a bias-variance tradeoff and are not intended to guarantee the best coverage of a confidence interval. Moreover, it is still unclear how these selection rules can be extended to the high dimensional context.

Hence, we provide an ad-hoc way of choosing the block size below. Given a set of realizations $\{x_t\}_{t=1}^n$, we pick an initial block size $b_{int}$ such that $n = b_{int} l_{int}$, where $b_{int}, l_{int} \in \mathbb{Z}$. Conditional on the sample $\{x_t\}_{t=1}^n$, we let $s_1, \dots, s_{l_{int}}$ be i.i.d. uniform random variables on $\{0, 1, \dots, l_{int} - 1\}$ and define $x^*_{(j-1)b_{int}+i} = x_{s_j b_{int} + i}$ with $1 \le j \le l_{int}$ and $1 \le i \le b_{int}$. In other words, $\{x^*_t\}_{t=1}^n$ is a non-overlapping block bootstrap sample with block size $b_{int}$. For each $b_n$ (the block size for the original sample), we can record whether the sample mean $\bar{x}_n$ is contained in the uniform confidence band constructed from the bootstrap sample $\{x^*_t\}_{t=1}^n$, and then compute the empirical coverage probability based on $B$ bootstrap samples. This is based on the notion that $\bar{x}_n$ is the true mean for the bootstrap sample conditional on $\{x_t\}_{t=1}^n$. In this case, the block size which delivers the most accurate coverage for $\bar{x}_n$ can be viewed as an estimate of the optimal $b_n$ for the original series. We employ the above procedure with $b_{int} = 6$ and $B = 500$ to choose the optimal block size. Based on 200 realizations from the original data generating process, the coverage probabilities (given the selected block size) under the different simulation setups are summarized in Table 2. We observe that the coverage probability based on the optimal block size is close to the best coverage presented in Table 1. Finally, we point out that it might be possible to iterate the above procedure to further improve the empirical performance.
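A sketch of this ad-hoc selection rule under the stated design ($b_{int} = 6$, $B = 500$); it reuses multiplier_bootstrap_cv from the sketch in Section 2.1, and all function names are ours:

import numpy as np

def nbb_sample(x, b, rng):
    """One non-overlapping block bootstrap pseudo-series with block size b."""
    n = x.shape[0]
    l = n // b
    starts = rng.integers(0, l, size=l) * b      # i.i.d. uniform block starts
    return np.concatenate([x[s:s + b] for s in starts], axis=0)

def select_block_size(x, candidates, b_int=6, B=500, alpha=0.05, rng=None):
    """Pick the b_n whose bootstrap band covers xbar_n (the 'truth' for the
    pseudo-series) with empirical coverage closest to the nominal 1 - alpha."""
    rng = np.random.default_rng(rng)
    n = x.shape[0]
    xbar = x.mean(axis=0)
    coverage = {b: 0.0 for b in candidates}
    for _ in range(B):
        xs = nbb_sample(x, b_int, rng)           # pseudo-sample; "truth" = xbar
        for b in candidates:
            c = multiplier_bootstrap_cv(xs, b, alpha=alpha, rng=rng)
            stat = np.sqrt(n) * np.abs(xbar - xs.mean(axis=0)).max()
            coverage[b] += (stat <= c) / B
    return min(candidates, key=lambda b: abs(coverage[b] - (1 - alpha)))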
2.2. Testing for serial correlation.

The covariance matrix plays a crucial role in many areas of statistical inference. For independent vectors, many methods have been developed for testing specific structures of covariance matrices (see, e.g., [9, 14, 26, 33] for some recent developments). In this subsection, we examine the serial correlation of a sequence of time series data by testing its autocovariance matrices (a more general object than the covariance matrix).
Table 1
Coverage probabilities of the uniform confidence band for the mean, where the block size b_n = 4, 6, 8, 10, 12, 15, 20 and n = 120.

            p = 500,(i)  p = 500,(ii)  p = 500,(iii)  p = 1000,(i)  p = 1000,(ii)  p = 1000,(iii)
            90%   95%    90%   95%     90%   95%      90%   95%     90%   95%      90%   95%
ρ = 0.2
 b_n = 4    85.0  92.2   85.6  92.6    85.5  91.7     86.0  92.8    84.8  91.9     84.7  91.4
 b_n = 6    87.8  93.8   85.8  92.7    86.0  92.4     87.7  94.5    86.0  92.6     85.8  92.7
 b_n = 8    89.1  95.5   85.7  92.3    86.4  93.1     89.2  95.1    85.8  92.2     85.6  92.3
 b_n = 10   89.5  95.7   85.7  92.3    85.2  92.1     90.7  96.0    85.9  92.5     86.1  92.5
 b_n = 12   89.2  95.3   85.4  91.8    85.4  92.5     90.4  96.5    84.7  91.9     86.4  92.9
 b_n = 15   90.3  96.0   84.6  91.8    85.2  92.3     90.2  96.4    85.0  92.3     85.3  92.4
 b_n = 20   90.2  96.5   83.0  90.7    83.2  90.8     91.2  96.9    84.1  91.3     84.2  91.9
ρ = 0.5
 b_n = 4    62.9  76.9   73.6  83.5    73.3  83.3     64.3  78.1    73.0  82.7     73.2  82.8
 b_n = 6    76.5  87.1   79.1  87.3    78.9  87.4     76.4  86.6    78.6  87.4     78.1  87.1
 b_n = 8    81.5  91.6   80.8  88.8    80.7  89.4     81.9  91.0    80.8  88.9     80.9  88.9
 b_n = 10   84.2  92.5   81.5  89.8    81.5  89.3     84.9  93.5    82.2  90.1     82.5  89.9
 b_n = 12   84.6  93.0   82.2  90.0    82.3  90.5     86.2  94.4    81.6  89.9     83.3  90.9
 b_n = 15   87.0  94.3   82.0  90.1    82.5  90.7     87.1  94.6    82.2  90.1     82.5  89.9
 b_n = 20   88.0  95.5   81.0  89.3    81.9  89.8     88.9  96.0    81.6  89.9     83.3  90.9

Table 2
Coverage probabilities of the uniform confidence band for the mean, where the block size is chosen automatically, p = 500, n = 120, and the nominal level is 95% (cases (i)-(iii), each with ρ = 0.2 and ρ = 0.5).

To illustrate the idea, let $\gamma(l) = (\gamma_{jk}(l))_{j,k=1}^p = \mathbb{E}\, x_i x_{i+l}' \in \mathbb{R}^{p \times p}$ be the autocovariance matrix of a $p$-dimensional stationary time series $\{x_i\}$ with $\mathbb{E}\, x_i = 0$. Consider the null hypothesis

$H_0: \gamma(l) = \tilde{\gamma}(l) := (\tilde{\gamma}_{jk}(l))_{j,k=1}^p \ \text{ for any } l \in \Lambda \subset \{0, 1, 2, \dots\}$

versus the alternative $H_a: \gamma(l) \ne \tilde{\gamma}(l)$ for some $l \in \Lambda$. The cardinality of $\Lambda$ is allowed to grow with the dimension $p$. Let $\hat{\gamma}_{jk}(l) = \sum_{i=1}^{n-l} x_{ij} x_{(i+l)k}/n$ for $1 \le j,k \le p$ be the sample autocovariance at lag $l$. Our test rejects the null hypothesis $H_0$ if

(3)  $\sqrt{n}\, \max_{l \in \Lambda} \max_{1 \le j,k \le p} |\hat{\gamma}_{jk}(l) - \tilde{\gamma}_{jk}(l)| > c(\alpha),$

where $c(\alpha)$ denotes the bootstrap critical value at level $\alpha$. This framework includes several important applications, such as white noise testing (i.e., testing for serial correlation) and covariance testing. In white noise testing, we consider $H_0: \gamma(l) = \mathbf{0}_{p\times p}$ for any $1 \le l \le L$ versus $H_a: \gamma(l) \ne \mathbf{0}_{p\times p}$ for some $1 \le l \le L$, where $\mathbf{0}_{p\times p}$ denotes a $p \times p$ matrix of zeros. This is a standard diagnostic procedure in time series analysis; see, e.g., [4, 18, 22, 34] among others. However, in the high dimensional setting, i.e., $p \gg n$, there seems to be no systematic method available to test the white noise assumption. The proposed test statistic $\sqrt{n} \max_{1 \le l \le L} \max_{1 \le j,k \le p} |\hat{\gamma}_{jk}(l)|$ fills this gap. Again, we employ the blockwise multiplier bootstrap to obtain the critical value $c(\alpha)$. To proceed, we let $\nu_i = (\nu_{i,1}, \dots, \nu_{i,p^2 L}) = (\mathrm{vec}(x_i x_{i+1}')', \dots, \mathrm{vec}(x_i x_{i+L}')')' \in \mathbb{R}^{p^2 L}$ for $i = 1, \dots, N := n - L$, where vec denotes the operator that stacks the columns of a $p \times p$ matrix into a vector with $p^2$ components. Suppose $N = b_n l_n$ for $b_n, l_n \in \mathbb{Z}$. Define

$\tilde{T}_{\hat{A}} = \max_{1 \le j \le p^2 L} \frac{1}{\sqrt{N}} \Big| \sum_{i=1}^{l_n} \hat{A}_{ij} e_i \Big|, \qquad \hat{A}_{ij} = \sum_{l=(i-1)b_n+1}^{i b_n} (\nu_{l,j} - \bar{\nu}_{Nj}),$

where $\{e_i\}$ is a sequence of i.i.d. standard normal random variables independent of $\{x_i\}$, and $\bar{\nu}_{Nj} = \sum_{i=1}^N \nu_{i,j}/N$. The bootstrap critical value is then given by $c(\alpha) := \inf\{t \in \mathbb{R} : P(\tilde{T}_{\hat{A}} \le t \mid \{x_i\}_{i=1}^n) \ge 1 - \alpha\}$. The above procedure can be easily modified to obtain the critical value for the general test described in (3).

When $\Lambda = \{0\}$, we obtain an important by-product: covariance structure testing for high dimensional vectors. In this case, our test reduces to $\sqrt{n} \max_{1 \le j \le k \le p} |\hat{\gamma}_{jk}(0) - \tilde{\gamma}_{jk}(0)| > c(\alpha)$. Compared to the existing work for the independent case, e.g., [8], our test enjoys three appealing features: (i) it allows dependence amongst the data vectors and relaxes the Gaussian assumption; (ii) it does not require the existence of a null limit distribution such as the Type I extreme value distribution in [9], so we can avoid the slow convergence of the extreme value distribution (see [30]), which causes an inaccurate critical value; rather, a blockwise multiplier bootstrap is employed to provide a high quality approximation; (iii) it does not impose strong restrictions on the covariance structure, such as sparsity of the precision matrix [8] or pseudo-independence among the components [13, 14].
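For concreteness, the white noise test statistic and the stacked vectors $\nu_i$ can be formed as follows (a sketch with our own function names; the critical value is then obtained exactly as in Section 2.1, applying the blockwise multiplier bootstrap to the centered $\nu_i$):

import numpy as np

def lagged_products(x, L):
    """nu_i = (vec(x_i x_{i+1}')', ..., vec(x_i x_{i+L}')')' for i <= N = n - L."""
    n, p = x.shape
    N = n - L
    outer = [(x[:N, :, None] * x[l:N + l, None, :]).reshape(N, p * p)
             for l in range(1, L + 1)]
    return np.concatenate(outer, axis=1)          # (N, p^2 L)

def white_noise_stat(x, L):
    """sqrt(n) * max_{1<=l<=L} max_{j,k} |gamma_hat_{jk}(l)| (mean-zero series)."""
    n = x.shape[0]
    return np.sqrt(n) * max(np.abs(x[:-l].T @ x[l:] / n).max()
                            for l in range(1, L + 1))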
To evaluate the finite sample performance of the white noise testing procedure, we consider the following data generating processes: (i) independent normal random vectors whose covariance structure is determined by the moving average model $x_{ij} = \rho_1 \zeta_{ij} + \rho_2 \zeta_{i(j+1)} + \cdots + \rho_p \zeta_{i(j+p-1)}$, where $\{\rho_j\}_{j=1}^p$ are generated independently from Unif(2,3) and $\{\zeta_{ij}\}$ are i.i.d. $N(0,1)$ random variables; (ii) a multivariate ARCH model defined as $x_i = \Sigma_i^{1/2} \epsilon_i$ with $\epsilon_i \sim N(0, I_p)$ and $\Sigma_i = 0.1 I_p + 0.9\, x_{i-1} x_{i-1}'$, where $\Sigma_i^{1/2}$ is the lower triangular matrix from the Cholesky decomposition of $\Sigma_i$; (iii) the VAR(1) model $x_i = \rho x_{i-1} + \sqrt{1-\rho^2}\, \epsilon_i$, where $\rho = 0.2$ and $\{\epsilon_i\}$ are generated according to (i). We consider $n = 60$ and $p = 30$ or $50$. Notice that the actual number of parameters under consideration is $p^2 \times L$, where $L$ is the number of lags specified in the hypothesis. Table 3 summarizes the rejection probabilities at the 10% and 5% nominal levels based on 5000 simulations and 499 bootstrap resamples. In general, the proposed method delivers reasonable size and power, although we still observe some downward size distortion and power loss, especially for $L = 3$. The power loss here is presumably due to the correlation structure of the VAR(1) model. It is also worth noting that the choice of $b_n = 1$ generally performs well for the martingale difference sequences considered under the null.
Table 3
Rejection percentages for testing uncorrelatedness, where the block size b_n = 1, 2, 3, 4, 5, 6 and n = 60. Cases (i) and (ii) are under the null, while case (iii) is under the alternative.

            p = 30,(i)  p = 30,(ii)  p = 30,(iii)  p = 50,(i)  p = 50,(ii)  p = 50,(iii)
            10%   5%    10%   5%     10%   5%      10%   5%    10%   5%     10%   5%
L = 1
 b_n = 1    8.1   3.5   9.2   3.2    73.1  59.1    8.3   3.9   8.9   2.8    72.4  58.9
 b_n = 2    9.2   3.6   7.0   2.4    67.0  49.6    8.7   3.2   6.7   2.2    68.2  49.4
 b_n = 3    10.8  4.6   6.8   2.5    66.0  46.4    9.9   4.1   6.9   2.6    66.4  46.7
 b_n = 4    10.9  4.6   6.7   3.0    67.0  46.8    11.0  4.1   6.9   3.0    66.4  46.3
 b_n = 5    11.4  4.5   7.8   3.7    69.2  47.5    11.6  4.5   7.8   3.7    67.6  46.6
 b_n = 6    12.7  5.2   9.2   4.7    67.7  47.7    12.3  5.1   8.3   4.4    68.2  48.2
L = 3
 b_n = 1    7.2   2.4   8.5   3.3    58.3  43.8    6.7   2.5   8.6   3.2    58.7  43.4
 b_n = 2    7.6   2.7   5.4   2.1    51.3  33.0    7.9   3.0   5.3   2.4    51.3  32.4
 b_n = 3    6.9   2.3   3.9   1.5    46.4  28.1    6.4   2.0   3.7   1.6    46.9  27.7
 b_n = 4    7.0   2.3   3.8   2.0    47.0  27.4    6.6   2.0   4.2   2.2    47.5  28.2
 b_n = 5    7.8   2.4   5.1   2.4    48.6  28.2    7.4   2.2   4.6   2.5    47.7  27.5
 b_n = 6    7.9   2.5   6.4   3.8    49.3  28.1    8.7   2.7   5.9   3.2    49.1  28.1

Remark 2.1. The simulation results demonstrate the usefulness of the proposed method, but they also leave some room for improvement. Here we point out two possibilities: (i) it is of interest to study the studentized version of the test statistic, which may be more efficient, as expected from the low dimensional setting (see Remark 5.1); (ii) in sparse situations, the test statistic can be constructed based on a suitable linear transformation of the observations. The linear transformation aims to magnify the signals owing to the dependence within the data vector under the alternative, and hence improves the power of the testing procedure; see, e.g., [8, 20].

2.3. Bandedness testing of the covariance matrix.
In this subsection, we consider testing the bandedness of the covariance matrix $\gamma(0)$. This problem arises, for example, in econometrics when testing certain economic theories; see [1, 27] and the references therein. Also see [9, 33] for the independent case. For an integer $\iota \ge 1$ (which may grow with $n$ or $p$), we want to test

(4)  $H_0: \gamma_{jk}(0) = 0 \ \text{ for all } |j - k| \ge \iota.$

Our setting significantly generalizes the one considered in [9], which focuses on independent Gaussian vectors. Here, we allow non-Gaussian and dependent random vectors. We define the test statistic as

(5)  $T_{band} = \sqrt{n}\, \max_{|j-k| \ge \iota} \Big| \frac{\hat{\gamma}_{jk}(0)}{\sqrt{\hat{\gamma}_{jj}(0)\hat{\gamma}_{kk}(0)}} \Big| = \max_{|j-k| \ge \iota} \frac{1}{\sqrt{n}} \Big| \sum_{i=1}^n \frac{x_{ij} x_{ik}}{\sqrt{\hat{\gamma}_{jj}(0)\hat{\gamma}_{kk}(0)}} \Big|.$

For $n = b_n l_n$ with $b_n, l_n \in \mathbb{Z}$, we define the block sums

$\hat{A}_{i,jk} = \sum_{l=(i-1)b_n+1}^{i b_n} \frac{x_{lj} x_{lk} - \hat{\gamma}_{jk}(0)}{\sqrt{\hat{\gamma}_{jj}(0)\hat{\gamma}_{kk}(0)}}, \quad i = 1, 2, \dots, l_n,$

and the bootstrap statistic

$T_{band,\hat{A}} = \max_{|j-k| \ge \iota} \frac{1}{\sqrt{n}} \Big| \sum_{i=1}^{l_n} \hat{A}_{i,jk} e_i \Big|,$

where $\{e_i\}$ is a sequence of i.i.d. $N(0,1)$ random variables
independent of $\{x_i\}$. We reject the null $H_0$ if $T_{band} > c_{band}(\alpha)$, where $c_{band}(\alpha) := \inf\{t \in \mathbb{R} : P(T_{band,\hat{A}} \le t \mid \{x_i\}_{i=1}^n) \ge 1 - \alpha\}$. Alternatively, one can employ the non-overlapping block bootstrap (to be presented in Section 4.2) to obtain the critical value.
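A compact sketch of the bandedness test (5) with its multiplier bootstrap critical value, assuming $n = b_n l_n$ (names and defaults are ours):

import numpy as np

def bandedness_test(x, iota, b_n, alpha=0.05, n_boot=499, rng=None):
    """Reject H0: gamma_jk(0) = 0 for |j - k| >= iota when T_band > c_band(alpha)."""
    rng = np.random.default_rng(rng)
    n, p = x.shape
    l_n = n // b_n
    gamma0 = x.T @ x / n                                   # gamma_hat_jk(0)
    scale = np.sqrt(np.outer(np.diag(gamma0), np.diag(gamma0)))
    mask = np.abs(np.subtract.outer(np.arange(p), np.arange(p))) >= iota
    T_band = np.sqrt(n) * np.abs(gamma0 / scale)[mask].max()
    # Blockwise multiplier bootstrap on the studentized cross products.
    z = (x[:, :, None] * x[:, None, :] - gamma0) / scale   # (n, p, p)
    A = z[: b_n * l_n].reshape(l_n, b_n, p, p).sum(axis=1)
    e = rng.standard_normal((n_boot, l_n))
    Tb = np.abs(np.einsum('bi,ijk->bjk', e, A))[:, mask].max(axis=1) / np.sqrt(n)
    return T_band > np.quantile(Tb, 1 - alpha)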
3. Gaussian Approximation Theory.
In this section, we derive a Gaussian approximation theory that serves as the first step in studying the high dimensional inference procedures in Section 2. Consider a sequence of $p$-dimensional dependent random vectors $\{x_i\}_{i=1}^n$ with $x_i = (x_{i1}, \dots, x_{ip})'$. Suppose $\mathbb{E}\, x_i = 0$ and $\Sigma_{i,j} := \mathrm{cov}(x_i, x_j) \in \mathbb{R}^{p \times p}$. The Gaussian counterpart is defined as a sequence of Gaussian random vectors $\{y_i\}_{i=1}^n$ independent of $\{x_i\}_{i=1}^n$. In addition, $\{y_i\}_{i=1}^n$ preserves the autocovariance structure of $\{x_i\}$ in the sense that $\mathbb{E}\, y_i = 0$ and $\mathrm{cov}(y_i, y_j) = \Sigma_{i,j}$ (this assumption can be weakened; see Remark 3.1). The Gaussian approximation theory quantifies the Kolmogorov distance

(6)  $\rho_n := \sup_{t \in \mathbb{R}} |P(T_X \le t) - P(T_Y \le t)|,$

where $T_X = \max_{1 \le j \le p} X_j$, $T_Y = \max_{1 \le j \le p} Y_j$, and

(7)  $X = (X_1, \dots, X_p)' = \frac{1}{\sqrt{n}} \sum_{i=1}^n x_i, \qquad Y = (Y_1, \dots, Y_p)' = \frac{1}{\sqrt{n}} \sum_{i=1}^n y_i.$

Chernozhukov et al. (2013) recently showed that for independent data vectors, $\rho_n$ decays to zero polynomially in the sample size. In Section 3.1, we substantially relax their independence assumption by first establishing a general proposition, i.e., Proposition 3.1, in the framework of dependency graphs. This general result leads to delicate bounds on the Kolmogorov distance for various types of weakly dependent time series even when their dimension is exponentially high; see Sections 3.2 and 3.3.

3.1. General framework: dependency graph.
In this subsection, we introduce a flexible framework for modelling the dependence among a sequence of $p$-dimensional dependent (not necessarily identically distributed) random vectors $\{x_i\}_{i=1}^n$. We call it a dependency graph $G_n = (V_n, E_n)$, where $V_n = \{1, 2, \dots, n\}$ is a set of vertices and $E_n$ is the corresponding set of undirected edges. For any two disjoint subsets of vertices $S, T \subseteq V_n$ such that there is no edge from any vertex in $S$ to any vertex in $T$, the collections $\{x_i\}_{i \in S}$ and $\{x_i\}_{i \in T}$ are independent. Let $D_{max,n} = \max_{1 \le i \le n} \sum_{j=1}^n I\{\{i,j\} \in E_n\}$ be the maximum degree of $G_n$ and denote $D_n = 1 + D_{max,n}$. Throughout the paper, we allow $D_n$ to grow with the sample size $n$. For example, if an array $\{x_{i,n}\}_{i=1}^n$ is an $M := M_n$ dependent sequence (that is, $x_{i,n}$ and $x_{j,n}$ are independent if $|i - j| > M$), then we have $D_n = 2M + 1$.

Within this general framework, we want to understand the largest possible diverging rate of $p$ (w.r.t. $n$) under which the Kolmogorov distance between the distributions of $T_X$ and $T_Y$, i.e., $\rho_n$ defined in (6), converges to zero. Recall that $T_X = \max_{1 \le j \le p} X_j$ and $T_Y = \max_{1 \le j \le p} Y_j$. The problem of comparing distributions of maxima is nontrivial since the maximum function $z = (z_1, \dots, z_p)' \mapsto \max_{1 \le j \le p} z_j$ is non-differentiable. To overcome this difficulty, we consider a smooth approximation of the maximum function,

$F_\beta(z) := \beta^{-1} \log \Big( \sum_{j=1}^p \exp(\beta z_j) \Big), \qquad z = (z_1, \dots, z_p)',$

where $\beta > 0$ is a smoothing parameter; indeed,

(8)  $0 \le F_\beta(z) - \max_{1 \le j \le p} z_j \le \beta^{-1} \log p.$

Denote by $C^k(\mathbb{R})$ the class of $k$ times continuously differentiable functions from $\mathbb{R}$ to itself, and by $C_b^k(\mathbb{R})$ the class of functions $f \in C^k(\mathbb{R})$ such that $\sup_{z \in \mathbb{R}} |\partial^j f(z)/\partial z^j| < \infty$ for $j = 0, 1, \dots, k$. Set $m = g \circ F_\beta$ with $g \in C_b^3(\mathbb{R})$. In Proposition 3.1 below, we derive a non-asymptotic upper bound for the quantity $|\mathbb{E}[m(X) - m(Y)]|$ by employing the Slepian interpolation [35] and modifying Stein's leave-one-out argument [36] into a leave-one-block-out argument for capturing the local dependence of the data.
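The two-sided bound (8) is easy to verify numerically; the following snippet (ours) checks that $0 \le F_\beta(z) - \max_j z_j \le \beta^{-1}\log p$ and that the gap shrinks as $\beta$ grows:

import numpy as np

def F_beta(z, beta):
    """Smooth maximum: F_beta(z) = beta^{-1} log(sum_j exp(beta * z_j))."""
    m = z.max()
    # The shift by m is the log-sum-exp trick; it keeps the sum stable.
    return m + np.log(np.exp(beta * (z - m)).sum()) / beta

rng = np.random.default_rng(0)
z = rng.standard_normal(1000)                  # p = 1000
for beta in (10.0, 50.0, 250.0):
    gap = F_beta(z, beta) - z.max()
    # Bound (8): 0 <= F_beta(z) - max_j z_j <= log(p) / beta.
    print(beta, gap, np.log(z.size) / beta, 0.0 <= gap <= np.log(z.size) / beta)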
Denote the truncated variables $\tilde{x}_{ij} = (x_{ij} \wedge M_x) \vee (-M_x) - \mathbb{E}[(x_{ij} \wedge M_x) \vee (-M_x)]$ and $\tilde{y}_{ij} = (y_{ij} \wedge M_y) \vee (-M_y)$ for some $M_x, M_y > 0$. Let $\tilde{x}_i = (\tilde{x}_{i1}, \dots, \tilde{x}_{ip})'$ and $\tilde{y}_i = (\tilde{y}_{i1}, \dots, \tilde{y}_{ip})'$. For $1 \le i \le n$, let $N_i = \{j : \{i,j\} \in E_n\}$ be the set of neighbors of $i$, and $\tilde{N}_i = \{i\} \cup N_i$. Let $\phi(M_x)$ be a constant depending on the threshold parameter $M_x$ such that

$\max_{1 \le j,k \le p} \frac{1}{n} \sum_{i=1}^n \Big| \sum_{l \in \tilde{N}_i} (\mathbb{E}\, x_{ij} x_{lk} - \mathbb{E}\, \tilde{x}_{ij} \tilde{x}_{lk}) \Big| \le \phi(M_x).$

An analogous quantity $\phi(M_y)$ can be defined for $\{y_i\}$. Set $\phi(M_x, M_y) = \phi(M_x) + \phi(M_y)$. Define

$m_{x,k} = \big(\bar{\mathbb{E}} \max_{1 \le j \le p} |x_{ij}|^k\big)^{1/k}, \quad m_{y,k} = \big(\bar{\mathbb{E}} \max_{1 \le j \le p} |y_{ij}|^k\big)^{1/k}, \quad \bar{m}_{x,k} = \max_{1 \le j \le p} \big(\bar{\mathbb{E}} |x_{ij}|^k\big)^{1/k}, \quad \bar{m}_{y,k} = \max_{1 \le j \le p} \big(\bar{\mathbb{E}} |y_{ij}|^k\big)^{1/k},$

where $\bar{\mathbb{E}}[z_i] = \sum_{i=1}^n \mathbb{E}\, z_i/n$ for a sequence of random variables $\{z_i\}_{i=1}^n$. Note that $\bar{m}_{x,k} \le m_{x,k}$ and $\bar{m}_{y,k} \le m_{y,k}$. Further define the indicator

$I := I_\Delta = 1\Big\{ \max_{1 \le j \le p} |X_j - \tilde{X}_j| \le \Delta,\ \max_{1 \le j \le p} |Y_j - \tilde{Y}_j| \le \Delta \Big\},$

where $\tilde{X} = (\tilde{X}_1, \dots, \tilde{X}_p)' = \frac{1}{\sqrt{n}} \sum_{i=1}^n \tilde{x}_i$ and $\tilde{Y} = (\tilde{Y}_1, \dots, \tilde{Y}_p)' = \frac{1}{\sqrt{n}} \sum_{i=1}^n \tilde{y}_i$.

Proposition 3.1. Assume that $\sqrt{2}\,\beta D_n M_{xy}/\sqrt{n} \le 1$ with $M_{xy} = \max\{M_x, M_y\}$. Then, for any $\Delta > 0$,

(9)  $|\mathbb{E}[m(X) - m(Y)]| \lesssim (G_2 + G_1\beta)\,\phi(M_x, M_y) + (G_3 + G_2\beta + G_1\beta^2)\,\frac{D_n^2}{\sqrt{n}}\,(\bar{m}_{x,3}^3 + \bar{m}_{y,3}^3) + (G_3 + G_2\beta + G_1\beta^2)\,\frac{D_n^2}{\sqrt{n}}\,(m_{x,3}^3 + m_{y,3}^3) + G_1\Delta + G_0\,\mathbb{E}[1 - I],$

where $G_k = \sup_{z \in \mathbb{R}} |\partial^k g(z)/\partial z^k|$ for $k \ge 0$. In addition, under a stronger restriction of the same form on $\beta D_n M_{xy}/\sqrt{n}$, we can replace $m_{x,3}^3 + m_{y,3}^3$ by $\bar{m}_{x,3}^3 + \bar{m}_{y,3}^3$ in the above expression.

The proof of Proposition 3.1 is adapted from that of Theorem 2.1 in [16] for the i.i.d. case. By approximating the indicator function $I\{\cdot \le t\}$ with a suitable smooth function $g(\cdot)$, Proposition 3.1 leads to an upper bound on the Kolmogorov distance $\rho_n$ defined in (6). In fact, the upper bound in (9) can be further simplified using the self-normalization technique (see Lemma 3.1) and certain arguments under the weak dependence assumption. Finally, by optimizing the simplified upper bound (see Theorem 3.1), we obtain various convergence rates for $\rho_n$ in Sections 3.2 and 3.3.

Remark 3.1. In view of the proof of Proposition 3.1 (see, e.g., (S.4)), the assumption that $\{y_i\}$ preserves the autocovariance structure of $\{x_i\}$ can be weakened to the assumption that, for all $i$, $\sum_{k \in \tilde{N}_i} \mathbb{E}\, x_i x_k' = \sum_{k \in \tilde{N}_i} \mathbb{E}\, y_i y_k'$. Thus $\{y_i\}$ is allowed to be a sequence of independent (mean-zero) $p$-dimensional Gaussian random vectors such that $\mathrm{cov}(y_i) = \sum_{k \in \tilde{N}_i} \mathbb{E}\, x_i x_k'$ (provided that $\sum_{k \in \tilde{N}_i} \mathbb{E}\, x_i x_k'$ is positive-definite).

Remark 3.2. The arguments in the proof of Proposition 3.1 allow us to derive a non-asymptotic upper bound on $|\mathbb{E}[m^*(X) - m^*(Y)]|$ for a more general function $m^*(\cdot)$ of the high dimensional vector sum (after some suitable componentwise transformation); see Section S.5. Such general results are potentially useful in studying the higher criticism test ([43]); see Example S.1 and Remark S.1.

3.2. Dependence structure I: M-dependent time series.

This subsection is devoted to the analysis of $M$-dependent time series, which fit in the framework of dependency graphs. Here, we allow $M$ to grow slowly with the sample size $n$. Using the arguments in the proof of Proposition 3.1, we obtain the following result for $M$-dependent (not necessarily stationary) sequences.

Corollary 3.1. When $\{x_i\}$ is an $M$-dependent sequence, under the assumption that $\sqrt{2}\,\beta(6M+1)M_{xy}/\sqrt{n} \le 1$, we have
(10)  $|\mathbb{E}[m(X) - m(Y)]| \lesssim (G_3 + G_2\beta + G_1\beta^2)\,\frac{(2M+1)^2}{\sqrt{n}}\,(\bar{m}_{x,3}^3 + \bar{m}_{y,3}^3) + (G_2 + G_1\beta)\,\phi(M_x, M_y) + G_1\Delta + G_0\,\mathbb{E}[1 - I].$
Let $n = (N + M)r$, where $N \ge M$ and $N, M, r \to +\infty$ as $n \to +\infty$. Define the block sums

(11)  $A_{ij} = \sum_{l=(i-1)(N+M)+1}^{iN+(i-1)M} x_{lj}, \qquad B_{ij} = \sum_{l=iN+(i-1)M+1}^{i(N+M)} x_{lj}.$

It is not hard to see that, for an $M$-dependent sequence, $\{A_{ij}\}_{i=1}^r$ and $\{B_{ij}\}_{i=1}^r$ with $1 \le j \le p$ are two sequences of independent random variables (i.i.d. in the stationary case). Let $V_{nj} = \sqrt{V_{1,nj} + V_{2,nj}}$ with $V_{1,nj} = \sum_{i=1}^r A_{ij}^2$ and $V_{2,nj} = \sum_{i=1}^r B_{ij}^2$. By generalizing Theorem 2.16 of de la Peña et al. (2009), we obtain the following lemma.

Lemma 3.1. Suppose $\{x_i\}$ is a $p$-dimensional $M$-dependent sequence. Assume that there exist $a_j, b_j > 0$ such that

$P\Big( \sum_{i=1}^n x_{ij} > a_j \Big) \le \frac{1}{4}, \qquad P(V_{nj} > b_j) \le \frac{1}{4}.$

Then we have

(12)  $P\Big( \Big| \sum_{i=1}^n x_{ij} \Big| \ge x (a_j + b_j + V_{nj}) \Big) \le 4 e^{-x^2/8}, \quad \text{for any } 1 \le j \le p.$

In particular, we can choose $b_j = 4\,\mathbb{E}\, V_{nj}$ and $a_j = 2 b_j = 8\,\mathbb{E}\, V_{nj}$.

It is worth noting that Lemma 3.1 holds without the stationarity assumption. This lemma is particularly useful in controlling the last two terms in (10).
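The large-block/small-block partition in (11) takes only a few lines of code; under $M$-dependence, the $A$-blocks (and likewise the $B$-blocks) are separated by at least $M$ observations and are therefore independent across $i$. A sketch (our function name), assuming $n = (N+M)r$:

import numpy as np

def big_small_block_sums(x, N, M):
    """Split the series into r = n // (N + M) cycles of a large block (length N)
    followed by a small block (length M); return the block sums A, B of (11)."""
    n, p = x.shape
    r = n // (N + M)
    blocks = x[: r * (N + M)].reshape(r, N + M, p)
    A = blocks[:, :N, :].sum(axis=1)    # A_ij: sums over the large blocks
    B = blocks[:, N:, :].sum(axis=1)    # B_ij: sums over the small blocks
    return A, B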
Throughout the rest of this subsection, we consider the case where $\{x_i\}$ is an $M$-dependent stationary time series. Define $\gamma_{x,jk}(l) = \mathbb{E}\, x_{1j} x_{(1+l)k}$ for $l \ge 0$ and $\gamma_{x,jk}(l) = \gamma_{x,kj}(-l)$ for $l < 0$, where $1 \le j,k \le p$. Let $\sigma^{(n)}_{j,k} := \sigma^{(n)}_{j,k}(M) = \sum_{l=1-n}^{n-1} (n - |l|)\gamma_{x,jk}(l)/n$, $\sigma_{j,k} := \sigma_{j,k}(M) = \sum_{l=-\infty}^{+\infty} \gamma_{x,jk}(l)$ and $\sigma_j^2 = \sigma_j^2(M) = \sum_{l=-\infty}^{+\infty} |\gamma_{x,jj}(l)|$. Let $\varphi(M_x) := \varphi_{N,M}(M_x)$ be the smallest finite constant which satisfies, uniformly in $j$,

(13)  $\mathbb{E}(A_{ij} - \breve{A}_{ij})^2 \le N \varphi^2(M_x) \sigma_j^2, \qquad \mathbb{E}(B_{ij} - \breve{B}_{ij})^2 \le M \varphi^2(M_x) \sigma_j^2,$

where $\breve{A}_{ij}$ and $\breve{B}_{ij}$ are the truncated versions of $A_{ij}$ and $B_{ij}$ defined as

$\breve{A}_{ij} = \sum_{l=(i-1)(N+M)+1}^{iN+(i-1)M} (x_{lj} \wedge M_x) \vee (-M_x), \qquad \breve{B}_{ij} = \sum_{l=iN+(i-1)M+1}^{i(N+M)} (x_{lj} \wedge M_x) \vee (-M_x).$

Similarly, we can define the quantity $\varphi(M_y)$ for the Gaussian sequence $\{y_i\}$. Set $\varphi(M_x, M_y) = \varphi(M_x) \vee \varphi(M_y)$. Further, let $u_x(\gamma)$ and $u_y(\gamma)$ be the smallest quantities such that

(14)  $P\Big( \max_{1 \le i \le n} \max_{1 \le j \le p} |x_{ij}| \le u_x(\gamma) \Big) \ge 1 - \gamma, \qquad P\Big( \max_{1 \le i \le n} \max_{1 \le j \le p} |y_{ij}| \le u_y(\gamma) \Big) \ge 1 - \gamma.$

Building on the above results, we are ready to derive an upper bound for $\rho_n$. To this end, consider a "smooth" indicator function
$g_0 \in C^3(\mathbb{R})$, mapping $\mathbb{R}$ to $[0,1]$, such that $g_0(s) = 1$ for $s \le 0$ and $g_0(s) = 0$ for $s \ge 1$. Fix any $t \in \mathbb{R}$ and define $g(s) = g_0(\psi(s - t - \tilde{\beta}))$ with $\tilde{\beta} = \beta^{-1}\log p$. For this function $g$, $G_0 = 1$, $G_1 \lesssim \psi$, $G_2 \lesssim \psi^2$ and $G_3 \lesssim \psi^3$. Here, $\psi$ is a smoothing parameter that we will choose carefully in the proof. Corollary 3.1 and Lemma 3.1 imply the following result.

Theorem 3.1. Consider an $M$-dependent stationary time series $\{x_i\}$. Suppose $\sqrt{2}\,\beta(6M+1)M_{xy}/\sqrt{n} \le 1$ with $M_{xy} = \max\{M_x, M_y\}$, and $M_x > u_x(\gamma)$ and $M_y > u_y(\gamma)$ for some $\gamma \in (0,1)$. Further suppose that there exist constants $0 < c_1 < c_2$ such that $c_1 < \min_{1 \le j \le p} \sigma^{(n)}_{j,j} \le \max_{1 \le j \le p} \sigma^{(n)}_{j,j} < c_2$ holds uniformly for all large enough $n$, $M$ and $p$. Then, for any $\psi > 0$,

$\rho_n = \sup_{t \in \mathbb{R}} |P(T_X \le t) - P(T_Y \le t)| \lesssim (\psi^2 + \psi\beta)\,\phi(M_x, M_y) + (\psi^3 + \psi^2\beta + \psi\beta^2)\,\frac{(2M+1)^2}{\sqrt{n}}\,(\bar{m}_{x,3}^3 + \bar{m}_{y,3}^3) + \psi\,\varphi(M_x, M_y) \max_{1 \le j \le p} \sigma_j \sqrt{\log(p/\gamma)} + \gamma + (\tilde{\beta} + \psi^{-1})\sqrt{1 \vee \log(p\psi)}.$

We point out that the stationarity assumption is non-essential in the proof of Theorem 3.1. To characterize the dependence of $M$-dependent time series, we adopt the idea of viewing a weakly dependent time series as the output of a physical system driven by i.i.d. inputs [40]. This framework is very general and easy to verify for specific (linear or nonlinear) data-generating mechanisms; see [41]. With some abuse of notation, let $\{\epsilon_i\}$ be a sequence of mean-zero i.i.d. random variables. Consider a physical system $\mathcal{G}(\dots, \epsilon_{i-1}, \epsilon_i)$, where $\{\epsilon_i\}$ are the inputs and $\mathcal{G} = (\mathcal{G}_1, \dots, \mathcal{G}_p)'$ is a ($p$-dimensional) measurable function such that its output is well defined. Define the sigma field $\mathcal{F}_M(i) = \sigma(\epsilon_{i-M}, \epsilon_{i-M+1}, \dots, \epsilon_i)$ with $M \ge 0$. We suppose the $M$-dependent sequence $\{x_i\}$ has the representation (also see the discussion in the next subsection)

$x_i := x_i^{(M)} = \mathbb{E}[\mathcal{G}(\dots, \epsilon_{i-1}, \epsilon_i) \mid \mathcal{F}_M(i)] := \mathcal{G}^{(M)}(\epsilon_{i-M}, \epsilon_{i-M+1}, \dots, \epsilon_i).$

For any $l \in \mathbb{N}$, let $x_i^{(l-1)} = \mathbb{E}[x_i \mid \epsilon_{i+1-l}, \dots, \epsilon_i] = \mathbb{E}[\mathcal{G}(\dots, \epsilon_{i-1}, \epsilon_i) \mid \mathcal{F}_{l-1}(i)]$ for $l \le M$, and $x_i^{(l-1)} = x_i$ for $l > M$. By construction, $x_{1j}$ and $x^{(l-1)}_{(1+l)k}$ are independent for any $1 \le j,k \le p$. Let $h : [0, +\infty) \to [0, +\infty)$ be a convex and strictly increasing function with $h(0) = 0$, and denote by $h^{-1}(\cdot)$ the inverse function of $h(\cdot)$. Let $l_n := l_n(p, \gamma) = \log(pn/\gamma) \vee 1$.

Assumption 3.1. Suppose one of the following two conditions holds: (i) $\mathbb{E}\, h(\max_{1 \le j \le p} |x_{ij}|/D_n) \le 1$ with $D_n > 0$, and

(15)  $n^{3/8} M^{-1/2} l_n^{-5/8} \ge C_1 \max\{D_n h^{-1}(n/\gamma),\, l_n^{1/2}\}, \qquad n^{3/4} M^{-1} l_n^{-5/4} \ge C_2 N,$

for some constants $C_1, C_2 > 0$; (ii) $\max_{1 \le j \le p} \mathbb{E} \exp(|x_{ij}|/D_n) \le 2$ with $D_n > 0$, and

(16)  $n^{3/8} M^{-1/2} l_n^{-5/8} \ge C_1 \max\{D_n l_n,\, l_n^{1/2}\}, \qquad n^{3/4} M^{-1} l_n^{-5/4} \ge C_2 N,$
for some constants $C_1, C_2 > 0$.

Theorem 3.2. Assume that there exist constants $c_1, c_2, c_3 > 0$ such that $c_1 < \min_{1 \le j \le p} \sigma^{(n)}_{j,j} \le \max_{1 \le j \le p} \sigma^{(n)}_{j,j} < c_2$ and $\max_{1 \le j \le p} \sigma_j^2 < c_3$ uniformly for all large enough $M$ and $p$, and

(17)  $\limsup_p \max_{1 \le k \le p} \mathbb{E}\, |\mathcal{G}_k(\dots, \epsilon_{i-1}, \epsilon_i)|^4 < \infty,$

(18)  $\limsup_{M,p} \max_{1 \le k \le p} \sum_{l=1}^M \big(\mathbb{E}|x_{(1+l)k} - x^{(l-1)}_{(1+l)k}|^2\big)^{1/2} < \infty.$

Suppose condition (18) also holds for $\{y_i\}$. Then, under Assumption 3.1, we have

(19)  $\rho_n = \sup_{t \in \mathbb{R}} |P(T_X \le t) - P(T_Y \le t)| \lesssim n^{-1/8} M^{1/2} l_n^{7/8} + \gamma.$

Suppose $\mathbb{E}(\max_{1 \le j \le p} |x_{ij}|^4/D_n^4) \le 1$. Then, with $p \lesssim \exp(n^b)$, $M \asymp N \lesssim n^{b'}$, $\gamma \asymp n^{-(1-4b'-7b)/8} = o(1)$, and $D_n \lesssim n^{(3-4b'-7b)/8}$, Condition (i) in Assumption 3.1 holds with $h(x) = x^4$, and

(20)  $\rho_n \lesssim n^{-(1-4b'-7b)/8}.$

If Condition (ii) in Assumption 3.1 is used instead, we still have (20) when $\max_{1 \le j \le p} \mathbb{E}\exp(|x_{ij}|/D_n) \le 2$, $p \lesssim \exp(n^b)$, $M \asymp N \lesssim n^{b'}$, $\gamma \asymp n^{-(1-4b'-7b)/8} = o(1)$ and $D_n \lesssim n^{(3-4b'-7b)/8}$.
When $b' = 0$ (i.e., $M = O(1)$), our result allows $p = O(\exp(n^b))$ with $b < 1/7$, which is consistent with Corollary 2.1 in [16] for i.i.d. random vectors (assuming that $B_n = O(1)$ therein).

Remark 3.3. The sharpness of $\rho_n$ is not established in Theorem 3.2. However, the upper bound on $\rho_n$ given in (20) leads to two conjectures: (i) the Gaussian approximation becomes less accurate when the data vectors are more dependent or the data dimension diverges at a faster rate; (ii) the less dependent the data vectors are, the faster the dimension is allowed to diverge while retaining an accurate Gaussian approximation. The above phenomena will also be observed for the weakly dependent data in Section 3.3. Interestingly, we will show some empirical evidence for both conjectures in that section.

Remark 3.4. Assumption 3.1 and (17) impose tail restrictions on $\{x_i\}$. Condition (18) requires $\{x_i\}$ to be weakly dependent uniformly as $M$ grows, and, in particular, (18) allows us to quantify $\phi(M_x, M_y)$ and $\varphi(M_x, M_y)$; see (S.10).

3.3. Dependence structure II: weakly dependent time series.
In this subsection, we extend the results in Section 3.2 to the weakly dependent case, i.e., $D_n = n$. The key idea here is to approximate the weakly dependent time series by an $M$-dependent time series; see the approximation error (24) below. With a slight abuse of notation, suppose the sequence $\{x_i\}$ has the causal representation

(21)  $x_i := x_i^{(\infty)} = \mathcal{G}(\dots, \epsilon_{i-1}, \epsilon_i),$

where $\mathcal{G} = (\mathcal{G}_1, \dots, \mathcal{G}_p)'$ is a $p$-dimensional measurable function such that $x_i$ is well defined. To measure the strength of dependence, we let $\{\epsilon_i'\}$ be an i.i.d. copy of $\{\epsilon_i\}$, set $x_i^* = \mathcal{G}(\dots, \epsilon_{-1}, \epsilon_0', \epsilon_1, \dots, \epsilon_i)$, and define

(22)  $\theta_{i,j,q}(x) = (\mathbb{E}|x_{ij} - x_{ij}^*|^q)^{1/q}, \qquad \Theta_{i,j,q}(x) = \sum_{l=i}^{+\infty} \theta_{l,j,q}(x).$

In the subsequent discussion, we assume that the dependence measure satisfies $\sup_{1 \le j \le p} \Theta_{i,j,q}(x) < \infty$ for some $q > 0$.
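The coupling behind (22) can be illustrated by Monte Carlo: regenerate the single innovation $\epsilon_0$ and rerun the causal map. The sketch below (ours) uses a scalar AR(1) as the map $\mathcal{G}$, for which $\theta_{i,q} = |\rho|^i (\mathbb{E}|\epsilon_0 - \epsilon_0'|^q)^{1/q}$ is available in closed form, so the estimate can be checked:

import numpy as np

def theta_ar1(i, q=2, rho=0.5, burn=200, reps=20000, rng=None):
    """Monte Carlo estimate of theta_{i,q} = (E|x_i - x_i^*|^q)^{1/q} for the
    scalar AR(1) x_i = rho * x_{i-1} + eps_i, where x_i^* uses a fresh eps_0'."""
    rng = np.random.default_rng(rng)
    eps = rng.standard_normal((reps, burn + i + 1))
    eps_star = eps.copy()
    eps_star[:, burn] = rng.standard_normal(reps)   # replace eps_0 by eps_0'
    def run(e):
        x = np.zeros(reps)
        for t in range(e.shape[1]):
            x = rho * x + e[:, t]
        return x
    diff = np.abs(run(eps) - run(eps_star)) ** q
    return diff.mean() ** (1 / q)

# For q = 2 this is approximately 0.5**i * sqrt(2), the geometric decay of
# Example 3.1 below with b_l = rho**l.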
Analogous quantities $\theta_{i,j,q}(y)$ can be defined for the Gaussian sequence $\{y_i\}$. Let $x_i^{(M)} = (x_{i1}^{(M)}, \dots, x_{ip}^{(M)})' = \mathbb{E}[x_i \mid \mathcal{F}_M(i)]$ be the $M$-dependent approximation sequence for $\{x_i\}$. Define $X^{(M)}$ in the same way as $X$, with $x_i$ replaced by $x_i^{(M)}$. Because $|m(x) - m(y)| \le G_0$ and $|m(x) - m(y)| \le G_1 \max_{1 \le j \le p} |x_j - y_j|$ (by the Lipschitz property of $F_\beta$), we have

(23)  $|\mathbb{E}[m(X) - m(X^{(M)})]| \le |\mathbb{E}[(m(X) - m(X^{(M)})) I_M]| + |\mathbb{E}[(m(X) - m(X^{(M)}))(1 - I_M)]| \lesssim G_1 \Delta_M + G_0\, \mathbb{E}[1 - I_M],$

where $I_M := I_{\Delta_M, M} = 1\{\max_{1 \le j \le p} |X_j - X_j^{(M)}| \le \Delta_M\}$ for some $\Delta_M > 0$. Suppose $\max_{1 \le j \le p} \mathbb{E}|x_{ij}|^q < \infty$ for some $q > 0$.
By Lemma A.1 of [29], we have

$(\mathbb{E}|X_j - X_j^{(M)}|^q)^{q'/q} \le C_q\, \Theta_{M,j,q}^{q'}(x),$

where $q' = \min(2, q)$ and $C_q$ is a positive constant depending only on $q$. For any $q \ge 2$, we obtain
$\mathbb{E}[1 - I_M] \le \sum_{j=1}^p P(|X_j - X_j^{(M)}| \ge \Delta_M) \le \sum_{j=1}^p \frac{\mathbb{E}|X_j - X_j^{(M)}|^q}{\Delta_M^q} \le \sum_{j=1}^p \frac{C_q^{q/2}\, \Theta_{M,j,q}^q(x)}{\Delta_M^q} = \sum_{j=1}^p \frac{C_q^{q/2}}{\Delta_M^q} \Big( \sum_{l=M}^{+\infty} \theta_{l,j,q}(x) \Big)^q.$

Optimizing the bound (23) with respect to $\Delta_M$, we deduce that

(24)  $|\mathbb{E}[m(X) - m(X^{(M)})]| \lesssim (G_0 G_1^q)^{1/(1+q)} \Big( \sum_{j=1}^p \Theta_{M,j,q}^q(x) \Big)^{1/(1+q)},$

which along with (8) implies that

$|\mathbb{E}[g(T_X) - g(T_{X^{(M)}})]| \lesssim (G_0 G_1^q)^{1/(1+q)} \Big( \sum_{j=1}^p \Theta_{M,j,q}^q(x) \Big)^{1/(1+q)} + \beta^{-1} G_1 \log p,$

with $T_{X^{(M)}} = \max_{1 \le j \le p} \sum_{i=1}^n x_{ij}^{(M)}/\sqrt{n}$. We give an explicit expression for the approximation error (24) in the following two examples.

Example 3.1. Consider a stationary linear process,

$x_{ij} = \sum_{l=0}^{+\infty} b_{lj}\, \epsilon_{(i-l)j}, \quad 1 \le j \le p,$

where $\sum_{l=0}^{+\infty} |b_{lj}| < \infty$ and $\{\epsilon_i\}$ with $\epsilon_i = (\epsilon_{i1}, \dots, \epsilon_{ip})'$ is a sequence of i.i.d. random vectors. A simple calculation yields $X_j - X_j^{(M)} = \frac{1}{\sqrt{n}} \sum_{i=1}^n \sum_{l=M+1}^{+\infty} b_{lj}\, \epsilon_{(i-l)j}$ and $\theta_{l,j,q}(x) = |b_{lj}| (\mathbb{E}|\epsilon_{1j} - \epsilon_{1j}'|^q)^{1/q}$. For $q \ge 2$, we have

$|\mathbb{E}[m(X) - m(X^{(M)})]| \lesssim (G_0 G_1^q)^{1/(1+q)} \max_{1 \le j \le p} (\mathbb{E}|\epsilon_{1j} - \epsilon_{1j}'|^q)^{1/(q+1)} \Big\{ \sum_{j=1}^p \Big( \sum_{l=M+1}^{+\infty} |b_{lj}| \Big)^q \Big\}^{1/(q+1)}.$

Under the assumption that $\limsup_p \max_{1 \le j \le p} (\mathbb{E}|\epsilon_{1j}|^q)^{1/q} < \infty$ and $b_{lj} = \rho^l$ with $|\rho| < 1$, we get

$|\mathbb{E}[m(X) - m(X^{(M)})]| \lesssim (G_0 G_1^q)^{1/(1+q)}\, p^{1/(1+q)}\, \rho^{qM/(1+q)}.$

Example 3.2. Consider a stationary Markov chain defined by an iterated random function $x_i = H(x_{i-1}, e_i)$. Here the $e_i$'s are i.i.d. innovations, and $H(\cdot, \cdot)$ is an $\mathbb{R}^p$-valued and jointly measurable function which satisfies the following two conditions: (i) there exists some $x_0$ such that $\mathbb{E}|H(x_0, e_1)|^{2q} < \infty$, and (ii)

$\rho := \sup_{x \ne x'} \frac{(\mathbb{E}|H(x, e_1) - H(x', e_1)|^{2q})^{1/(2q)}}{|x - x'|} < 1,$

where $|\cdot|$ denotes the Euclidean norm of a $p$-dimensional vector. Then it can be shown that $\{x_i\}$ satisfies the geometric moment contraction (GMC) property [42] and $\max_{1 \le j \le p} \Theta_{m,j,2q}(x) = O(\rho^m)$ (see Example 2.1 in [12]). Hence

$|\mathbb{E}[m(X) - m(X^{(M)})]| \lesssim (G_0 G_1^q)^{1/(1+q)}\, p^{1/(1+q)}\, \rho^{qM/(1+q)}.$

We are now ready to present the main result. Recall that $h(\cdot)$ and $l_n$ are defined in Section 3.2.

Theorem 3.3. Suppose $\{x_i\}$ is a stationary time series which admits the representation (21). Assume that $\max_{1 \le j \le p} \mathbb{E}\, x_{ij}^4 < C_1$, and

(25)  $c_1 < \min_{1 \le j \le p} \sigma^{(n)}_{j,j} \le \max_{1 \le j \le p} \sigma^{(n)}_{j,j} < c_2, \qquad \max_{1 \le j \le p} \sigma_j^2 < c_3,$

(26)  $\max_{1 \le k \le p} j\,\{\theta_{j,k,2}(x) \vee \theta_{j,k,2}(y)\} \le \ell_j \quad \text{with} \quad \sum_{j=1}^{+\infty} \ell_j < \infty,$

for some constants $C_1, c_3 > 0$ and $0 < c_1 < c_2$. Suppose that there exist $N$ and $M$ such that $N \ge M$ and Assumption 3.1 is fulfilled. Then, for $q \ge 2$, we have

(27)  $\rho_n \lesssim n^{-1/8} M^{1/2} l_n^{7/8} + \gamma + (n^{1/8} M^{-1/2} l_n^{-3/8})^{q/(1+q)} \Big( \sum_{j=1}^p \Theta_{M,j,q}^q \Big)^{1/(1+q)},$

where $\Theta_{i,j,q} = \Theta_{i,j,q}(x) \vee \Theta_{i,j,q}(y)$.

The approximation parameter $M$ will be chosen appropriately to optimize the bound (27). The Gaussian sequence $\{y_i\}$ can be constructed as a causal linear process (e.g., based on the Wold representation theorem) to capture the second order property of $\{x_i\}$. We note that the conditions in Theorem 3.3 can be categorized into two types: tail restrictions and weak dependence assumptions. Assumption 3.1 and the condition that $\max_{1 \le j \le p} \mathbb{E}\, x_{ij}^4 < C_1$ impose restrictions on the tails of $\{x_{ij}\}_{j=1}^p$ uniformly across $j$, while conditions (25)-(26) essentially require weak dependence uniformly across all the components of $\{x_i\}$.
When $\max_{1 \le j \le p} \Theta_{M,j,q} = O(\rho^M)$ for some $\rho < 1$, we have

(28)  $\rho_n \lesssim n^{-1/8} M^{1/2} l_n^{7/8} + \gamma + (n^{1/8} M^{-1/2} l_n^{-3/8})^{q/(1+q)}\, p^{1/(1+q)}\, \rho^{qM/(1+q)}.$

Suppose $p \lesssim \exp(n^b)$ for some $0 \le b < 1/7$, and $\mathbb{E}(\max_{1 \le j \le p} |x_{ij}|^4/D_n^4) \le 1$.
Then, by choosing $M \asymp N \lesssim n^{b'}$ with $4b' + 7b < 1$ and $1 > b' > b$, $\gamma \asymp n^{-(1-4b'-7b)/8} = o(1)$, and assuming that $D_n \lesssim n^{(3-4b'-7b)/8}$, Condition (i) in Assumption 3.1 holds with $h(x) = x^4$, and $\rho_n \lesssim n^{-(1-4b'-7b)/8}$. The same conclusion holds under Condition (ii) in Assumption 3.1 provided that $p \lesssim \exp(n^b)$, $M \asymp N \lesssim n^{b'}$, $\gamma \asymp n^{-(1-4b'-7b)/8} = o(1)$ and $D_n \lesssim n^{(3-4b'-7b)/8}$ with $1 > b' > b$.

Below we provide some empirical evidence for the two conjectures proposed in Remark 3.3, in particular the interplay between dependence and dimensionality. To this end, we generate $\{x_i\}$ from a multivariate ARCH model $x_i = \Sigma_i^{1/2} \epsilon_i$, where $\epsilon_i = (\epsilon_{i1}, \dots, \epsilon_{ip})'$ with $\sqrt{2}\,\epsilon_{ij}$ being i.i.d. $t(4)$ random variables, and $\Sigma_i = (1 - \beta) D_p + \beta\, x_{i-1} x_{i-1}'$ with $\Sigma_i^{1/2}$ being the lower triangular matrix from the Cholesky decomposition of $\Sigma_i$. Here $D_p = (d_{ij})_{i,j=1}^p$ with $d_{jj} = 1$ and $d_{ij} = 0.5$ for $i \ne j$. Notice that the $\{x_i\}$ are uncorrelated and $\mathrm{cov}(x_i) = D_p$. To capture the second order property of $\{x_i\}$, we generate independent Gaussian vectors $\{y_i\}$ from $N(0, D_p)$. Figure 1 illustrates the interplay between dependence and dimensionality using P-P plots for $n = 60$, $p = 100, 300, 500$ and $\beta = 0, 0.2, 0.5$.
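A sketch of this experiment (ours; the off-diagonal entry of $D_p$ is taken to be 0.5 following the reconstruction above, and a modest replication count is used to keep the run time short):

import numpy as np

def max_stat_draws(n=60, p=100, beta=0.2, reps=500, rng=None):
    """Monte Carlo draws of T_X (multivariate ARCH) and T_Y (Gaussian analog
    with matching covariance D_p), for a P-P comparison as in Figure 1."""
    rng = np.random.default_rng(rng)
    D = np.full((p, p), 0.5)
    np.fill_diagonal(D, 1.0)                     # D_p; off-diagonal 0.5 assumed
    TX, TY = np.empty(reps), np.empty(reps)
    for r in range(reps):
        x, xprev = np.empty((n, p)), np.zeros(p)
        for i in range(n):
            Sigma = (1 - beta) * D + beta * np.outer(xprev, xprev)
            eps = rng.standard_t(df=4, size=p) / np.sqrt(2.0)  # unit variance
            xprev = np.linalg.cholesky(Sigma) @ eps
            x[i] = xprev
        TX[r] = x.sum(axis=0).max() / np.sqrt(n)
        y = rng.multivariate_normal(np.zeros(p), D, size=n)
        TY[r] = y.sum(axis=0).max() / np.sqrt(n)
    return TX, TY

# P-P plot: evaluate the empirical CDF of TY at the sorted TX draws and plot
# it against (1, ..., reps)/reps; departures from the diagonal show the error.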
For moderate $p$ and $\beta$, the Gaussian approximation is reasonably good, which is consistent with our theory. Moreover, we also observe the following phenomena. On one hand, as $p$ increases, the approximation deteriorates for the same $\beta$, which controls the strength of dependence; on the other hand, for fixed $p$, the approximation becomes worse in the right tail (which is most relevant for practical applications) as $\beta$ increases. Note that our theoretical results are valid in finite samples, so the sample size is not supposed to play any role here. Hence, we believe that the less dependent the data vectors are, the faster the dimension is allowed to diverge while retaining an accurate Gaussian approximation.

Fig 1. Interplay between dependence and dimensionality: P-P plots comparing the distributions of $T_X$ and $T_Y$ (panels: $t(4)$, $n = 60$, $\beta = 0, 0.2, 0.5$; curves: $p = 100, 300, 500$).

In the end, we discuss an intriguing question: is there any so-called "dimension free" dependence structure? In other words, what kind of dependence assumption will not affect the rate at which the dimension may increase (as compared to the independence case in [16])? To address this question, we consider one possibility: the original $p$-dimensional vector can be decomposed into two components, namely one time series component and one independence component, where the former is asymptotically ignorable compared to the latter as $n$ grows. Our contribution here is to precisely characterize such a "dimension free" dependence structure.

Proposition 3.2. Consider a $p$-dimensional time series $\{x_i\}$. Suppose there exists a permutation $\pi(\cdot)$ such that $(x_{i\pi(1)}, \dots, x_{i\pi(p)})' = z_i = (z_{i1}', z_{i2}')'$, where $\{z_{i1}\}$ is a $q$-dimensional (possibly nonstationary) time series and $\{z_{i2}\}$ is a $(p-q)$-dimensional sequence of independent variables. Suppose $\{z_{i1}\}$ and $\{z_{i2}\}$ are independent. When $\{z_{i2}\}$ satisfies the assumptions in Corollary 2.1 of [16], we have

(29)  $\sup_{z \in \mathbb{R}} \Big| P\Big( \max_{q+1 \le j \le p} X_{\pi(j)} \le z \Big) - P\Big( \max_{q+1 \le j \le p} Y_{\pi(j)} \le z \Big) \Big| \lesssim n^{-c}, \quad c > 0.$

Recall that $X_{\pi(j)} = \sum_{i=1}^n x_{i\pi(j)}/\sqrt{n}$, and $Y_{\pi(j)}$ is defined in a similar manner. Then, under the additional assumption that

(30)  $q n^{-c} + \frac{q}{(\mathbb{E} \max_{q+1 \le j \le p} Y_j)^4} = O(n^{-c'}), \quad c' > 0,$

and $\max_{1 \le j \le q} \mathbb{E}|X_{\pi(j)}|^4 < +\infty$, we have

(31)  $\sup_{z \in \mathbb{R}} \Big| P\Big( \max_{1 \le j \le p} X_j \le z \Big) - P\Big( \max_{1 \le j \le p} Y_j \le z \Big) \Big| \lesssim n^{-c''}, \quad c'' > 0.$

The additional assumption (30) implies that $q$ is of a polynomial order w.r.t. $n$, while $p - q$ can achieve the exponential order specified in Corollary 2.1 of [16]. Therefore, the largest possible diverging rate of $p$ allowed in Proposition 3.2 remains the same as that in the independence case ([16]). The independence assumption between $\{z_{i1}\}$ and $\{z_{i2}\}$ might be relaxed. Here, we assume it mainly for technical simplicity, so that only one single dependence assumption, $\max_{1 \le j \le q} \mathbb{E}|X_{\pi(j)}|^4 < \infty$, needs to be imposed on $\{z_{i1}\}$.
4. Bootstrap Inference.
In practice, the intrinsic dependence structure of time series data is usually unknown, and the Gaussian approximation theory is then too restrictive to use directly. However, this general theory provides the foundation for developing a bootstrap inference theory that does not require such knowledge. In this section, we consider two types of bootstrap procedures: (i) the blockwise multiplier bootstrap; and (ii) the non-overlapping block bootstrap. The former is employed in Section 2, while the latter is a more flexible alternative.
4.1. Blockwise multiplier bootstrap.
To approximate the quantiles of $T_X$, we introduce a blockwise multiplier bootstrap procedure for the $M$-dependent and weakly dependent time series considered in Sections 3.2 and 3.3. Suppose $n = (N + M)r$, where $N \ge M$ and $N, M, r \to +\infty$ as $n \to +\infty$. Let $\{(e_i, \tilde{e}_i)\}$ be a sequence of i.i.d. $N(0, I_2)$ variables independent of $\{x_i\}$. Define

(32)  $T_D = \max_{1 \le j \le p} \frac{1}{\sqrt{n}} \sum_{i=1}^r D_{ij}, \qquad D_{ij} = A_{ij} e_i + B_{ij} \tilde{e}_i.$

Recall the definitions of $A_{ij}$ and $B_{ij}$ in (11). Conditional on $\{x_i\}$, the $D_{ij}$ are mean-zero Gaussian random variables such that

(33)  $\mathrm{cov}(D_{ij}, D_{i'k}) = \delta_{ii'} (A_{ij} A_{i'k} + B_{ij} B_{i'k}), \qquad \delta_{ii'} = 1\{i = i'\}.$

Thus we have

(34)  $\mathrm{cov}\Big( \sum_{i=1}^r D_{ij}/\sqrt{n},\ \sum_{i=1}^r D_{ik}/\sqrt{n} \Big) = \frac{1}{n} \sum_{i=1}^r (A_{ij} A_{ik} + B_{ij} B_{ik}).$

Conditional on the sample $\{x_i\}_{i=1}^n$, define the $\alpha$-quantile of $T_D$ as

(35)  $c_{T_D}(\alpha) := \inf\{t \in \mathbb{R} : P(T_D \le t \mid \{x_i\}_{i=1}^n) \ge \alpha\}.$
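In code, drawing from the conditional law of $T_D$ amounts to attaching the two independent multiplier sequences to the block sums of (11); a sketch (ours), reusing big_small_block_sums from the sketch in Section 3.2:

import numpy as np

def t_d_quantile(x, N, M, alpha, n_boot=999, rng=None):
    """alpha-quantile c_{T_D}(alpha) of (35), conditional on the sample."""
    rng = np.random.default_rng(rng)
    n = x.shape[0]
    A, B = big_small_block_sums(x, N, M)   # block sums of (11)
    r = A.shape[0]
    e = rng.standard_normal((n_boot, r))   # multipliers e_i
    et = rng.standard_normal((n_boot, r))  # independent multipliers e~_i
    # D_ij = A_ij e_i + B_ij e~_i and T_D = max_j n^{-1/2} sum_i D_ij.
    TD = ((e @ A + et @ B) / np.sqrt(n)).max(axis=1)
    return np.quantile(TD, alpha)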
Our goal below is to quantify

(36)  $\tilde{\rho}_n := \sup_{\alpha \in (0,1)} |P(T_X \le c_{T_D}(\alpha)) - \alpha|.$

To this end, consider the estimation errors

(37)  $\mathcal{E}_A := \max_{1 \le j,k \le p} \Big| \frac{1}{rN} \sum_{i=1}^r A_{ij} A_{ik} - \sigma^{(n)}_{j,k} \Big|, \qquad \mathcal{E}_B := \max_{1 \le j,k \le p} \Big| \frac{1}{rM} \sum_{i=1}^r B_{ij} B_{ik} - \sigma^{(n)}_{j,k} \Big|, \qquad \mathcal{E}_{AB} := \max_{1 \le j,k \le p} \Big| \frac{1}{n} \sum_{i=1}^r (A_{ij} A_{ik} + B_{ij} B_{ik}) - \sigma^{(n)}_{j,k} \Big|,$

where $\sigma^{(n)}_{j,k} = \frac{1}{n}\sum_{l=1-n}^{n-1} (n - |l|)\gamma_{x,jk}(l)$. Recall that $h(\cdot)$ is a nondecreasing convex function with $h(0) = 0$. Define the Orlicz norm

$\|X\|_h = \inf\Big\{ B > 0 : \mathbb{E}\, h\Big(\frac{|X|}{B}\Big) \le 1 \Big\}.$

We first consider the $M$-dependent stationary sequence, where $M$ is allowed to grow with the sample size $n$. Define the following quantities, which characterize the higher order properties of the time series (e.g., $\bar{\sigma}_{x,N}$ and $\varsigma_{x,N}$ below characterize the fourth order property of $\{x_i\}$):

$\bar{\sigma}_{x,N}^2 = \max_{1 \le j \le p} \Big\{ \frac{1}{N} \sum_{i_1, i_2, i_3 = -\infty}^{+\infty} |\mathrm{cum}(x_{i_1 j}, x_{i_2 j}, x_{i_3 j}, x_{0j})| + \sigma_j^4 \Big\},$
$\varsigma_{x,N}^3 = \mathbb{E} \max_{1 \le j \le p} \Big| \sum_{i=1}^N x_{ij}/\sqrt{N} \Big|^3, \qquad \zeta_{x,h,N} = \max_{1 \le j \le p} \Big\| \sum_{i=1}^N x_{ij}/\sqrt{N} \Big\|_h, \qquad \varpi_x = \max_{1 \le j,k \le p} \sum_{l=-\infty}^{+\infty} |l|\, |\mathbb{E}\, x_{0,j} x_{l,k}|,$

where cum denotes the (fourth order joint) cumulant (see, e.g., [5]) and $\sigma_j^2 = \sum_{l=-\infty}^{+\infty} |\gamma_{x,jj}(l)|$. The following lemma plays an important role in the subsequent derivations.

Lemma 4.1. Suppose $\{x_i\}$ is an $M$-dependent stationary sequence. Then, with $h(x) = \exp(x) - 1$,

$\mathbb{E}\,\mathcal{E}_A \lesssim \bar{\sigma}_{x,N} \sqrt{\log p/r} + (\log p)\{\log(rp)\}\, \zeta_{x,h,N}^2/r + \varpi_x/N,$
$\mathbb{E}\,\mathcal{E}_B \lesssim \bar{\sigma}_{x,M} \sqrt{\log p/r} + (\log p)\{\log(rp)\}\, \zeta_{x,h,M}^2/r + \varpi_x/M.$

Alternatively, we have

$\mathbb{E}\,\mathcal{E}_A \lesssim \bar{\sigma}_{x,N} \sqrt{\log p/r} + (\log p)\, \varsigma_{x,N}^2/\sqrt{r} + \varpi_x/N, \qquad \mathbb{E}\,\mathcal{E}_B \lesssim \bar{\sigma}_{x,M} \sqrt{\log p/r} + (\log p)\, \varsigma_{x,M}^2/\sqrt{r} + \varpi_x/M.$
n s , and s , s , s satisfy that s ′ b := ((1 − b − b ′′ ) / − s ) ∧ ((1 − b − b ′′ ) / − s ′ ) ∧ ( b ′ − b − s ) > . We are now in position to present the first main result in this section.
Theorem 4.1. Consider an $M$-dependent stationary time series $\{x_i\}$. Under the assumptions of Theorem 3.2 and Assumption 4.1,

(40)  $\sup_{\alpha \in (0,1)} |P(T_X \le c_{T_D}(\alpha)) - \alpha| \lesssim \begin{cases} n^{-c}, & c = \min\{s_b/3,\ (1-4b'-7b)/8\}, & \text{under Condition 1}, \\ n^{-c'}, & c' = \min\{s_b'/3,\ (1-4b'-7b)/8\}, & \text{under Condition 2}. \end{cases}$

Our next theorem extends the above result to weakly dependent stationary time series.
Theorem 4.2. Consider a weakly dependent stationary time series $\{x_i\}$. Suppose $\max_{1 \le j \le p} \Theta_{M,j,q} = O(\rho^M)$ for some $\rho < 1$ and $q \ge 2$. Then, under the assumptions of Theorem 3.3 and Assumption 4.1,

(41)  $\sup_{\alpha \in (0,1)} |P(T_X \le c_{T_D}(\alpha)) - \alpha| \lesssim \begin{cases} n^{-c}, & c = \min\{s_b/3,\ (1-4b'-7b)/8\}, & \text{under Condition 1}, \\ n^{-c'}, & c' = \min\{s_b'/3,\ (1-4b'-7b)/8\}, & \text{under Condition 2}. \end{cases}$

We remark that the results of Theorems 4.1 and 4.2 remain valid even when $p$ is fixed or $p$ grows more slowly than the exponential rate allowed in Assumption 4.1.

Remark 4.1. When $\{x_i\}$ has the so-called geometric moment contraction (GMC) property (uniformly across its components), we have $\bar{\sigma}_{x,M} \vee \bar{\sigma}_{x,N} \lesssim 1$ (i.e., $s_1 = 0$) by Proposition 2 of [42] and the assumption that $\max_j \sigma_j^2 < \infty$.

Remark 4.2. It is known that in the low dimensional setting, the tapered block bootstrap yields an improvement over the block bootstrap in terms of the bias of variance estimation, and thus provides a better MSE rate; see [31]. Hence, we may also want to combine the blockwise multiplier bootstrap proposed here with a data tapering scheme. For example, let $\mathcal{K} : \mathbb{R} \to \mathbb{R}$ be a data taper with $\mathcal{K}(x) = 0$ for $x \notin [0,1]$, and define

$T_{\mathcal{K},D} = \max_{1 \le j \le p} \frac{1}{\sqrt{n}} \sum_{i=1}^r D_{\mathcal{K},ij}, \qquad D_{\mathcal{K},ij} = A_{\mathcal{K},ij} e_i + B_{\mathcal{K},ij} \tilde{e}_i,$
$A_{\mathcal{K},ij} = \sum_{l=(i-1)(N+M)+1}^{iN+(i-1)M} \mathcal{K}\Big( \frac{l - (i-1)(N+M)}{N} \Big) x_{lj}, \qquad B_{\mathcal{K},ij} = \sum_{l=iN+(i-1)M+1}^{i(N+M)} \mathcal{K}\Big( \frac{l - iN - (i-1)M}{M} \Big) x_{lj}.$

A more detailed investigation along this direction is left for future study.
4.2. Non-overlapping block bootstrap.
In this subsection, we propose an alternative bootstrap procedure in the high dimensional setting: the non-overlapping block bootstrap ([10]). In contrast with the blockwise multiplier bootstrap, this procedure can avoid estimating the influence function (defined in Section 5). We provide theoretical justification for this procedure by establishing its equivalence with the multiplier bootstrap; see (42).

Assume for simplicity that $n = b_nl_n$, where $b_n, l_n\in\mathbb{Z}$. Conditional on the sample $\{x_i\}_{i=1}^n$, let $\varrho_1,\dots,\varrho_{l_n}$ be i.i.d uniform random variables on $\{0,1,\dots,l_n-1\}$ and define $x^*_{(j-1)b_n+i} = x_{\varrho_jb_n+i}$ with $1\le j\le l_n$ and $1\le i\le b_n$. In other words, $\{x^*_i\}_{i=1}^n$ is a non-overlapping block bootstrap sample with block size $b_n$. Define
$$T_{X^*} = \max_{1\le j\le p}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}(x^*_{ij}-\bar x_{nj}) = \max_{1\le j\le p}\frac{1}{\sqrt{n}}\sum_{i=1}^{l_n}(\mathcal{A}^*_{ij}-\bar{\mathcal{A}}_{nj}),$$
where $\bar x_{nj} = \sum_{i=1}^nx_{ij}/n$, $\bar{\mathcal{A}}_{nj} = \sum_{i=1}^{l_n}\mathcal{A}_{ij}/l_n$, $\mathcal{A}_{ij} = \sum_{l=(i-1)b_n+1}^{ib_n}x_{lj}$, and $\mathcal{A}^*_{1j},\dots,\mathcal{A}^*_{l_nj}$ are i.i.d draws from the empirical distribution of $\mathcal{A}_{1j},\dots,\mathcal{A}_{l_nj}$. Also define
$$T_{\tilde X} = \max_{1\le j\le p}\frac{1}{\sqrt{n}}\sum_{i=1}^{l_n}\mathcal{A}_{ij}e_i,$$
where $\{e_i\}_{i=1}^{l_n}$ is a sequence of i.i.d $N(0,1)$ random variables and $b_n\ge M$. The theoretical validity of the multiplier bootstrap based on $T_{\tilde X}$ can be justified using similar arguments as in the previous subsection, because the same arguments go through when $A_{ij}$ and $B_{ij}$ are replaced by $\mathcal{A}_{ij}$ (provided that $b_n\ge M$). By showing that with probability $1-Cn^{-c}$,
$$(42)\qquad \sup_{t\in\mathbb{R}}\big|P(T_{X^*}\le t\,|\,\{x_i\}_{i=1}^n) - P(T_{\tilde X}\le t\,|\,\{x_i\}_{i=1}^n)\big| \lesssim n^{-c'},\qquad c'>0,$$
we establish the validity of the non-overlapping block bootstrap in Theorem 4.3.

Assumption 4.2. Assume that $\bar\sigma_{x,b_n}\sqrt{\log p/l_n} \lesssim n^{-c_1}$ and $\zeta^2_{x,h,b_n}\{\log(pl_n)\}^3/l_n \lesssim n^{-c_1'}$ with $h(x)=\exp(x)-1$, where $c_1, c_1'>0$.

Theorem 4.3. Suppose that $c_1 < \min_{1\le j\le p}\sigma^{(b_n)}_{j,j} \le \max_{1\le j\le p}\sigma^{(b_n)}_{j,j} < c_2$ and $\max_{1\le j\le p}\sigma_j^2 < c_3$ for some constants $0<c_1<c_2<\infty$ and $c_3>0$, where $\sigma^{(b_n)}_{j,j} = \sum_{l=1-b_n}^{b_n-1}(b_n-|l|)\gamma_{x,jj}(l)/b_n$. Further assume that the assumptions in Theorem 4.1 or Theorem 4.2 hold with $M_1 = N_1 = b_n$ and $r = n/(2b_n)$. Then (42) holds with probability $1-Cn^{-c}$ for some $c, C>0$. Moreover, we have
$$(43)\qquad \sup_{\alpha\in(0,1)}\big|P(T_X\le c_{T_{X^*}}(\alpha)) - \alpha\big| \lesssim n^{-c''},$$
where $c_{T_{X^*}}(\alpha) = \inf\{t\in\mathbb{R}: P(T_{X^*}\le t\,|\,\{x_i\}_{i=1}^n)\ge\alpha\}$ and $c''>0$.
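A minimal sketch of the non-overlapping block bootstrap for $T_X$, assuming $n$ is a multiple of the block size; the resampling scheme follows the construction above, while the block size and number of bootstrap draws are illustrative.

```python
import numpy as np

def nbb_critical_value(x, b_n, n_boot=500, alpha=0.95, seed=0):
    """Non-overlapping block bootstrap quantile c_{T_X*}(alpha) (Section 4.2 sketch)."""
    rng = np.random.default_rng(seed)
    n, p = x.shape
    l_n = n // b_n
    A = x[: l_n * b_n].reshape(l_n, b_n, p).sum(axis=1)  # block sums
    A_bar = A.mean(axis=0)
    idx = rng.integers(0, l_n, size=(n_boot, l_n))       # i.i.d. uniform block labels
    T_star = (A[idx] - A_bar).sum(axis=1).max(axis=1) / np.sqrt(n)
    return np.quantile(T_star, alpha)
```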
5. General Inferential Theory.
In this section, we establish a general framework for conducting bootstrap inference for high dimensional time series based on the theoretical results in Section 4. This general framework assumes that the $q_n$-dimensional quantity of interest, denoted as $\Theta_{q_n}$, admits an approximately linear expansion, and thus covers the three examples considered in Section 2. In particular, $\Theta_{q_n}$ is expressed as a functional of the distribution of a $p$-dimensional weakly dependent stationary time series $\{u_i\}$. Motivated by the testing of spectral properties, we further extend the results of Section 5.1 to an infinite dimensional parameter case in Section 5.2. Note that $p$ here is different from the dimension of $x_i$ discussed in previous sections.
5.1. Approximately linear statistics.
In this subsection, we consider quantities that can be expressed as functionals of the marginal distribution of a block time series of length $d$: $\{v_i\}_{i=1}^{N}$, where $v_i := (u_i',\dots,u_{i+d-1}')'$ and $N = n-d+1$. Here, we allow the integer $d$ to grow with $n$. Define $F^N_d = \sum_{i=1}^{N}\delta_{v_i}/N$ as the empirical distribution of $\{v_i\}_{i=1}^{N}$; the distribution function of $v_1$ is denoted by $F_d$. We are interested in testing the parameter $\Theta_{q_n} = (\theta_1,\dots,\theta_{q_n})' := T(F_d)$ for some functional $T := T_{q_n,d}$. The parameter dimension $q_n$ depends on either $p$ or $d$, e.g., $q_n = p$, $p^2$ or $dp^2$. A natural estimator for $\Theta_{q_n}$ is then given by $\hat\Theta_{q_n} = (\hat\theta_1,\dots,\hat\theta_{q_n})' := T(F^N_d)$. Assume $\hat\Theta_{q_n}$ admits the following approximately linear expansion in a neighborhood of $F_d$:
$$(44)\qquad \hat\Theta_{q_n} = \Theta_{q_n} + \frac{1}{N}\sum_{i=1}^{N}\mathrm{IF}(v_i,F_d) + \mathcal{R}_N,$$
where $\mathrm{IF}(v_i,F_d) = (\mathrm{IF}_1(v_i,F_d),\dots,\mathrm{IF}_{q_n}(v_i,F_d))'$ is called the "influence function" (see e.g. [21]) and $\mathcal{R}_N := \mathcal{R}_N(v_1,\dots,v_N) = (\mathcal{R}_{1N},\dots,\mathcal{R}_{q_nN})'$ is a remainder term. Examples of approximately linear statistics include various location and scale estimators for the marginal distribution of $\{u_i\}$, von Mises statistics and $M$-estimators of time series models (see [24]).

We are interested in testing the null hypothesis $H_0: \Theta_{q_n} = \tilde\Theta_{q_n}$ versus the alternative $H_a: \Theta_{q_n} \ne \tilde\Theta_{q_n}$, where $\tilde\Theta_{q_n} = (\tilde\theta_1,\dots,\tilde\theta_{q_n})'$. The test is given by
$$(45)\qquad \phi(\hat\Theta_{q_n}; c(\alpha)) = \begin{cases}1, & \max_{1\le j\le q_n}\sqrt{N}\,|\hat\theta_j-\tilde\theta_j| \ge c(\alpha),\\ 0, & \text{otherwise}.\end{cases}$$
We next apply the bootstrap theory in Section 4 to obtain the critical value $c(\alpha)$. Specifically, we define $x_i = (\mathrm{IF}(v_i,F_d)', -\mathrm{IF}(v_i,F_d)')'$ and $\hat x_i = (\widehat{\mathrm{IF}}(v_i,F^N_d)', -\widehat{\mathrm{IF}}(v_i,F^N_d)')'$, where $\widehat{\mathrm{IF}}(v_i,F^N_d)$ is some estimate of $\mathrm{IF}(v_i,F_d)$. Suppose $N = (N_1+M_1)r_1$, where $N_1\ge M_1$ and $N_1, M_1, r_1\to+\infty$ as $N\to+\infty$. Define the estimated block sums
$$(46)\qquad \hat A_{ij} = \sum_{l=(i-1)(N_1+M_1)+1}^{iN_1+(i-1)M_1}\hat x_{lj},\qquad \hat B_{ij} = \sum_{l=iN_1+(i-1)M_1+1}^{i(N_1+M_1)}\hat x_{lj},$$
where $1\le i\le r_1$ and $1\le j\le 2q_n$. Let
$$T_{\hat D} = \max_{1\le j\le 2q_n}\frac{1}{\sqrt{n}}\sum_{i=1}^{r_1}\hat D_{ij},$$
where $\hat D_{ij} = \hat A_{ij}e_i + \hat B_{ij}\tilde e_i$ with $\{(e_i,\tilde e_i)\}$ being a sequence of i.i.d $N(0,I_2)$ random vectors independent of $\{u_i\}$. The bootstrap critical value is given by
$$(47)\qquad \hat c(\alpha) := \inf\{t\in\mathbb{R}: P(T_{\hat D}\le t\,|\,\{x_i\}_{i=1}^n)\ge 1-\alpha\}.$$
We next justify the validity of the test in (45) with $c(\alpha) = \hat c(\alpha)$ in Theorems 5.1 and 5.2.

Assumption 5.1. Assume that $P(\max_{1\le j\le q_n}\sqrt{N}\,|\mathcal{R}_{jN}| > C_1n^{-c_1}/\sqrt{\log(2q_n)}) < C_2n^{-c_2}$ and $P(\hat{\mathcal{E}}_{AB}\{\log(2q_n)\}^2 > C_1n^{-c_1}) \le C_2n^{-c_2}$, where $c_1, C_1, c_2, C_2 > 0$, and
$$\hat{\mathcal{E}}_{AB} = \max_{1\le j\le 2q_n}\Big|\frac{1}{n}\sum_{i=1}^{r_1}\big\{(A_{ij}-\hat A_{ij})^2 + (B_{ij}-\hat B_{ij})^2\big\}\Big|,$$
with $A_{ij} = \sum_{l=(i-1)(N_1+M_1)+1}^{iN_1+(i-1)M_1}x_{lj}$ and $B_{ij} = \sum_{l=iN_1+(i-1)M_1+1}^{i(N_1+M_1)}x_{lj}$.

Theorem 5.1. Suppose the assumptions in Theorem 4.1 or Theorem 4.2 hold for $\{x_i\}$, where $p$ is replaced by $q_n$. Then under Assumption 5.1 and $H_0$, we have
$$(48)\qquad \sup_{\alpha\in(0,1)}\Big|P\Big(\max_{1\le j\le q_n}\sqrt{N}\,|\hat\theta_j-\tilde\theta_j| \ge \hat c(\alpha)\Big) - \alpha\Big| \lesssim n^{-c},\qquad c>0.$$

Theorem 5.1 applies directly to the methods described in Sections 2.1–2.2 for both $M$-dependent and weakly dependent stationary time series. For example, consider the white noise testing problem in Section 2.2, and suppose $\mathbb{E}u_i = 0$. In this example, $\Theta_{q_n} = (\mathrm{vec}(\gamma_u(1))',\dots,\mathrm{vec}(\gamma_u(L))')'$ with $\gamma_u(h) = \mathbb{E}u_iu_{i+h}'$ and $q_n = Lp^2$. Then we have $\mathrm{IF}(v_i,F_d) = \nu_i - \Theta_{q_n}$ and $\widehat{\mathrm{IF}}(v_i,F^N_d) = \nu_i - \sum_{i=1}^{N}\nu_i/N$ with $\nu_i = (\mathrm{vec}(u_iu_{i+1}')',\dots,\mathrm{vec}(u_iu_{i+L}')')'$ and $N = n-L$. Note that the bootstrap procedures considered in Section 2 are in fact simplified versions of the blockwise multiplier bootstrap in Section 4 with $N_1 = M_1 = b_n$ and $r_1 = l_n/2$.

Next, consider the bandedness testing problem in Section 2.3 with the statistic
$$T_{\mathrm{band}} = \max_{|j-k|\ge\iota}\frac{1}{\sqrt{n}}\Big|\sum_{i=1}^{n}u_{ij}u_{ik}\Big/\sqrt{\hat\gamma_{u,jj}(0)\hat\gamma_{u,kk}(0)}\Big|,$$
where $\hat\gamma_{u,jk}(0) = \sum_{i=1}^{n}u_{ij}u_{ik}/n$. With some abuse of notation, let $x_i = (\tilde u_{i1}\tilde u_{i1},\dots,\tilde u_{i1}\tilde u_{ip},\dots,\tilde u_{ip}\tilde u_{i1},\dots,\tilde u_{ip}\tilde u_{ip})$ with $\tilde u_{ij} = u_{ij}/\sqrt{\gamma_{u,jj}(0)}$.

Theorem 5.2. Suppose the assumptions in Theorem 4.1 or Theorem 4.2 hold for $\{x_i\}$, where $p$ is replaced by the cardinality of the set $\{1\le j,k\le p: |j-k|\ge\iota\}$. Then under Assumption S.1 in the supplementary material and $H_0$, we have
$$(49)\qquad \sup_{\alpha\in(0,1)}\big|P(T_{\mathrm{band}} \ge c_{\mathrm{band}}(\alpha)) - \alpha\big| \lesssim n^{-c},\qquad c>0,$$
where $c_{\mathrm{band}}(\alpha)$ is given in Section 2.3.

The proof of Theorem 5.2 is similar to that of Theorem 5.1, and is thus skipped. In Section S.4, we show that Assumption S.1 can be verified under suitable primitive conditions.

To avoid direct estimation of the influence function, we may alternatively apply the non-overlapping block bootstrap procedure in Section 4.2. Assume for simplicity that $N = b_nl_n$, where $b_n, l_n\in\mathbb{Z}$. Let $\varrho_1,\dots,\varrho_{l_n}$ be i.i.d uniform random variables on $\{0,1,\dots,l_n-1\}$ and define $v^*_{(j-1)b_n+i} = v_{\varrho_jb_n+i}$ with $1\le j\le l_n$ and $1\le i\le b_n$. Compute the block bootstrap estimate $\hat\Theta^*_{q_n}$ based on the bootstrap sample $\{v^*_i\}_{i=1}^{N}$. Let $\hat c^*(\alpha)$ be the $100(1-\alpha)$th percentile of the distribution of $\max_{1\le j\le q_n}\sqrt{N}\,|\hat\theta^*_j-\hat\theta_j|$ conditional on the sample $\{u_i\}$. In what follows, we justify the validity of the non-overlapping block bootstrap in the same framework.

Assumption 5.2. Assume that
$$P\Big(P\Big(\sqrt{N}\max_{1\le j\le q_n}|\mathcal{R}^*_{jN}-\mathcal{R}_{jN}| > C_1n^{-c_1}/\sqrt{\log(2q_n)}\ \Big|\ \{u_i\}_{i=1}^n\Big) > C_2n^{-c_2}\Big) \le C_3n^{-c_3},$$
where $\mathcal{R}^*_N = (\mathcal{R}^*_{1N},\dots,\mathcal{R}^*_{q_nN})' = \mathcal{R}_N(v^*_1,\dots,v^*_N)$, and $c_1, C_1, c_2, C_2, c_3, C_3 > 0$.

Theorem 5.3. Suppose the assumptions in Theorem 4.3 hold for $\{x_i\}$, where $p$ is replaced by $q_n$. Then under Assumptions 5.1–5.2, we have
$$(50)\qquad \sup_{\alpha\in(0,1)}\Big|P\Big(\max_{1\le j\le q_n}\sqrt{N}\,|\hat\theta_j-\tilde\theta_j| \ge \hat c^*(\alpha)\Big) - \alpha\Big| \lesssim n^{-c},\qquad c>0.$$

Remark 5.1. An alternative way to construct the uniform confidence band or to perform hypothesis testing is based on the studentized statistic. For example, let $\hat\sigma_j^2$ be a consistent estimator of $\lim_{n\to\infty}N\,\mathrm{var}(\hat\theta_j)$. Then the uniform confidence band can be constructed as
$$\Big\{\Theta_{q_n} = (\theta_1,\dots,\theta_{q_n})'\in\mathbb{R}^{q_n}:\ \max_{1\le j\le q_n}\sqrt{N}\,\big|\hat\theta_j-\theta_j\big|/\hat\sigma_j \le \check c(\alpha)\Big\}.$$
The blockwise multiplier bootstrap or the non-overlapping block bootstrap can be modified accordingly to obtain the critical value $\check c(\alpha)$.
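To make the construction in (45)–(47) concrete, here is a sketch for the white noise example above, where the estimated influence function is simply the centered vector $\nu_i$. The lag $L$, block sizes and bootstrap size are illustrative choices; the absolute value in the code replaces the $\pm$-augmentation of the coordinates, which is equivalent for the two-sided test.

```python
import numpy as np

def white_noise_test_stat(u, L):
    """sqrt(N) max_j |theta_hat_j| over lag-1..L autocovariances, plus IF-hat."""
    n, p = u.shape
    N = n - L
    # nu_i stacks vec(u_i u_{i+h}') for h = 1..L
    nu = np.concatenate(
        [(u[:N, :, None] * u[h : N + h, None, :]).reshape(N, p * p)
         for h in range(1, L + 1)], axis=1)
    if_hat = nu - nu.mean(axis=0)                      # estimated influence functions
    stat = np.sqrt(N) * np.abs(nu.mean(axis=0)).max()  # under H0: gamma_u(h) = 0
    return stat, if_hat

def multiplier_critical_value(if_hat, N1, M1, n_boot=500, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    N, q = if_hat.shape
    r1 = N // (N1 + M1)
    blocks = if_hat[: r1 * (N1 + M1)].reshape(r1, N1 + M1, q)
    A_hat = blocks[:, :N1, :].sum(axis=1)
    B_hat = blocks[:, N1:, :].sum(axis=1)
    e = rng.standard_normal((n_boot, r1, 1))
    e_t = rng.standard_normal((n_boot, r1, 1))
    T = np.abs((A_hat * e + B_hat * e_t).sum(axis=1)).max(axis=1) / np.sqrt(N)
    return np.quantile(T, 1 - alpha)   # c-hat(alpha) as in (47)

u = np.random.default_rng(3).standard_normal((600, 5))
stat, if_hat = white_noise_test_stat(u, L=2)
print(stat, multiplier_critical_value(if_hat, N1=20, M1=10))
```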
5.2. Extension to infinite dimensional parameters. To broaden the applicability of our method, we extend the above results to cover infinite dimensional parameters that are functionals of the joint distribution of $\{u_i\}_{i\in\mathbb{Z}}$, denoted as $F_\infty$. A typical example is given by spectral quantities, which depend on the distribution of the whole time series rather than on any finite dimensional distribution; see Example 5.1. Hence, the extension in this section is useful in conducting inference for the spectrum of high dimensional time series.

Suppose $\Theta_{q_n} = (\theta_1,\dots,\theta_{q_n})' = T_\infty(F_\infty)$ and its estimator is $\hat\Theta_{q_n} := \hat\Theta_{q_n}(u_1,\dots,u_n) = (\hat\theta_1,\dots,\hat\theta_{q_n})'$. Again, $q_n$ is allowed to grow with $n$ or $p$. Assume that there exists a sequence of approximating statistics for $\hat\Theta_{q_n}$, each a functional of a $\vartheta_n$-dimensional empirical distribution, and a sequence of approximating (non-random) quantities $\bar\Theta_{q_n} = (\bar\theta_1,\dots,\bar\theta_{q_n})'$ for $\Theta_{q_n}$. Then the bootstrap method proposed in Section 5.1 still works, provided that these two approximation errors can be well controlled and similar regularity conditions hold for the expansion of the approximating statistics around $\bar\Theta_{q_n}$, i.e., (51). To be more precise, we impose the following assumption.

Assumption 5.3. For a sequence of positive integers $\vartheta_n$ that grow with $n$, let $v_{i,\vartheta_n} = (u_i',\dots,u_{i+\vartheta_n-1}')'$ with $i = 1,2,\dots,N_{\vartheta_n} := n-\vartheta_n+1$. Assume the expansion
$$(51)\qquad T_{\vartheta_n}(F^{N_{\vartheta_n}}_{\vartheta_n}) := \big(T_{1,\vartheta_n}(F^{N_{\vartheta_n}}_{\vartheta_n}),\dots,T_{q_n,\vartheta_n}(F^{N_{\vartheta_n}}_{\vartheta_n})\big)' = \bar\Theta_{q_n} + \frac{1}{n}\sum_{i=1}^{N_{\vartheta_n}}\mathrm{IF}(v_{i,\vartheta_n},F_{\vartheta_n}) + \mathcal{R}_{N_{\vartheta_n}},$$
where $\mathcal{R}_{N_{\vartheta_n}} = (\mathcal{R}_{1,N_{\vartheta_n}},\dots,\mathcal{R}_{q_n,N_{\vartheta_n}})'$ is a remainder term. Denote $\Upsilon_{j,\vartheta_n} = |\mathcal{R}_{j,N_{\vartheta_n}}| + |\hat\theta_j - T_{j,\vartheta_n}(F^{N_{\vartheta_n}}_{\vartheta_n})|$. Suppose that
$$P\Big(\max_{1\le j\le q_n}\sqrt{N_{\vartheta_n}}\,\Upsilon_{j,\vartheta_n} > C_1n^{-c_1}/\sqrt{\log(2q_n)}\Big) < C_2n^{-c_2},$$
and $n^{c_3}\sqrt{\log(2q_n)}\max_{1\le j\le q_n}\sqrt{N_{\vartheta_n}}\,|\bar\theta_j-\theta_j| = o(1)$ for some $c_1, C_1, c_2, C_2, c_3 > 0$.

We next illustrate the validity of expansion (51) using a spectral mean example.
Example 5.1. Consider the spectral mean $G(\mathcal{F}_u,\phi) = \int_{-\pi}^{\pi}\mathrm{tr}(\phi(\lambda)\mathcal{F}_u(\lambda))\,d\lambda$, where tr denotes the trace of a square matrix, $\mathcal{F}_u(\cdot)$ is the spectral density of $\{u_i\}$ and $\phi(\cdot):[-\pi,\pi]\to\mathbb{R}^{p\times p}$. For simplicity, assume that $\mathbb{E}u_i = 0$. Suppose the quantity of interest is $\Theta_{q_n} = (G(\mathcal{F}_u,\phi_1),\dots,G(\mathcal{F}_u,\phi_{q_n}))'$ with $\phi_k(\cdot):[-\pi,\pi]\to\mathbb{R}^{p\times p}$ for $1\le k\le q_n$. Here $\Theta_{q_n}$ can be interpreted as the projection of the spectral density matrix onto the $q_n$ directions defined by $\phi_k(\cdot)$ with $1\le k\le q_n$. A sample analogue of $\mathcal{F}_u(\lambda)$ is the periodogram $I_{n,u}(\lambda) = (2\pi n)^{-1}\sum_{i,j=1}^{n}u_iu_j'\exp(\imath(i-j)\lambda)$ with $\imath = \sqrt{-1}$. Then a plug-in estimator for $\Theta_{q_n}$ is given by $\hat\Theta_{q_n} = (G(I_{n,u},\phi_1),\dots,G(I_{n,u},\phi_{q_n}))'$. Letting $\hat\Gamma_{n,h} = \sum_{j=1}^{n-h}u_{j+h}u_j'/n$, we have $G(I_{n,u},\phi_k) = \sum_{h=1-n}^{n-1}\mathrm{tr}(\tilde\phi_{hk}\hat\Gamma_{n,h})$ with $\tilde\phi_{hk} = \int_{-\pi}^{\pi}\phi_k(\lambda)\exp(\imath h\lambda)\,d\lambda/(2\pi)$. Consider the approximating quantity $\bar\theta_k = \sum_{h=1-\vartheta_n}^{\vartheta_n-1}\mathrm{tr}(\tilde\phi_{hk}\Gamma_h)$ with $\Gamma_h = \mathbb{E}u_{j+h}u_j'$. It is then straightforward to see that
$$(52)\qquad T_{k,\vartheta_n}(F^{N_{\vartheta_n}}_{\vartheta_n}) := \sum_{h=1-\vartheta_n}^{\vartheta_n-1}\mathrm{tr}(\tilde\phi_{hk}\hat\Gamma_h) = \bar\theta_k + \frac{1}{n}\sum_{i=1}^{N_{\vartheta_n}}\mathrm{IF}_k(v_{i,\vartheta_n},F_{\vartheta_n}) + \mathcal{R}_{k,N_{\vartheta_n}},$$
where $\mathrm{IF}_k(v_{i,\vartheta_n},F_{\vartheta_n}) = \sum_{h=1-\vartheta_n}^{\vartheta_n-1}\mathrm{tr}\{\tilde\phi_{hk}(u_{i+h}u_i'-\Gamma_h)\}$ and $\mathcal{R}_{k,N_{\vartheta_n}}$ is the corresponding remainder term.

Recall that $\Theta_{q_n} = T_\infty(F_\infty)$ with $F_\infty$ being the joint distribution of $\{u_i\}_{i\in\mathbb{Z}}$. The statistic for testing the null hypothesis $H_0: \Theta_{q_n} = \tilde\Theta_{q_n}$ versus the alternative $H_a: \Theta_{q_n}\ne\tilde\Theta_{q_n}$, where $\tilde\Theta_{q_n} = (\tilde\theta_1,\dots,\tilde\theta_{q_n})'$, rejects when
$$(53)\qquad \max_{1\le j\le q_n}\sqrt{N_{\vartheta_n}}\,|\hat\theta_j-\tilde\theta_j| \ge c(\alpha).$$
With some abuse of notation, we now define $x_i := x_{in} = (\mathrm{IF}(v_{i,\vartheta_n},F_{\vartheta_n})', -\mathrm{IF}(v_{i,\vartheta_n},F_{\vartheta_n})')'$ and $\hat x_i = (\widehat{\mathrm{IF}}(v_{i,\vartheta_n},F^{N_{\vartheta_n}}_{\vartheta_n})', -\widehat{\mathrm{IF}}(v_{i,\vartheta_n},F^{N_{\vartheta_n}}_{\vartheta_n})')'$, with $\widehat{\mathrm{IF}}(v_{i,\vartheta_n},F^{N_{\vartheta_n}}_{\vartheta_n})$ being some estimate of $\mathrm{IF}(v_{i,\vartheta_n},F_{\vartheta_n})$ (note that in this case $\{x_{in}\}_{i=1}^{N_{\vartheta_n}}$ is an array). Suppose $N_{\vartheta_n} = (N_{1,\vartheta_n}+M_{1,\vartheta_n})r_{1,\vartheta_n}$. We can define $\hat A_{ij}$ and $\hat B_{ij}$ in a similar way as before (see (46)), where $1\le i\le r_{1,\vartheta_n}$ and $1\le j\le 2q_n$. Let
$$T_{\hat D} = \max_{1\le j\le 2q_n}\frac{1}{\sqrt{n}}\sum_{i=1}^{r_{1,\vartheta_n}}\hat D_{ij},$$
where $\hat D_{ij} = \hat A_{ij}e_i + \hat B_{ij}\tilde e_i$ with $\{(e_i,\tilde e_i)\}$ being a sequence of i.i.d $N(0,I_2)$ random vectors independent of $\{u_i\}$. The bootstrap critical value is then given by
$$(54)\qquad \hat c(\alpha) := \inf\{t\in\mathbb{R}: P(T_{\hat D}\le t\,|\,\{x_i\}_{i=1}^n)\ge 1-\alpha\}.$$
Following the arguments in the proof of Theorem 5.1, we obtain the following result.
Theorem 5.4. Suppose Assumption 5.3 holds and the assumptions in Theorem 4.1 or Theorem 4.2 are satisfied for $\{x_i\}$, where $p$ is replaced by $q_n$. Assume in addition that $P(\hat{\mathcal{E}}_{AB}\{\log(2q_n)\}^2 > C_1n^{-c_1}) \le C_2n^{-c_2}$, where $c_1, C_1, c_2, C_2 > 0$, and
$$\hat{\mathcal{E}}_{AB} = \max_{1\le j\le 2q_n}\Big|\frac{1}{n}\sum_{i=1}^{r_{1,\vartheta_n}}\big\{(A_{ij}-\hat A_{ij})^2 + (B_{ij}-\hat B_{ij})^2\big\}\Big|,$$
with $A_{ij} = \sum_{l=iN_{1,\vartheta_n}+(i-1)M_{1,\vartheta_n}-N_{1,\vartheta_n}+1}^{iN_{1,\vartheta_n}+(i-1)M_{1,\vartheta_n}}x_{lj}$ and $B_{ij} = \sum_{l=i(N_{1,\vartheta_n}+M_{1,\vartheta_n})-M_{1,\vartheta_n}+1}^{i(N_{1,\vartheta_n}+M_{1,\vartheta_n})}x_{lj}$. Then we have, for some $c>0$,
$$(55)\qquad \sup_{\alpha\in(0,1)}\Big|P\Big(\max_{1\le j\le q_n}\sqrt{N_{\vartheta_n}}\,|\hat\theta_j-\tilde\theta_j| \ge \hat c(\alpha)\Big) - \alpha\Big| \lesssim n^{-c}.$$
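As an illustration of the plug-in estimator in Example 5.1, the sketch below evaluates $G(I_{n,u},\phi_k) = \sum_h\mathrm{tr}(\tilde\phi_{hk}\hat\Gamma_{n,h})$ for a direction whose Fourier coefficients $\tilde\phi_{hk}$ vanish outside finitely many lags — an assumption made here purely for computational convenience.

```python
import numpy as np

def spectral_mean_estimate(u, phi_tilde):
    """Plug-in spectral mean sum_h tr(phi_tilde[h] Gamma_hat[h]) from Example 5.1.

    phi_tilde: dict mapping lag h (possibly negative) to a p x p Fourier
    coefficient matrix; only the listed lags are assumed nonzero.
    """
    n, p = u.shape
    est = 0.0
    for h, coef in phi_tilde.items():
        a = abs(h)
        gamma_hat = u[a:].T @ u[: n - a] / n   # Gamma_hat_{n,a} = sum_j u_{j+a} u_j' / n
        if h < 0:
            gamma_hat = gamma_hat.T            # stationarity: Gamma_{-h} = Gamma_h'
        est += np.trace(coef @ gamma_hat)
    return est

p = 3
u = np.random.default_rng(1).standard_normal((500, p))
phi = {h: np.eye(p) for h in (-1, 0, 1)}       # a toy direction with three active lags
print(spectral_mean_estimate(u, phi))
```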
References.

[1] Andrews, D. W. K. (1991). Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica.
[2] Arlot, S., Blanchard, G. and Roquain, E. (2010). Some non-asymptotic results on resampling in high dimension I: confidence regions. Ann. Statist.
[3] Bentkus, V. (2003). On the dependence of the Berry–Esseen bound on dimension. J. Statist. Plann. Infer.
[4] Box, G. E. P. and Pierce, D. A. (1970). Distribution of residual autocorrelations in autoregressive-integrated moving average time series models. J. Amer. Statist. Assoc.
[5] Brillinger, D. R. (1975). Time Series: Data Analysis and Theory. San Francisco: Holden-Day.
[6] Bühlmann, P. and Künsch, H. R. (1999). Block length selection in the bootstrap for time series. Comput. Stat. Data An.
[7] Cai, T. T., Liu, W. D. and Xia, Y. (2013). Two-sample covariance matrix testing and support recovery in high-dimensional and sparse settings. J. Amer. Statist. Assoc.
[8] Cai, T. T., Liu, W. D. and Xia, Y. (2014). Two-sample test of high dimensional means under dependence. J. R. Stat. Soc. Ser. B Stat. Methodol.
[9] Cai, T. T. and Jiang, T. (2011). Limiting laws of coherence of random matrices with applications to testing covariance structure and construction of compressed sensing matrices. Ann. Statist.
[10] Carlstein, E. (1986). The use of subseries values for estimating the variance of a general statistic from a stationary sequence. Ann. Statist.
[11] Chatterjee, S. (2005). An error bound in the Sudakov–Fernique inequality. arXiv:math/0510424.
[12] Chen, X., Xu, M. and Wu, W. B. (2013). Covariance and precision matrix estimation for high-dimensional time series. Ann. Statist.
[13] Chen, S. X. and Qin, Y.-L. (2010). A two sample test for high dimensional data with applications to gene-set testing. Ann. Statist.
[14] Chen, S. X., Zhang, L.-X. and Zhong, P.-S. (2010). Tests for high-dimensional covariance matrices. J. Amer. Statist. Assoc.
[15] Chernozhukov, V., Chetverikov, D. and Kato, K. (2012). Comparison and anti-concentration bounds for maxima of Gaussian random vectors. Probab. Theory Relat. Fields, to appear.
[16] Chernozhukov, V., Chetverikov, D. and Kato, K. (2013). Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors. Ann. Statist.
[17] de la Peña, V., Lai, T. and Shao, Q.-M. (2009). Self-Normalized Processes: Limit Theory and Statistical Applications. Springer.
[18] Deo, R. S. (2000). Spectral tests for the martingale hypothesis under conditional heteroscedasticity. J. Econometrics.
[19] Hall, P., Horowitz, J. L. and Jing, B.-Y. (1995). On blocking rules for the bootstrap with dependent data. Biometrika.
[20] Hall, P. and Jin, J. (2010). Innovated higher criticism for detecting sparse signals in correlated noise. Ann. Statist.
[21] Hampel, F., Ronchetti, E., Rousseeuw, P. and Stahel, W. (1986). Robust Statistics: The Approach Based on Influence Functions. New York: John Wiley.
[22] Hong, Y. (1996). Consistent testing for serial correlation of unknown form. Econometrica.
[23] Kitamura, Y. (1997). Empirical likelihood methods with weakly dependent processes. Ann. Statist.
[24] Künsch, H. (1989). The jackknife and the bootstrap for general stationary observations. Ann. Statist.
[25] Ledoux, M. (2001). Concentration of Measure Phenomenon. American Mathematical Society.
[26] Li, J. and Chen, S. X. (2012). Two sample tests for high dimensional covariance matrices. Ann. Statist.
[27] Ligeralde, A. and Brown, B. (1995). Band covariance matrix estimation using restricted residuals: A Monte Carlo analysis. Internat. Econom. Rev.
[28] Liu, R. Y. and Singh, K. (1992). Moving block jackknife and bootstrap capture weak dependence. In Exploring the Limits of Bootstrap, Ed. R. LePage and L. Billard, pp. 225–248. New York: John Wiley.
[29] Liu, W. and Lin, Z. (2009). Strong approximation for a class of stationary processes. Stochastic Process. Appl.
[30] Liu, W., Lin, Z. Y. and Shao, Q.-M. (2008). The asymptotic distribution and Berry–Esseen bound of a new test for independence in high dimension with an application to stochastic optimization. Ann. Appl. Probab.
[31] Paparoditis, E. and Politis, D. N. (2001). Tapered block bootstrap. Biometrika.
[32] Politis, D. N., Romano, J. P. and Wolf, M. (1999). Subsampling. Springer-Verlag, New York.
[33] Qiu, Y.-M. and Chen, S. X. (2012). Test for bandedness of high dimensional covariance matrices with bandwidth estimation. Ann. Statist.
[34] Robinson, P. M. (1991). Testing for strong serial correlation and dynamic conditional heteroskedasticity in multiple regression. J. Econometrics.
[35] Röllin, A. (2011). Stein's method in high dimensions with applications. Ann. Inst. H. Poincaré Probab. Statist.
[36] Stein, C. (1986). Approximate Computation of Expectations. Institute of Mathematical Statistics Lecture Notes—Monograph Series, 7.
[37] Tao, M., Wang, Y., Yao, Q. and Zou, J. (2011). Large volatility matrix inference via combining low-frequency and high-frequency approaches. J. Amer. Statist. Assoc.
[38] van der Vaart, A. W. and Wellner, J. A. (1996). Weak Convergence and Empirical Processes. Springer-Verlag, New York.
[39] Wikle, C. K. and Hooten, M. B. (2010). A general science-based framework for dynamical spatio-temporal models. TEST.
[40] Wu, W. B. (2005). Nonlinear system theory: Another look at dependence. Proc. Natl. Acad. Sci. USA.
[41] Wu, W. B. (2011). Asymptotic theory for stationary processes. Stat. Interface.
[42] Wu, W. B. and Shao, X. (2004). Limit theorems for iterated random functions. J. Appl. Probab.
[43] Zhong, P.-S., Chen, S. X. and Xu, M. (2013). Tests alternative to higher criticism for high dimensional means under sparsity and column-wise dependence. Ann. Statist.

Supplementary Material
Throughout the supplementary material, $C$ and $C'$ denote generic constants that are independent of $n$ and $p$. For a set $\mathcal{A}$, denote by $|\mathcal{A}|$ its cardinality.

S.1. Proofs of the main results in Section 3.
Proof of Proposition 3.1.
Define $Z(t) = \sum_{i=1}^{n}Z_i(t)$ with the Slepian interpolation $Z_i(t) = (\sqrt{t}\,\tilde x_i + \sqrt{1-t}\,\tilde y_i)/\sqrt{n}$ and $0\le t\le 1$. Let $\Psi(t) = \mathbb{E}\,m(Z(t))$. Define $V^{(i)}(t) = \sum_{j\in\tilde N_i}Z_j(t)$ and $Z^{(i)}(t) = Z(t) - V^{(i)}(t)$. Write $\partial_jm(x) = \partial m(x)/\partial x_j$, $\partial_j\partial_km(x) = \partial^2m(x)/\partial x_j\partial x_k$ and $\partial_j\partial_k\partial_lm(x) = \partial^3m(x)/\partial x_j\partial x_k\partial x_l$ for $j,k,l = 1,2,\dots,p$, where $x = (x_1,x_2,\dots,x_p)'$. Note that
$$(S.1)\qquad \mathbb{E}m(\tilde X) - \mathbb{E}m(\tilde Y) = \Psi(1)-\Psi(0) = \int_0^1\Psi'(t)\,dt = \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{p}\int_0^1\mathbb{E}[\partial_jm(Z(t))\dot Z_{ij}(t)]\,dt = \frac{1}{2}(I_1+I_2+I_3),$$
where $\dot Z_{ij}(t) = \{\tilde x_{ij}/\sqrt{t} - \tilde y_{ij}/\sqrt{1-t}\}/\sqrt{n}$, and
$$I_1 = \sum_{i=1}^{n}\sum_{j=1}^{p}\int_0^1\mathbb{E}[\partial_jm(Z^{(i)}(t))\dot Z_{ij}(t)]\,dt,\qquad
I_2 = \sum_{i=1}^{n}\sum_{k,j=1}^{p}\int_0^1\mathbb{E}[\partial_k\partial_jm(Z^{(i)}(t))\dot Z_{ij}(t)V^{(i)}_k(t)]\,dt,$$
$$(S.2)\qquad I_3 = \sum_{i=1}^{n}\sum_{k,l,j=1}^{p}\int_0^1\int_0^1(1-\tau)\mathbb{E}[\partial_l\partial_k\partial_jm(Z^{(i)}(t)+\tau V^{(i)}(t))\dot Z_{ij}(t)V^{(i)}_k(t)V^{(i)}_l(t)]\,dt\,d\tau.$$
Using the fact that $Z^{(i)}(t)$ and $\dot Z_{ij}(t)$ are independent, and $\mathbb{E}\dot Z_{ij}(t)=0$, we have $I_1 = 0$. To bound the second term, define the expanded neighborhood around $N_i$, $\bar N_i = \{j: \{j,k\}\in E_n\ \text{for some}\ k\in N_i\}$, and $\bar Z^{(i)}(t) = Z(t) - \sum_{l\in\bar N_i\cup\tilde N_i}Z_l(t) = Z^{(i)}(t) - \mathcal{V}^{(i)}(t)$, where $\mathcal{V}^{(i)}(t) = \sum_{l\in\bar N_i\setminus\tilde N_i}Z_l(t)$ with $\bar N_i\setminus\tilde N_i = \{k\in\bar N_i: k\notin\tilde N_i\}$. By Taylor expansion,
$$I_2 = \sum_{i=1}^{n}\sum_{k,j=1}^{p}\int_0^1\mathbb{E}[\partial_k\partial_jm(\bar Z^{(i)}(t))]\,\mathbb{E}[\dot Z_{ij}(t)V^{(i)}_k(t)]\,dt + \sum_{i=1}^{n}\sum_{k,j,l=1}^{p}\int_0^1\int_0^1\mathbb{E}[\partial_k\partial_j\partial_lm(\bar Z^{(i)}(t)+\tau\mathcal{V}^{(i)}(t))\dot Z_{ij}(t)V^{(i)}_k(t)\mathcal{V}^{(i)}_l(t)]\,dt\,d\tau =: I_{21} + I_{22},$$
where we have used the fact that $\dot Z_{ij}(t)V^{(i)}_k(t)$ and $\bar Z^{(i)}(t)$ are independent.

Let $M_{xy} = \max\{M_x, M_y\}$. By the assumption that $2\sqrt{\beta}D_nM_{xy}/\sqrt{n}\le 1$,
$$\max_{1\le j\le p}\Big|\sum_{l\in\bar N_i\cup\tilde N_i}Z_{lj}(t)\Big| \le \max_{1\le j\le p}\sum_{l\in\bar N_i\cup\tilde N_i}|Z_{lj}(t)| \le 2D_n\sup_{t\in[0,1]}(2\sqrt{t}+\sqrt{1-t})M_{xy}/\sqrt{n} \lesssim D_nM_{xy}/\sqrt{n} \le \beta^{-1/2},$$
where the second inequality comes from the facts that $|\tilde x_{ij}|\le M_{xy}$, $|\tilde y_{ij}|\le M_{xy}$ and $|\bar N_i\cup\tilde N_i|\le 2D_n$. By Lemma A.5 in [16], for every $1\le j,k,l\le p$, $|\partial_j\partial_km(z)|\le U_{jk}(z)$ and $|\partial_j\partial_k\partial_lm(z)|\le U_{jkl}(z)$, where $U_{jk}(z)$ and $U_{jkl}(z)$ satisfy
$$\sum_{j,k=1}^{p}U_{jk}(z) \lesssim (G_2+2G_1\beta),\qquad \sum_{j,k,l=1}^{p}U_{jkl}(z) \lesssim (G_3+6G_2\beta+6G_1\beta^2),$$
with $G_k = \sup_{z\in\mathbb{R}}|\partial^kg(z)/\partial z^k|$ for $k\ge 0$. Along with Lemma A.6 in [16], we obtain
$$|I_{21}| \le \sum_{i=1}^{n}\sum_{k,j=1}^{p}\int_0^1\mathbb{E}[U_{jk}(\bar Z^{(i)}(t))]\,|\mathbb{E}[\dot Z_{ij}(t)V^{(i)}_k(t)]|\,dt \lesssim (G_2+G_1\beta)\int_0^1\max_{1\le j,k\le p}\sum_{i=1}^{n}|\mathbb{E}[\dot Z_{ij}(t)V^{(i)}_k(t)]|\,dt.$$
Since $2\sqrt{\beta}D_nM_{xy}/\sqrt{n}\le 1$, we have
$$(S.3)\qquad |I_3| \le \sum_{i=1}^{n}\sum_{k,j,l=1}^{p}\int_0^1\int_0^1\mathbb{E}[U_{kjl}(Z^{(i)}(t)+\tau V^{(i)}(t))\,|\dot Z_{ij}(t)V^{(i)}_k(t)V^{(i)}_l(t)|]\,dt\,d\tau \lesssim (G_3+G_2\beta+G_1\beta^2)\int_0^1\int_0^1\mathbb{E}\max_{1\le k,j,l\le p}\sum_{i=1}^{n}|\dot Z_{ij}(t)V^{(i)}_k(t)V^{(i)}_l(t)|\,dt\,d\tau.$$
To bound the integral in (S.3), let $w(t) = 1/(\sqrt{t}\wedge\sqrt{1-t})$ and note that, by Hölder's inequality,
$$\int_0^1\mathbb{E}\max_{k,j,l}\sum_{i=1}^{n}|\dot Z_{ij}(t)V^{(i)}_k(t)V^{(i)}_l(t)|\,dt \le \int_0^1w(t)\Big(\mathbb{E}\max_{j}\sum_{i=1}^{n}|\dot Z_{ij}(t)/w(t)|^3\ \mathbb{E}\max_{k}\sum_{i=1}^{n}|V^{(i)}_k(t)|^3\ \mathbb{E}\max_{l}\sum_{i=1}^{n}|V^{(i)}_l(t)|^3\Big)^{1/3}dt.$$
As for $I_{21}$, by the assumption that $\mathbb{E}y_{ij}y_{lk} = \mathbb{E}x_{ij}x_{lk}$ (in fact, we only need $\sum_{k\in\tilde N_i}\mathbb{E}x_ix_k' = \sum_{k\in\tilde N_i}\mathbb{E}y_iy_k'$ for all $i$), we have
$$\max_{1\le j,k\le p}\sum_{i=1}^{n}|\mathbb{E}[\dot Z_{ij}(t)V^{(i)}_k(t)]| = \max_{1\le j,k\le p}\frac{1}{n}\sum_{i=1}^{n}\Big|\sum_{l\in\tilde N_i}(\mathbb{E}\tilde x_{ij}\tilde x_{lk}-\mathbb{E}\tilde y_{ij}\tilde y_{lk})\Big| = \max_{1\le j,k\le p}\frac{1}{n}\sum_{i=1}^{n}\Big|\sum_{l\in\tilde N_i}(\mathbb{E}\tilde x_{ij}\tilde x_{lk}-\mathbb{E}x_{ij}x_{lk}) + \sum_{l\in\tilde N_i}(\mathbb{E}y_{ij}y_{lk}-\mathbb{E}\tilde y_{ij}\tilde y_{lk})\Big|$$
$$(S.4)\qquad \le \max_{1\le j,k\le p}\frac{1}{n}\sum_{i=1}^{n}\Big|\sum_{l\in\tilde N_i}\{\mathbb{E}y_{lk}(y_{ij}-\tilde y_{ij}) + \mathbb{E}\tilde y_{ij}(y_{lk}-\tilde y_{lk})\}\Big| + \max_{1\le j,k\le p}\frac{1}{n}\sum_{i=1}^{n}\Big|\sum_{l\in\tilde N_i}\{\mathbb{E}x_{lk}(x_{ij}-\tilde x_{ij}) + \mathbb{E}\tilde x_{ij}(x_{lk}-\tilde x_{lk})\}\Big| \le \phi(M_x,M_y).$$
Using similar arguments as above, $|I_{22}| \lesssim (G_3+G_2\beta+G_1\beta^2)\mathcal{I}$ with
$$\mathcal{I} \le \int_0^1w(t)\Big(\mathbb{E}\max_{j}\sum_{i=1}^{n}|\dot Z_{ij}(t)/w(t)|^3\ \mathbb{E}\max_{k}\sum_{i=1}^{n}|V^{(i)}_k(t)|^3\ \mathbb{E}\max_{l}\sum_{i=1}^{n}|\mathcal{V}^{(i)}_l(t)|^3\Big)^{1/3}dt.$$
We first consider the term $\mathbb{E}\max_{j}\sum_{i=1}^{n}|\dot Z_{ij}(t)/w(t)|^3$. Using the fact that $|\dot Z_{ij}(t)/w(t)| \le (|\tilde x_{ij}|+|\tilde y_{ij}|)/\sqrt{n}$, we get
$$\mathbb{E}\max_{1\le j\le p}\sum_{i=1}^{n}|\dot Z_{ij}(t)/w(t)|^3 \lesssim n^{-3/2}\,\mathbb{E}\max_{1\le j\le p}\sum_{i=1}^{n}(|\tilde x_{ij}|+|\tilde y_{ij}|)^3 \lesssim (m^3_{x,3}+m^3_{y,3})/\sqrt{n}.$$
On the other hand, notice that
$$\mathbb{E}\max_{1\le k\le p}\sum_{i=1}^{n}|V^{(i)}_k(t)|^3 \le D_n^2\,\mathbb{E}\max_{1\le k\le p}\sum_{i=1}^{n}\sum_{j\in\tilde N_i}|Z_{jk}(t)|^3 \lesssim D_n^2n^{-3/2}\,\mathbb{E}\max_{1\le k\le p}\sum_{i=1}^{n}\sum_{j\in\tilde N_i}(|\tilde x_{jk}|+|\tilde y_{jk}|)^3 \lesssim D_n^3(m^3_{x,3}+m^3_{y,3})/\sqrt{n}.$$
Similarly, $\mathbb{E}\max_{1\le l\le p}\sum_{i=1}^{n}|\mathcal{V}^{(i)}_l(t)|^3 \le D_n^2\,\mathbb{E}\max_{1\le l\le p}\sum_{i=1}^{n}\sum_{j\in\bar N_i}|Z_{jl}(t)|^3 \lesssim D_n^3(m^3_{x,3}+m^3_{y,3})/\sqrt{n}$. Note that $\int_0^1w(t)\,dt \lesssim 1$. Summarizing the above results, we have
$$I_2 \lesssim (G_2+G_1\beta)\phi(M_x,M_y) + (G_3+G_2\beta+G_1\beta^2)\frac{D_n^2}{\sqrt{n}}(m^3_{x,3}+m^3_{y,3}),\qquad
I_3 \lesssim (G_3+G_2\beta+G_1\beta^2)\frac{D_n^2}{\sqrt{n}}(m^3_{x,3}+m^3_{y,3}).$$
Alternatively, we can bound $I_3$ in the following way. By Lemmas A.5 and A.6 in [16],
$$|I_3| \lesssim \sum_{i=1}^{n}\sum_{k,j,l=1}^{p}\int_0^1\mathbb{E}[U_{kjl}(Z(t))]\,\mathbb{E}|\dot Z_{ij}(t)V^{(i)}_k(t)V^{(i)}_l(t)|\,dt \le n(G_3+G_2\beta+G_1\beta^2)\int_0^1w(t)\max_{j,k,l}(\bar{\mathbb{E}}|\dot Z_{ij}(t)/w(t)|^3)^{1/3}(\bar{\mathbb{E}}|V^{(i)}_k(t)|^3)^{1/3}(\bar{\mathbb{E}}|V^{(i)}_l(t)|^3)^{1/3}dt,$$
where $\bar{\mathbb{E}}$ denotes the average $n^{-1}\sum_{i=1}^{n}\mathbb{E}$. Notice that $\max_j\bar{\mathbb{E}}|\dot Z_{ij}(t)/w(t)|^3 \lesssim n^{-3/2}(\bar m^3_{x,3}+\bar m^3_{y,3})$ and $\max_k\bar{\mathbb{E}}|V^{(i)}_k(t)|^3 \le D_n^2\max_k\bar{\mathbb{E}}\sum_{j\in\tilde N_i}|Z_{jk}(t)|^3 \lesssim D_n^3n^{-3/2}(\bar m^3_{x,3}+\bar m^3_{y,3})$. Thus we derive that $I_3 \lesssim (G_3+G_2\beta+G_1\beta^2)(D_n^2/\sqrt{n})(\bar m^3_{x,3}+\bar m^3_{y,3})$. Therefore, we obtain
$$(S.5)\qquad |\mathbb{E}[m(\tilde X)-m(\tilde Y)]| \lesssim (G_2+G_1\beta)\phi(M_x,M_y) + (G_3+G_2\beta+G_1\beta^2)\frac{D_n^2}{\sqrt{n}}(m^3_{x,3}+m^3_{y,3}) + (G_3+G_2\beta+G_1\beta^2)\frac{D_n^2}{\sqrt{n}}(\bar m^3_{x,3}+\bar m^3_{y,3}).$$
Using the above arguments, we can show that
$$(S.6)\qquad I_{22} \lesssim (G_3+G_2\beta+G_1\beta^2)\frac{D_n^2}{\sqrt{n}}(\bar m^3_{x,3}+\bar m^3_{y,3}),$$
provided that $2\sqrt{\beta}D_nM_{xy}/\sqrt{n}\le 1$. This proves the last statement of Proposition 3.1. Note that $|m(x)-m(y)| \le G_0$ and $|m(x)-m(y)| \le G_1\max_{1\le j\le p}|x_j-y_j|$ with $x = (x_1,\dots,x_p)'$ and $y = (y_1,\dots,y_p)'$. So, writing $I_\Delta$ for the indicator that both $\max_j|X_j-\tilde X_j|\le\Delta$ and $\max_j|Y_j-\tilde Y_j|\le\Delta$,
$$(S.7)\qquad |\mathbb{E}[m(X)-m(\tilde X)]| \le |\mathbb{E}[(m(X)-m(\tilde X))I_\Delta]| + |\mathbb{E}[(m(X)-m(\tilde X))(1-I_\Delta)]| \lesssim G_1\Delta + G_0\mathbb{E}[1-I_\Delta],\qquad |\mathbb{E}[m(Y)-m(\tilde Y)]| \lesssim G_1\Delta + G_0\mathbb{E}[1-I_\Delta].$$
The conclusion follows by combining (S.5), (S.6) and (S.7). ♦
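The Gaussian approximation established above can also be checked numerically. The following sketch simulates an $M$-dependent process with non-Gaussian innovations together with its Gaussian analogue (same MA weights, hence the same autocovariance structure) and estimates the Kolmogorov distance between the two maxima by Monte Carlo; all parameters are illustrative.

```python
import numpy as np

def max_stats(innov, n, M, w):
    """T = max_j sum_i x_ij / sqrt(n) for MA(M) arrays built from given innovations."""
    x = sum(w[k] * innov[:, M - k : M - k + n, :] for k in range(M + 1))
    return x.sum(axis=1).max(axis=1) / np.sqrt(n)

rng = np.random.default_rng(0)
n, p, M, n_rep = 256, 40, 2, 2000
w = 0.6 ** np.arange(M + 1)
# Uniform innovations (variance one) vs Gaussian innovations, same MA weights.
T_X = max_stats(rng.uniform(-np.sqrt(3), np.sqrt(3), (n_rep, n + M, p)), n, M, w)
T_Y = max_stats(rng.standard_normal((n_rep, n + M, p)), n, M, w)
grid = np.linspace(min(T_X.min(), T_Y.min()), max(T_X.max(), T_Y.max()), 200)
rho_hat = np.abs((T_X[:, None] <= grid).mean(0) - (T_Y[:, None] <= grid).mean(0)).max()
print(rho_hat)   # Monte Carlo proxy for the Kolmogorov distance rho_n
```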
Proof of Corollary 3.1. Notice that $D_n = 2M+1$, $|\tilde N_i|\le 2M+1$ and $|\bar N_i\cup\tilde N_i|\le 4M+1$. Define $\bar{\bar N}_i = \{j: \{j,k\}\in E_n\ \text{for some}\ k\in\bar N_i\}$. Then $|\bar{\bar N}_i\cup\bar N_i\cup\tilde N_i|\le 6M+1$. Following the arguments in the proof of Proposition 3.1, we can show that
$$\max_{1\le l\le p}\bar{\mathbb{E}}|\mathcal{V}^{(i)}_l(t)|^3 \lesssim D_n^3n^{-3/2}(\bar m^3_{x,3}+\bar m^3_{y,3}),$$
which implies that $I_{22} \lesssim (G_3+G_2\beta+G_1\beta^2)(D_n^2/\sqrt{n})(\bar m^3_{x,3}+\bar m^3_{y,3})$. The conclusion follows from the proof of Proposition 3.1. ♦
Proof of Lemma 3.1. We only need to prove the result for $x>1$, since for $0<x\le 1$ the stated bound is trivial. Suppose first that the distributions of $A_{ij}$ and $B_{ij}$ are both symmetric. Then
$$P\Big(\sum_{i=1}^{n}x_{ij} > xV_{nj}\Big) \le P\Big(\sum_{i=1}^{r}A_{ij} > \frac{x}{2}\Big(\sum_{i=1}^{r}A_{ij}^2\Big)^{1/2}\Big) + P\Big(\sum_{i=1}^{r}B_{ij} > \frac{x}{2}\Big(\sum_{i=1}^{r}B_{ij}^2\Big)^{1/2}\Big) \le 2e^{-x^2/8},$$
where we have used Theorem 2.15 in [17]. Let $\{\xi_{ij}\}_{i=1}^{n}$ be an independent copy of $\{x_{ij}\}_{i=1}^{n}$, in the sense that $\{\xi_{ij}\}_{i=1}^{n}$ has the same joint distribution as $\{x_{ij}\}_{i=1}^{n}$, and define $V'_{nj}$ ($A'_{ij}$ and $B'_{ij}$) in the same way as $V_{nj}$ ($A_{ij}$ and $B_{ij}$) by replacing $\{x_{ij}\}_{i=1}^{n}$ with $\{\xi_{ij}\}_{i=1}^{n}$. Following the arguments in the proof of Theorem 2.16 in [17], we deduce that for $x>0$,
$$\Big\{\sum_{i=1}^{n}x_{ij} > x(a_j+b_j+V_{nj}),\ \sum_{i=1}^{n}\xi_{ij}\le a_j,\ V'_{nj}\le b_j\Big\} \subset \Big\{\sum_{i=1}^{n}(x_{ij}-\xi_{ij}) \ge x(a_j+b_j+V_{nj})-a_j,\ V'_{nj}\le b_j\Big\}$$
$$\subset \Big\{\sum_{i=1}^{n}(x_{ij}-\xi_{ij}) \ge x(a_j+b_j+V^*_{nj}-V'_{nj})-a_j,\ V'_{nj}\le b_j\Big\} \subset \Big\{\sum_{i=1}^{n}(x_{ij}-\xi_{ij}) \ge xV^*_{nj}\Big\},$$
where we have used the fact that
$$V^*_{nj} \equiv \Big(\sum_{l=1}^{r}(A_{lj}-A'_{lj})^2 + \sum_{l=1}^{r}(B_{lj}-B'_{lj})^2\Big)^{1/2} \le V_{nj}+V'_{nj}.$$
We note that $A_{lj}-A'_{lj}$ and $B_{lj}-B'_{lj}$ are symmetric, and, for the choices of $a_j, b_j$ below,
$$P\Big(\sum_{i=1}^{n}\xi_{ij}\le a_j,\ V'_{nj}\le b_j\Big) \ge 1/2.$$
Thus, using the independence of $\{x_{ij}\}$ and $\{\xi_{ij}\}$, we obtain
$$P\Big(\sum_{i=1}^{n}x_{ij} \ge x(a_j+b_j+V_{nj})\Big) = \frac{P\big(\sum_{i=1}^{n}x_{ij}\ge x(a_j+b_j+V_{nj}),\ \sum_{i=1}^{n}\xi_{ij}\le a_j,\ V'_{nj}\le b_j\big)}{P\big(\sum_{i=1}^{n}\xi_{ij}\le a_j,\ V'_{nj}\le b_j\big)} \le 2P\Big(\sum_{i=1}^{n}(x_{ij}-\xi_{ij}) \ge xV^*_{nj}\Big) \le 4e^{-x^2/8}.$$
Hence we get
$$P\Big(\Big|\sum_{i=1}^{n}x_{ij}\Big| \ge x(a_j+b_j+V_{nj})\Big) \le 8e^{-x^2/8}.$$
In particular, we can choose $b_j = 4\mathbb{E}V_{nj}$ and $a_j = 2b_j = 8\mathbb{E}V_{nj}$, because $4\mathbb{E}(\sum_{i=1}^{n}x_{ij})^2 \le 8\mathbb{E}(\sum_{l=1}^{r}A_{lj}^2) + 8\mathbb{E}(\sum_{l=1}^{r}B_{lj}^2) = 8\mathbb{E}V_{nj}^2$. ♦
Proof of Theorem 3.1. Note that
$$\mathbb{E}[1-I_\Delta] \le P\big(\max_{1\le j\le p}|X_j-\tilde X_j|>\Delta\big) + P\big(\max_{1\le j\le p}|Y_j-\tilde Y_j|>\Delta\big) \le \sum_{j=1}^{p}\big\{P(|X_j-\tilde X_j|>\Delta) + P(|Y_j-\tilde Y_j|>\Delta)\big\}.$$
Let
$$\Lambda_j \equiv (2+2\sqrt{2})\Big(\sum_{i=1}^{r}\mathbb{E}(A_{ij}-\tilde A_{ij})^2/n + \sum_{i=1}^{r}\mathbb{E}(B_{ij}-\tilde B_{ij})^2/n\Big)^{1/2} + \Big(\sum_{i=1}^{r}(A_{ij}-\tilde A_{ij})^2/n + \sum_{i=1}^{r}(B_{ij}-\tilde B_{ij})^2/n\Big)^{1/2} =: \Lambda_{1j}+\Lambda_{2j},$$
where
$$\tilde A_{ij} = \sum_{l=(i-1)(N+M)+1}^{iN+(i-1)M}\tilde x_{lj},\qquad \tilde B_{ij} = \sum_{l=iN+(i-1)M+1}^{i(N+M)}\tilde x_{lj}.$$
Applying Lemma 3.1 (to the differences $x_{ij}-\tilde x_{ij}$) and using the union bound, we have, with probability at least $1-\gamma$,
$$|X_j-\tilde X_j| \lesssim \Lambda_j\sqrt{\log(2p/\gamma)},\qquad 1\le j\le p.$$
By the assumption, $P(\max_{1\le i\le n}\max_{1\le j\le p}|x_{ij}|\le M_x)\ge 1-\gamma$ and $P(\max_{1\le i\le n}\max_{1\le j\le p}|y_{ij}|\le M_y)\ge 1-\gamma$. Therefore, with probability at least $1-2\gamma$,
$$\Lambda_j \le (2+2\sqrt{2})\Big(\sum_{i=1}^{r}\mathbb{E}(A_{ij}-\breve A_{ij})^2/n + \sum_{i=1}^{r}\mathbb{E}(B_{ij}-\breve B_{ij})^2/n\Big)^{1/2} + \Big(\sum_{i=1}^{r}(\mathbb{E}\breve A_{ij})^2/n + \sum_{i=1}^{r}(\mathbb{E}\breve B_{ij})^2/n\Big)^{1/2}$$
$$\le (3+2\sqrt{2})\,\varphi(M_x)\sqrt{N r\sigma_j^2/n + M r\sigma_j^2/n} \lesssim \varphi(M_x)\sigma_j,$$
where we have used the fact that $\mathbb{E}A_{ij} = \mathbb{E}B_{ij} = 0$ and the Cauchy–Schwarz inequality. The same argument applies to the Gaussian sequence $\{y_i\}$.

Summarizing the above results and along with (10), we deduce that
$$(S.8)\qquad |\mathbb{E}[m(X)-m(Y)]| \lesssim (G_2+G_1\beta)\phi(M_x,M_y) + (G_3+G_2\beta+G_1\beta^2)\frac{(2M+1)^2}{\sqrt{n}}(\bar m^3_{x,3}+\bar m^3_{y,3}) + G_1\varphi(M_x,M_y)\max_j\sigma_j\sqrt{\log(2p/\gamma)} + G_0\gamma,$$
which also implies that
$$(S.9)\qquad |\mathbb{E}[g(T_X)-g(T_Y)]| \lesssim (G_2+G_1\beta)\phi(M_x,M_y) + (G_3+G_2\beta+G_1\beta^2)\frac{(2M+1)^2}{\sqrt{n}}(\bar m^3_{x,3}+\bar m^3_{y,3}) + G_1\varphi(M_{xy})\max_j\sigma_j\sqrt{\log(2p/\gamma)} + G_0\gamma + \beta^{-1}G_1\log p$$
for the $M$-dependent sequence, provided that $2\sqrt{\beta}(6M+1)M_{xy}/\sqrt{n} < 1$. Consider a "smooth" indicator function $g_0\in C^3(\mathbb{R}): \mathbb{R}\to[0,1]$ such that $g_0(s) = 1$ for $s\le 0$ and $g_0(s) = 0$ for $s\ge 1$. Fix any $t\in\mathbb{R}$ and define $g(s) = g_0(\psi(s-t-\tilde\beta))$ with $\tilde\beta = \beta^{-1}\log p$. The conclusion follows from the proof of Corollary F.1 in [16] and Lemma 2.1 in [15] regarding the anti-concentration property of the Gaussian distribution. We omit the details to conserve space. ♦
Proof of Theorem 3.2. Let $\breve x_{ij} = x_{ij} - \tilde x_{ij}$. Define $\chi_{(1+l)k} = (x_{(1+l)k}\wedge M_x)\vee(-M_x)$ and $\chi^{\dagger}_{(1+l)k} = (x^{\dagger}_{(1+l)k}\wedge M_x)\vee(-M_x)$, where $x^{\dagger}_{(1+l)k}$ denotes the coupled version of $x_{(1+l)k}$ in which the innovations up to time $0$ are replaced by an independent copy, so that $x_{0j}$ and $x^{\dagger}_{(1+l)k}$ are independent for any $1\le j,k\le p$. Using this independence and $\mathbb{E}x_{ij} = \mathbb{E}\breve x_{ij} = 0$, we obtain for $l>0$,
$$|\mathbb{E}\breve x_{0j}x_{(1+l)k}| = |\mathbb{E}\breve x_{0j}(x_{(1+l)k}-x^{\dagger}_{(1+l)k})| \le (\mathbb{E}\breve x_{0j}^2)^{1/2}\big(\mathbb{E}|x_{(1+l)k}-x^{\dagger}_{(1+l)k}|^2\big)^{1/2} \le (\mathbb{E}x_{0j}^4)^{1/2}\big(\mathbb{E}|x_{(1+l)k}-x^{\dagger}_{(1+l)k}|^2\big)^{1/2}/M_x,$$
where the last step uses the truncation bound $\mathbb{E}\breve x_{0j}^2 \le \mathbb{E}x_{0j}^4/M_x^2$. Using the fact that the map $x\mapsto(x\wedge M_x)\vee(-M_x)$ is Lipschitz continuous, we deduce that
$$|\mathbb{E}x_{0j}\breve x_{(1+l)k}| = \big|\mathbb{E}x_{0j}\{\breve x_{(1+l)k}-\breve x^{\dagger}_{(1+l)k} - \mathbb{E}(\chi_{(1+l)k}-\chi^{\dagger}_{(1+l)k})\}I\{|x_{(1+l)k}|>M_x\ \text{or}\ |x^{\dagger}_{(1+l)k}|>M_x\}\big|$$
$$\lesssim (\mathbb{E}|x_{0j}|^4)^{1/4}\big(\mathbb{E}|x_{(1+l)k}-x^{\dagger}_{(1+l)k}|^2\big)^{1/2}\big(\mathbb{E}|x_{(1+l)k}|^4 + \mathbb{E}|x^{\dagger}_{(1+l)k}|^4\big)^{1/4}/M_x.$$
Note that for $l=0$, $|\mathbb{E}\breve x_{0j}x_{1k}| \le (\mathbb{E}x_{0j}^4)^{1/2}(\mathbb{E}x_{0k}^4)^{1/2}/M_x^2$. It is not hard to show that the above results continue to hold if $x_{0j}$ (or $x_{(1+l)k}$) is replaced by $\tilde x_{0j}$ (or $\tilde x_{(1+l)k}$). Therefore, by (S.4) and the assumptions, we have
$$\max_{1\le j,k\le p}\sum_{i=1}^{n}|\mathbb{E}[\dot Z_{ij}(t)V^{(i)}_k(t)]| \lesssim 1/M_x + 1/M_y.$$
Thus we may set $\phi(M_x,M_y) = C(1/M_x+1/M_y)$ for some constant $C>0$.

Next we consider $\varphi(M_x,M_y)$. By stationarity, repeating the previous argument for each lag and using the assumption (18),
$$(S.10)\qquad \sum_{l=1-N}^{N-1}|\mathbb{E}\breve x_{0k}\breve x_{(1+l)k}| \le 2\sum_{l=1}^{N-1}\big|\mathbb{E}\breve x_{0k}\{\breve x_{(1+l)k}-\breve x^{\dagger}_{(1+l)k}-\mathbb{E}(\chi_{(1+l)k}-\chi^{\dagger}_{(1+l)k})\}I\{|x_{(1+l)k}|>M_x\ \text{or}\ |x^{\dagger}_{(1+l)k}|>M_x\}\big| + \mathbb{E}|\breve x_{0k}|^2 \lesssim 1/M_x^{1/2}.$$
Also note that $(\mathbb{E}\breve A_{ij})^2/N = N(\mathbb{E}\chi_{0j})^2 = N\{\mathbb{E}(\chi_{0j}-x_{0j})\}^2 \le N(\mathbb{E}|x_{0j}|^4/M_x^3)^2$, and $(\mathbb{E}\breve B_{ij})^2/M \le M(\mathbb{E}|x_{0j}|^4/M_x^3)^2$. Because $\mathbb{E}(A_{ij}-\tilde A_{ij})^2/N \lesssim 1/M_x^{1/2}$ and $\mathbb{E}(B_{ij}-\tilde B_{ij})^2/M \lesssim 1/M_x^{1/2}$ by (S.10), we can choose $\varphi(M_x) = C'(1/M_x^{1/4} + \sqrt{N}/M_x^3)$ for some constant $C'>0$. By the assumption that $\max_{1\le k\le p}\mathbb{E}|G_k(\dots,\epsilon_{i-1},\epsilon_i)|^4 < \infty$ and the fact that $\mathbb{E}y_{ij}y_{ik} = \mathbb{E}x_{ij}x_{ik}$, we have $\mathbb{E}|x_{ij}|^3 \le (\mathbb{E}|G_j(\dots,\epsilon_{i-1},\epsilon_i)|^4)^{3/4} < \infty$,
$$\mathbb{E}|y_{ij}|^3 \lesssim (\mathbb{E}|y_{ij}|^2)^{3/2} = (\mathbb{E}|x_{ij}|^2)^{3/2} \le (\mathbb{E}|x_{ij}|^4)^{3/4} < \infty,\qquad \mathbb{E}|y_{ij}|^4 \lesssim (\mathbb{E}|y_{ij}|^2)^2 = (\mathbb{E}|x_{ij}|^2)^2 \le \mathbb{E}|x_{ij}|^4 < \infty.$$
Using similar arguments, we can show that $\varphi(M_y) = C''(1/M_y^{1/4} + \sqrt{N}/M_y^3)$ for some constant $C''>0$. The above argument also implies that $\bar m_{x,3}+\bar m_{y,3} < \infty$. Thus we ignore the constants and set $\psi = O(n^{1/8}M^{-3/8}l_n^{-1/8})$ and $M_x = M_y = u = O(n^{1/8}M^{-3/8}l_n^{-1/8})$. Let $2\sqrt{\beta}(6M+1)M_{xy}/\sqrt{n} = 1$, that is, $\beta = O(\sqrt{n}/(uM))$. It is straightforward to check the following:
$$(\psi^2+\psi\beta)\phi(M_x,M_y) \lesssim \psi^2/u + \psi\sqrt{n}/(u^2M) \lesssim n^{-1/8}M^{3/8}l_n^{7/8},$$
$$(\psi^3+\psi^2\beta+\psi\beta^2)\frac{(2M+1)^2}{\sqrt{n}} \lesssim \frac{\psi^3M^2}{\sqrt{n}} + \frac{\psi^2M}{u} + \frac{\psi\sqrt{n}}{u^2} \lesssim n^{-1/8}M^{3/8}l_n^{7/8},$$
$$\psi\varphi(M_x,M_y)\max_j\sigma_j\sqrt{\log(2p/\gamma)} \lesssim \psi l_n^{1/2}u^{-1/4} + \sqrt{N}\psi l_n^{1/2}u^{-3} \lesssim n^{-1/8}M^{3/8}l_n^{7/8},$$
$$(\tilde\beta+\psi^{-1})\sqrt{1\vee\log(p\psi)} \lesssim \frac{l_n^{1/2}Mu}{\sqrt{n}} + \psi^{-1}l_n^{1/2} \lesssim n^{-1/8}M^{3/8}l_n^{7/8}.$$
Therefore we get
$$(S.11)\qquad \rho_n := \sup_{t\in\mathbb{R}}|P(T_X\le t) - P(T_Y\le t)| \lesssim n^{-1/8}M^{3/8}l_n^{7/8} + \gamma.$$
Under Condition (i) in Assumption 3.1, $\mathbb{E}h(\max_{1\le j\le p}|x_{ij}|/D_n)\le 1$. By Lemma 2.2 in [16], we have $u_x(\gamma) \lesssim \max\{D_nh^{-1}(n/\gamma),\ l_n^{1/2}\}$ and $u_y(\gamma) \lesssim l_n^{1/2}$. Because $n^{1/8}M^{-3/8}l_n^{-1/8} \ge C\max\{D_nh^{-1}(n/\gamma),\ l_n^{1/2}\}$, we can always choose $u = O(n^{1/8}M^{-3/8}l_n^{-1/8})$ such that
$$(S.12)\qquad P\big(\max_{1\le i\le n}\max_{1\le j\le p}|x_{ij}|\le u\big)\ge 1-\gamma,\qquad P\big(\max_{1\le i\le n}\max_{1\le j\le p}|y_{ij}|\le u\big)\ge 1-\gamma.$$
Using similar arguments, we can prove the result under Condition (ii) in Assumption 3.1. The proof is thus completed. ♦

The following lemma verifies condition (18).
Lemma S.1. Assume that $\max_{1\le k\le p}\sum_{j=1}^{+\infty}j\theta_{j,k,2}(x) < \infty$. Then
$$\sup_M\max_{1\le k\le p}\sum_{l=1}^{M}\big(\mathbb{E}|x^{(M)}_{(1+l)k} - x^{\dagger}_{(1+l)k}|^2\big)^{1/2} \le \max_{1\le k\le p}\sum_{j=1}^{+\infty}j\theta_{j,k,2}(x) < \infty.$$

Proof of Lemma S.1. Define the projection $\mathcal{P}_jx_{ik} = \mathbb{E}[x_{ik}|\mathcal{F}_j(i)] - \mathbb{E}[x_{ik}|\mathcal{F}_{j-1}(i)]$. Then we have
$$x^{(M)}_{(1+l)k} - x^{\dagger}_{(1+l)k} = \mathbb{E}[G_k(\dots,\epsilon_l,\epsilon_{l+1})|\mathcal{F}_M(l+1)] - \mathbb{E}[G_k(\dots,\epsilon_l,\epsilon_{l+1})|\mathcal{F}_{l-1}(l+1)] = \sum_{j=l}^{M}\mathcal{P}_jx_{(l+1)k}.$$
Note that
$$\mathcal{P}_jx_{ik} = \mathbb{E}[x_{ik}|\mathcal{F}_j(i)] - \mathbb{E}[x_{ik}|\mathcal{F}_{j-1}(i)] = \mathbb{E}[G_k(\dots,\epsilon_{i-1},\epsilon_i) - G_k(\dots,\epsilon'_{i-j},\epsilon_{i-j+1},\dots,\epsilon_{i-1},\epsilon_i)|\mathcal{F}_j(i)] = \mathbb{E}[G_k(\dots,\epsilon_{j-1},\epsilon_j) - G_k(\dots,\epsilon'_0,\epsilon_1,\dots,\epsilon_{j-1},\epsilon_j)|\mathcal{F}_j(j)].$$
Jensen's inequality yields $(\mathbb{E}|\mathcal{P}_jx_{ik}|^q)^{1/q} \le \theta_{j,k,q}(x)$, which implies
$$\big(\mathbb{E}|x^{(M)}_{(1+l)k}-x^{\dagger}_{(1+l)k}|^2\big)^{1/2} \le \sum_{j=l}^{M}\big(\mathbb{E}|\mathcal{P}_jx_{(l+1)k}|^2\big)^{1/2} \le \sum_{j=l}^{M}\theta_{j,k,2}(x).$$
Therefore, we obtain
$$\sup_M\max_{1\le k\le p}\sum_{l=1}^{M}\big(\mathbb{E}|x^{(M)}_{(1+l)k}-x^{\dagger}_{(1+l)k}|^2\big)^{1/2} \le \sup_M\max_{1\le k\le p}\sum_{l=1}^{M}\sum_{j=l}^{M}\theta_{j,k,2}(x) \le \max_{1\le k\le p}\sum_{j=1}^{+\infty}j\theta_{j,k,2}(x) < \infty.\qquad\diamond$$
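For a linear (causal moving-average) process, the $M$-dependent approximation $x^{(M)}_i = \mathbb{E}[x_i|\epsilon_{i-M},\dots,\epsilon_i]$ used in the next proof simply truncates the MA representation at lag $M$, and the approximation error inherits the decay of the coefficients. A minimal sketch, with geometrically decaying weights standing in for GMC-type decay:

```python
import numpy as np

rng = np.random.default_rng(0)
K, n, p = 40, 1000, 5
w = 0.7 ** np.arange(K + 1)                  # geometrically decaying MA weights
eps = rng.standard_normal((n + K, p))
x = sum(w[k] * eps[K - k : K - k + n] for k in range(K + 1))

def m_dependent_approx(M):
    # E[x_i | eps_{i-M}, ..., eps_i]: the tail of the MA sum has mean zero,
    # so conditioning truncates the representation at lag M.
    return sum(w[k] * eps[K - k : K - k + n] for k in range(min(M, K) + 1))

for M in (2, 5, 10, 20):
    print(M, np.sqrt(((x - m_dependent_approx(M)) ** 2).mean()))  # decays geometrically
```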
Proof of Theorem 3.3. We need to verify that the $M$-dependent approximation $\{x^{(M)}_i\}$ satisfies the assumptions in Theorem 3.2. Using the convexity of $h(\cdot)$ and Jensen's inequality, we have $\mathbb{E}h(\max_{1\le j\le p}|x^{(M)}_{ij}|/D_n) \le \mathbb{E}h(\max_{1\le j\le p}|x_{ij}|/D_n) \le 1$ under Condition (i) in Assumption 3.1, and $\max_{1\le j\le p}\mathbb{E}\exp(|x^{(M)}_{ij}|/D_n) \le \max_{1\le j\le p}\mathbb{E}\exp(|x_{ij}|/D_n) \le 2$ under Condition (ii) in Assumption 3.1.

We claim that as $M\to+\infty$,
$$(S.13)\qquad \sup_p\max_{1\le j\le p}\sum_{h=-\infty}^{+\infty}\big|\mathbb{E}x^{(M)}_{ij}x^{(M)}_{(i+h)j} - \mathbb{E}x_{ij}x_{(i+h)j}\big| \to 0,$$
which implies that $\max_{1\le j\le p}|\sigma^{(M,n)}_{j,j}-\sigma^{(n)}_{j,j}|\to 0$ and $\max_{1\le j\le p}|(\sigma^{(M)}_j)^2-\sigma_j^2|\to 0$, where $\sigma^{(M,n)}_{j,j} = \sum_{h=1-n}^{n-1}(n-|h|)\mathbb{E}x^{(M)}_{ij}x^{(M)}_{(i+h)j}/n$ and $(\sigma^{(M)}_j)^2 = \sum_{h=-\infty}^{+\infty}|\mathbb{E}x^{(M)}_{ij}x^{(M)}_{(i+h)j}|$. Thus, under the assumptions in Theorem 3.3, we have $c_1/2 < \min_{1\le j\le p}\sigma^{(M,n)}_{j,j} \le \max_{1\le j\le p}\sigma^{(M,n)}_{j,j} < 2c_2$ for some constants $0<c_1<c_2$, uniformly for all large enough $M$.

To show (S.13), we note that
$$\sum_{h=-\infty}^{+\infty}\big|\mathbb{E}x^{(M)}_{ij}x^{(M)}_{(i+h)j} - \mathbb{E}x_{ij}x_{(i+h)j}\big| = \sum_{h=-M}^{M}\big|\mathbb{E}x^{(M)}_{ij}x^{(M)}_{(i+h)j} - \mathbb{E}x_{ij}x_{(i+h)j}\big| + \sum_{|h|>M}\big|\mathbb{E}x_{ij}x_{(i+h)j}\big| =: I_{1j}(M) + I_{2j}(M).$$
For the first term, we have
$$I_{1j}(M) \le \sum_{h=-M}^{M}\big|\mathbb{E}x^{(M)}_{ij}(x^{(M)}_{(i+h)j}-x_{(i+h)j})\big| + \sum_{h=-M}^{M}\big|\mathbb{E}(x^{(M)}_{ij}-x_{ij})x_{(i+h)j}\big|$$
$$\le \sum_{h=-M}^{M}\{\mathbb{E}(x^{(M)}_{ij})^2\}^{1/2}\{\mathbb{E}(x^{(M)}_{(i+h)j}-x_{(i+h)j})^2\}^{1/2} + \sum_{h=-M}^{M}\{\mathbb{E}(x^{(M)}_{ij}-x_{ij})^2\}^{1/2}\{\mathbb{E}(x_{(i+h)j})^2\}^{1/2}$$
$$\lesssim M\{\mathbb{E}(x_{0j}^2)\}^{1/2}\{\mathbb{E}(x^{(M)}_{1j}-x_{1j})^2\}^{1/2} \le M\{\mathbb{E}(x_{0j}^2)\}^{1/2}\sum_{l=M+1}^{\infty}(\mathbb{E}|\mathcal{P}_lx_{1j}|^2)^{1/2} \le \{\mathbb{E}(x_{0j}^2)\}^{1/2}\sum_{l=M+1}^{\infty}l\theta_{l,j,2}(x),$$
where we have used the fact that $x_{1j}-x^{(M)}_{1j} = \sum_{l=M+1}^{+\infty}\mathcal{P}_lx_{1j}$ and $(\mathbb{E}|\mathcal{P}_lx_{1j}|^q)^{1/q}\le\theta_{l,j,q}(x)$. Under the assumption that $\max_{1\le k\le p}j\theta_{j,k,2}(x)\le\ell_j$ with $\sum_{j=1}^{+\infty}\ell_j < \infty$, we have $\max_{1\le j\le p}I_{1j}(M)\to 0$ as $M\to+\infty$. On the other hand, note that for $h>M$,
$$\mathbb{E}x_{ij}x_{(i+h)j} = \mathbb{E}x_{ij}(x_{(i+h)j}-x^{\dagger}_{(i+h)j}) \le (\mathbb{E}x_{ij}^2)^{1/2}\{\mathbb{E}(x_{(i+h)j}-x^{\dagger}_{(i+h)j})^2\}^{1/2}.$$
Thus we have
$$\max_{1\le j\le p}I_{2j}(M) \lesssim \max_{1\le j\le p}\sum_{h>M}\sum_{l\ge h}\theta_{l,j,2}(x) \le \max_{1\le j\le p}\sum_{l=M+1}^{+\infty}l\theta_{l,j,2}(x) \le \sum_{l=M+1}^{+\infty}\ell_l,$$
which implies that $\max_{1\le j\le p}I_{2j}(M)\to 0$ as $M\to+\infty$.

Lemma S.1 verifies the first condition in (17). The same arguments apply to $\{y_i\}$. The triangle inequality and (24) imply that
$$|\mathbb{E}[m(X)-m(Y)]| \lesssim |\mathbb{E}[m(X^{(M)})-m(Y^{(M)})]| + (G_0G_1^q)^{1/(1+q)}\Big(\sum_{j=1}^{p}\Theta^q_{M,j,q}\Big)^{1/(1+q)},$$
where $Y^{(M)} = \sum_{i=1}^{n}y^{(M)}_i/\sqrt{n}$ with $y^{(M)}_i$ being the $M$-dependent approximation of $\{y_i\}$. The conclusion thus follows from Theorem 3.1 and Theorem 3.2. ♦
Proof of Proposition 3.2. Without loss of generality, we assume that $\pi(i) = i$. Define two events $\mathcal{D}_x = \{\max_{1\le j\le q}X_j > \max_{q+1\le j\le p}X_j\}$ and $\mathcal{D}_y = \{\max_{1\le j\le q}Y_j > \max_{q+1\le j\le p}Y_j\}$. Simple algebra yields that, uniformly for all $z\in\mathbb{R}$,
$$\Big|P\Big(\max_{1\le j\le p}X_j\le z\Big) - P\Big(\max_{q+1\le j\le p}X_j\le z\Big)\Big| = \Big|P\Big(\max_{1\le j\le q}X_j\le z,\ \mathcal{D}_x\Big) + P\Big(\max_{q+1\le j\le p}X_j\le z,\ \mathcal{D}_x^c\Big) - P\Big(\max_{q+1\le j\le p}X_j\le z\Big)\Big|$$
$$= \Big|P\Big(\max_{1\le j\le q}X_j\le z,\ \mathcal{D}_x\Big) - P\Big(\max_{q+1\le j\le p}X_j\le z,\ \mathcal{D}_x\Big)\Big| \le P(\mathcal{D}_x).$$
Next we analyze $P(\mathcal{D}_x)$ and $P(\mathcal{D}_y)$. Under the assumptions in Corollary 2.1 of [16], we have
$$(S.14)\qquad \sup_{z\in\mathbb{R}}\Big|P\Big(\max_{q+1\le i\le p}X_i\le z\Big) - P\Big(\max_{q+1\le i\le p}Y_i\le z\Big)\Big| \lesssim n^{-c},\qquad c>0.$$
Notice that in this case we allow $p = O(\exp(n^b))$ (taking $B_n = O(1)$ in Corollary 2.1 of [16]). By (S.14) and the independence between $\{z_{1i}\}$ and $\{z_{2i}\}$, we obtain
$$P(\mathcal{D}_x) \le \sum_{j=1}^{q}\mathbb{E}\Big[P\Big(\max_{q+1\le i\le p}X_i < X_j\ \Big|\ X_j\Big)\Big] \lesssim \sum_{j=1}^{q}\mathbb{E}\Big[P_y\Big(\max_{q+1\le i\le p}Y_i < X_j\Big)\Big] + qn^{-c},$$
where $P_y$ denotes the probability measure with respect to $(Y_{q+1},\dots,Y_p)$.

Let $\bar\sigma^2 = \max_{1\le j\le p}\sigma_{j,j}$. Using the concentration inequality (see e.g. (7.3) of [25] and Theorem A.2.1 of [38]),
$$P\Big(\max_{q+1\le i\le p}Y_i \le \mathbb{E}\max_{q+1\le i\le p}Y_i - r\Big) \le e^{-r^2/(2\bar\sigma^2)},\qquad r>0,$$
we have
$$P_y\Big(\max_{q+1\le i\le p}Y_i < x\Big) \le \exp\Big(-\frac{1}{2\bar\sigma^2}\Big(\mathbb{E}\max_{q+1\le i\le p}Y_i - x\Big)_+^2\Big),$$
where $x_+ = xI\{x\ge 0\}$. Under the assumption that $q/\mathbb{E}\max_{q+1\le i\le p}Y_i\to 0$, we can choose $\tilde q\to+\infty$ such that $\tilde q/\mathbb{E}\max_{q+1\le i\le p}Y_i\to 0$ and $q/\tilde q\to 0$. Then we have
$$\sum_{j=1}^{q}\mathbb{E}\Big[P_y\Big(\max_{q+1\le i\le p}Y_i<X_j\Big)\Big] \le \sum_{j=1}^{q}\mathbb{E}\exp\Big(-\frac{1}{2\bar\sigma^2}\Big(\mathbb{E}\max_{q+1\le i\le p}Y_i-X_j\Big)_+^2\Big)$$
$$\le \sum_{j=1}^{q}\mathbb{E}\Big[\exp\Big(-\frac{1}{2\bar\sigma^2}\Big(\mathbb{E}\max_{q+1\le i\le p}Y_i-\tilde q\Big)_+^2\Big)I\{X_j\le\tilde q\}\Big] + \sum_{j=1}^{q}\mathbb{E}\,I\{X_j>\tilde q\}$$
$$\le \exp\Big(\log q - \frac{1}{2\bar\sigma^2}\Big(\mathbb{E}\max_{q+1\le i\le p}Y_i-\tilde q\Big)_+^2\Big) + q\max_{1\le j\le q}\mathbb{E}|X_j|^2/\tilde q^2 = o(1).$$
Moreover, if $q/\mathbb{E}\max_{q+1\le j\le p}Y_j = O(n^{-c'})$ for some $c'>0$, we can replace $o(1)$ by $O(n^{-c''})$ for some $c''>0$. Thus we get
$$\sup_{z\in\mathbb{R}}\Big|P\Big(\max_{1\le j\le p}X_j\le z\Big) - P\Big(\max_{q+1\le j\le p}X_j\le z\Big)\Big| \le P(\mathcal{D}_x) \lesssim \sum_{j=1}^{q}\mathbb{E}\Big[P_y\Big(\max_{q+1\le i\le p}Y_i<X_j\Big)\Big] + qn^{-c} \lesssim n^{-c''}.$$
A similar argument applies to $\{Y_i\}$, and the conclusion follows from (S.14). ♦
S.2. Proofs of the main results in Section 4.
Proof of Lemma 4.1.
By the triangle inequality and stationarity, we have
$$\mathcal{E}_A \le \max_{1\le j,k\le p}\Big|\frac{1}{N_1r}\sum_{i=1}^{r}(A_{ij}A_{ik}-\mathbb{E}A_{ij}A_{ik})\Big| + \max_{1\le j,k\le p}\Big|\frac{1}{N_1}\mathbb{E}A_{1j}A_{1k}-\sigma_{j,k}\Big| + \max_{1\le j,k\le p}|\sigma_{j,k}-\sigma^{(n)}_{j,k}|$$
$$\le \max_{1\le j,k\le p}\Big|\frac{1}{N_1r}\sum_{i=1}^{r}(A_{ij}A_{ik}-\mathbb{E}A_{ij}A_{ik})\Big| + \max_{1\le j,k\le p}\Big|\sum_{|l|\ge N_1}\mathbb{E}x_{(i+l)j}x_{ik} + \frac{1}{N_1}\sum_{l=1-N_1}^{N_1-1}|l|\,\mathbb{E}x_{(i+l)j}x_{ik}\Big| + \max_{1\le j,k\le p}|\sigma_{j,k}-\sigma^{(n)}_{j,k}|$$
$$\le \max_{1\le j,k\le p}\Big|\frac{1}{N_1r}\sum_{i=1}^{r}(A_{ij}A_{ik}-\mathbb{E}A_{ij}A_{ik})\Big| + \frac{4}{N_1}\max_{1\le j,k\le p}\sum_{l=-\infty}^{+\infty}|l|\,|\mathbb{E}x_{(i+l)j}x_{ik}|.$$
Note that for any $1\le j,k\le p$, $\{A_{ij}A_{ik}\}_{i=1}^{r}$ is a sequence of i.i.d random variables. Let $\sigma^2_{A,N_1} = \max_{1\le j,k\le p}\mathbb{E}(A_{ij}A_{ik})^2/N_1^2$ and $M_{A,N_1} = \max_{1\le i\le r}\max_{1\le j\le p}|A_{ij}/\sqrt{N_1}|^2$. Then by Lemma A.1 in [16], we have
$$\mathbb{E}\max_{1\le j,k\le p}\Big|\frac{1}{N_1r}\sum_{i=1}^{r}(A_{ij}A_{ik}-\mathbb{E}A_{ij}A_{ik})\Big| \lesssim \sigma_{A,N_1}\sqrt{\frac{\log p}{r}} + \frac{2\log p\,\sqrt{\mathbb{E}M^2_{A,N_1}}}{r}.$$
The Cauchy–Schwarz inequality yields
$$\sigma^2_{A,N_1} \le \frac{1}{N_1^2}\max_{1\le j\le p}\mathbb{E}(A_{ij})^4 \le \frac{1}{N_1^2}\max_{1\le j\le p}\sum_{i_1,i_2,i_3,i_4=1}^{N_1}\Big\{|\mathrm{cum}(x_{i_1j},x_{i_2j},x_{i_3j},x_{i_4j})| + \gamma_{x,jj}(i_1-i_2)\gamma_{x,jj}(i_3-i_4) + \gamma_{x,jj}(i_1-i_3)\gamma_{x,jj}(i_2-i_4) + \gamma_{x,jj}(i_1-i_4)\gamma_{x,jj}(i_2-i_3)\Big\}$$
$$\le \max_{1\le j\le p}\Big\{\frac{1}{N_1}\sum_{i_1,i_2,i_3=-\infty}^{+\infty}|\mathrm{cum}(x_{i_1j},x_{i_2j},x_{i_3j},x_{0j})| + 3\Big(\sum_{h=-\infty}^{+\infty}|\gamma_{x,jj}(h)|\Big)^2\Big\} \lesssim \bar\sigma_{x,N_1}.$$
On the other hand, with $h(x)=\exp(x)-1$, we have
$$(S.15)\qquad \sqrt{\mathbb{E}M^2_{A,N_1}} \lesssim \Big\|\max_{1\le i\le r}\max_{1\le j\le p}|A_{ij}/\sqrt{N_1}|\Big\|^2_h \lesssim \{\log(rp)\}^2\max_{1\le j\le p}\big\|A_{1j}/\sqrt{N_1}\big\|^2_h,$$
where we have used Lemma 2.2.2 in [38]. It implies that
$$\sqrt{\mathbb{E}M^2_{A,N_1}} \lesssim \{\log(rp)\}^2\,\max_{1\le j\le p}\Big\|\sum_{i=1}^{N_1}x_{ij}/\sqrt{N_1}\Big\|^2_h = \{\log(rp)\}^2\zeta^2_{x,h,N_1}.$$
Combining the above arguments, we deduce that
$$\mathbb{E}\mathcal{E}_A \lesssim \bar\sigma_{x,N_1}\sqrt{\log p/r} + (\log p)\{\log(rp)\}^2\zeta^2_{x,h,N_1}/r + \varpi_x/N_1,\qquad
\mathbb{E}\mathcal{E}_B \lesssim \bar\sigma_{x,M_1}\sqrt{\log p/r} + (\log p)\{\log(rp)\}^2\zeta^2_{x,h,M_1}/r + \varpi_x/M_1.$$
Alternatively, note that $\big(\mathbb{E}\max_{1\le i\le r}\max_{1\le j\le p}|A_{ij}/\sqrt{N_1}|^4\big)^{1/2} \le r^{1/2}\varsigma^2_{x,N_1}$. The conclusion follows from the above arguments. ♦
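The estimation error $\mathcal{E}_A$ controlled by Lemma 4.1 can be simulated directly. The sketch below computes $\mathcal{E}_A$ for an MA(2) process with cross-sectionally independent coordinates, so that the target $\sigma^{(n)}_{j,k}$ is available in closed form; the model and block sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, M = 2048, 20, 2
w = np.array([1.0, 0.5, 0.25])                 # MA(2) weights; coordinates independent
eps = rng.standard_normal((n + M, p))
x = sum(w[k] * eps[M - k : M - k + n] for k in range(M + 1))

# Target sigma^(n)_{jk}: here gamma_{x,jk}(l) = 1{j=k} * sum_m w_m w_{m+|l|}.
lags = np.arange(-M, M + 1)
gam = np.array([w[abs(l):] @ w[: len(w) - abs(l)] for l in lags])
sigma_n = np.eye(p) * ((n - np.abs(lags)) / n * gam).sum()

N1, M1 = 64, 16
r = n // (N1 + M1)
A = x[: r * (N1 + M1)].reshape(r, N1 + M1, p)[:, :N1, :].sum(axis=1)
E_A = np.abs(np.einsum("ij,ik->jk", A, A) / (N1 * r) - sigma_n).max()
print(E_A)   # shrinks as r grows, in line with Lemma 4.1
```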
Proof of Theorem 4.1. By Theorem 3.2, $\rho_n \lesssim n^{-1/8}M^{3/8}l_n^{7/8} + \gamma$. Choosing $\gamma = O(n^{-c'})$ for some $c' > (1-3b_1'-7b)/8$, we have $\rho_n = O(n^{-(1-3b_1'-7b)/8})$. Pick $\nu = O(n^{-v})$ with
$$v = \tfrac{3}{4}\Big\{\big((1-b-b_1'')/2 - s_1\big)\wedge\big(1-b-b_1''-s_2\big)\wedge\big(b_1'-b-s_3\big)\Big\} = \tfrac{3}{4}s_b.$$
Then it is easy to verify that the terms $\nu^{1/3}(1\vee\log(p/\nu))^{2/3}$ and $\mathbb{E}\mathcal{E}_A/\nu + \mathbb{E}\mathcal{E}_B/\nu$ are both of order $O(n^{-c''})$ with $c'' = s_b/4$. Finally, by (39), we have
$$\sup_{\alpha\in(0,1)}|P(T_X\le c_{T_D}(\alpha)) - \alpha| \lesssim n^{-c},\qquad c = \min\{s_b/4,\ (1-3b_1'-7b)/8\}.$$
The result under Condition 2 can be proved in a similar manner. ♦
Proof of Theorem 4.2. Let $x^{(M)}_i$ be the $M$-dependent approximation sequence for $x_i$. Define $A^{(M)}_{ij}$, $B^{(M)}_{ij}$, $\mathcal{E}^{(M)}_A$ and $\mathcal{E}^{(M)}_B$ in a similar way as $A_{ij}$, $B_{ij}$, $\mathcal{E}_A$ and $\mathcal{E}_B$ by replacing $x_i$ with $x^{(M)}_i$. Notice that
$$\mathbb{E}\max_{1\le j,k\le p}\Big|\frac{1}{r}\sum_{i=1}^{r}(A^{(M)}_{ij}A^{(M)}_{ik}-A_{ij}A_{ik})/N_1\Big| \le \frac{1}{rN_1}\sum_{1\le j,k\le p}\mathbb{E}\Big|\sum_{i=1}^{r}(A^{(M)}_{ij}A^{(M)}_{ik}-A_{ij}A^{(M)}_{ik} + A_{ij}A^{(M)}_{ik}-A_{ij}A_{ik})\Big|$$
$$\le \frac{1}{N_1}\sum_{1\le j,k\le p}\Big(\mathbb{E}\big|A^{(M)}_{1j}A^{(M)}_{1k}-A_{1j}A^{(M)}_{1k}\big| + \mathbb{E}\big|A_{1j}A^{(M)}_{1k}-A_{1j}A_{1k}\big|\Big)$$
$$\le \frac{1}{N_1}\sum_{1\le j,k\le p}\Big\{\big(\mathbb{E}|A^{(M)}_{1j}-A_{1j}|^2\big)^{1/2}\big(\mathbb{E}|A^{(M)}_{1k}|^2\big)^{1/2} + \big(\mathbb{E}|A^{(M)}_{1k}-A_{1k}|^2\big)^{1/2}\big(\mathbb{E}|A_{1j}|^2\big)^{1/2}\Big\}.$$
By Lemma A.1 of [29], we have $\big(\mathbb{E}|A^{(M)}_{1j}-A_{1j}|^2\big)^{1/2}/\sqrt{N_1} \le C_q\Theta_{M,j,q}(x)$ for some $q\ge 2$. It follows that
$$\mathbb{E}\max_{1\le j,k\le p}\Big|\frac{1}{r}\sum_{i=1}^{r}(A^{(M)}_{ij}A^{(M)}_{ik}-A_{ij}A_{ik})/N_1\Big| \lesssim p^2\rho^M,\qquad
\mathbb{E}\max_{1\le j,k\le p}\Big|\frac{1}{r}\sum_{i=1}^{r}(B^{(M)}_{ij}B^{(M)}_{ik}-B_{ij}B_{ik})/M_1\Big| \lesssim p^2\rho^M.$$
Using similar arguments as in the proof of Theorem 3.3, we have
$$\max_{1\le j,k\le p}\sum_{h=1-n}^{n-1}\big|\mathbb{E}x^{(M)}_{ij}x^{(M)}_{(i+h)k}-\mathbb{E}x_{ij}x_{(i+h)k}\big| \lesssim M\rho^M.$$
Thus by (39), we have
$$\sup_{\alpha\in(0,1)}|P(T_X\le c_{T_D}(\alpha))-\alpha| \lesssim \rho_n + \nu^{1/3}(1\vee\log(p/\nu))^{2/3} + \mathbb{E}\mathcal{E}^{(M)}_A/\nu + \mathbb{E}\mathcal{E}^{(M)}_B/\nu + (p^2+M)\rho^M/\nu.$$
Then by Lemma A.1 in [16], we have
$$\mathbb{E}\max_{1\le j,k\le p}\Big|\frac{1}{N_1r}\sum_{i=1}^{r}(A^{(M)}_{ij}A^{(M)}_{ik}-\mathbb{E}A^{(M)}_{ij}A^{(M)}_{ik})\Big| \lesssim \frac{1}{N_1}\max_{1\le j\le p}\{\mathbb{E}(A^{(M)}_{ij})^4\}^{1/2}\sqrt{\frac{\log p}{r}} + \frac{2\log p}{r}\Big\{\mathbb{E}\max_{1\le i\le r}\max_{1\le j\le p}|A^{(M)}_{ij}/\sqrt{N_1}|^4\Big\}^{1/2}$$
$$\lesssim \frac{1}{N_1}\max_{1\le j\le p}\{\mathbb{E}(A_{ij})^4\}^{1/2}\sqrt{\frac{\log p}{r}} + \frac{2\log p}{r}\Big\{\mathbb{E}\max_{1\le i\le r}\max_{1\le j\le p}|A_{ij}/\sqrt{N_1}|^4\Big\}^{1/2} + \sqrt{\frac{\log p}{r}}\max_{1\le j\le p}\Theta_{M,j,2}(x) + \frac{2\log p}{r}\,rp\,\max_{1\le j\le p}\Theta_{M,j,2}(x),$$
where the first two terms can be bounded using similar arguments as in the proof of Lemma 4.1, and the last two terms decay exponentially fast. The same arguments apply to the terms associated with $B_{ij}$. By Theorem 3.3, we have
$$\rho_n \lesssim n^{-1/8}M^{3/8}l_n^{7/8} + \gamma + (n^{1/8}M^{-3/8}l_n^{-1/8})^{q/(1+q)}\Big(\sum_{j=1}^{p}\Theta^q_{M,j,q}\Big)^{1/(1+q)}.$$
The assumption that $\max_{1\le j\le p}\Theta_{M,j,q} = O(\rho^M)$ for some $0<\rho<1$, together with $M = O(n^{b_1'})$ and $b_1' > b$, implies that $\big(\sum_{j=1}^{p}\Theta^q_{M,j,q}\big)^{1/(1+q)}$ decays exponentially fast. The rest of the proof is similar to that of Theorem 4.1. ♦
Proof of Theorem 4.3. Our arguments below apply to $M$-dependent time series, and can be easily extended to weakly dependent time series by employing the $M$-approximation techniques (which incurs only an asymptotically negligible error). Let $c$, $c^*$ and $C$ be generic constants that may differ from line to line. Define
$$\breve T_{\tilde X} = \max_{1\le j\le p}\frac{1}{\sqrt{n}}\sum_{i=1}^{l_n}(\mathcal{A}_{ij}-\bar{\mathcal{A}}_j)e_i.$$
Following the arguments in the proof of Lemma 4.1, we have
$$\mathbb{E}\max_{1\le j\le p}\Big|\frac{1}{l_n}\sum_{i=1}^{l_n}(\mathcal{A}_{ij}/\sqrt{b_n})^2 - \sigma^{(b_n)}_{j,j}\Big| \lesssim \bar\sigma_{x,b_n}\sqrt{\log p/l_n} + (\log p)\{\log(l_np)\}^2\zeta^2_{x,h,b_n}/l_n \le Cn^{-c}.$$
Similarly we can show that
$$\mathbb{E}\max_{1\le j\le p}\Big|\frac{1}{l_n}\sum_{i=1}^{l_n}\mathcal{A}_{ij}/\sqrt{b_n}\Big|^2 \lesssim \max_{1\le j\le p}\sigma_j^2\,\frac{\log p}{l_n} + (\log p)\{\log(l_np)\}^2\zeta^2_{x,h,b_n}/l_n \le Cn^{-c},$$
where we have used the fact that
$$(S.16)\qquad \mathbb{E}\max_{1\le i\le l_n}\max_{1\le j\le p}|\mathcal{A}_{ij}/\sqrt{b_n}|^2 \lesssim \{\log(l_np)\}^2\zeta^2_{x,h,b_n}.$$
By Markov's inequality, we have, with probability $1-Cn^{-c}$,
$$\Big|\frac{1}{n}\sum_{i=1}^{l_n}(\mathcal{A}_{ij}-\bar{\mathcal{A}}_j)^2 - \sigma^{(b_n)}_{j,j}\Big| \le (c_1/2)\wedge c_2,\qquad \text{uniformly for } 1\le j\le p.$$
It implies that with probability $1-Cn^{-c}$, $c_1/2 \le \frac{1}{n}\sum_{i=1}^{l_n}(\mathcal{A}_{ij}-\bar{\mathcal{A}}_j)^2 \le 2c_2$. By (S.16), with probability $1-Cn^{-c}$, $\max_{1\le i\le l_n}\max_{1\le j\le p}|\mathcal{A}_{ij}/\sqrt{b_n}| \le n^{c^*}\log(l_np)\zeta_{x,h,b_n}$ for some small $c^*>0$. Because $\zeta^2_{x,h,b_n}\{\log(pl_n)\}^3/l_n \lesssim n^{-c_1'}$, we can apply Corollary 2.1 in [16] to conclude that with probability $1-Cn^{-c}$,
$$(S.17)\qquad \sup_{t\in\mathbb{R}}|P(T_{X^*}\le t\,|\,\{x_i\}_{i=1}^n) - P(\breve T_{\tilde X}\le t\,|\,\{x_i\}_{i=1}^n)| \lesssim n^{-c'},\qquad c'>0.$$
Next, notice that
$$|\breve T_{\tilde X}-T_{\tilde X}| \le \max_{1\le j\le p}|\bar{\mathcal{A}}_j/\sqrt{b_n}|\,\Big|\frac{1}{\sqrt{l_n}}\sum_{i=1}^{l_n}e_i\Big|.$$
With probability $1-Cn^{-c}$, we have $|\breve T_{\tilde X}-T_{\tilde X}| \le n^{c^*}\sqrt{\log p/l_n}\,\big|\frac{1}{\sqrt{l_n}}\sum_{i=1}^{l_n}e_i\big|$. Using the tail property of the standard normal distribution, we can choose $\zeta = n^{c^*}\sqrt{\log p/l_n}$ such that with probability $1-o(1)$, $P(|\breve T_{\tilde X}-T_{\tilde X}|>\zeta\,|\,\{x_i\}_{i=1}^n) \lesssim n^{-c'}$ and $\sqrt{\log p}\,\zeta \lesssim n^{-c'}$ for properly chosen $c^*$ and $c'$. Therefore, by Lemma 2.1 in [16], we obtain that with probability $1-Cn^{-c}$,
$$(S.18)\qquad \sup_{t\in\mathbb{R}}|P(T_{\tilde X}\le t\,|\,\{x_i\}_{i=1}^n) - P(\breve T_{\tilde X}\le t\,|\,\{x_i\}_{i=1}^n)| \lesssim n^{-c'}.$$
By (S.17) and (S.18), (42) holds with probability $1-Cn^{-c}$. The second part of the theorem follows from Theorem 4.1 and Theorem 4.2. ♦
S.3. Proofs of the main results in Section 5.
Proof of Theorem 5.1.
Define $T_D = \max_{1\le j\le 2q_n}\frac{1}{\sqrt{n}}\sum_{i=1}^{r_1}D_{ij}$, where $D_{ij} = A_{ij}e_i + B_{ij}\tilde e_i$ and
$$A_{ij} = \sum_{l=(i-1)(N_1+M_1)+1}^{iN_1+(i-1)M_1}x_{lj},\qquad B_{ij} = \sum_{l=iN_1+(i-1)M_1+1}^{i(N_1+M_1)}x_{lj},\qquad 1\le i\le r_1,\ 1\le j\le 2q_n.$$
Since $\max_{1\le j\le q_n}\big|\sum_{i=1}^{N}\mathrm{IF}_j(v_i,F_d)\big|/\sqrt{N} = \max_{1\le j\le 2q_n}\sum_{i=1}^{N}x_{ij}/\sqrt{N}$, we have
$$\Big|\max_{1\le j\le q_n}\sqrt{N}\,|\hat\theta_j-\tilde\theta_j| - \max_{1\le j\le 2q_n}\sum_{i=1}^{N}x_{ij}/\sqrt{N}\Big| \le \max_{1\le j\le q_n}\sqrt{N}\,|\mathcal{R}_{jN}|.$$
Let $\zeta_1 = Cn^{-c}/\sqrt{\log(2q_n)}$ and $\zeta_2 = Cn^{-c}$ for some large enough $C$ and small enough $c$ (e.g. $c<c_1\wedge c_2$) such that $P(\max_{1\le j\le q_n}\sqrt{N}\,|\mathcal{R}_{jN}|>\zeta_1)<\zeta_2$. We show that $P\big(P(|T_D-T_{\hat D}|>\zeta_1\,|\,\{x_i\}_{i=1}^n)>\zeta_2\big)\le\zeta_2$. Because $|T_D-T_{\hat D}| \le \max_{1\le j\le 2q_n}\big|\frac{1}{\sqrt{n}}\sum_{i=1}^{r_1}(D_{ij}-\hat D_{ij})\big|$ and
$$\frac{1}{\sqrt{n}}\sum_{i=1}^{r_1}(D_{ij}-\hat D_{ij}) \sim N\Big(0,\ \frac{1}{n}\sum_{i=1}^{r_1}\big\{(A_{ij}-\hat A_{ij})^2 + (B_{ij}-\hat B_{ij})^2\big\}\Big)$$
conditional on $\{x_i\}$, we have $\mathbb{E}[|T_D-T_{\hat D}|\,|\,\{x_i\}_{i=1}^n] \le C'\sqrt{\hat{\mathcal{E}}_{AB}\log(2q_n)}$ for some large enough constant $C'$. It thus implies that
$$P\big(P(|T_D-T_{\hat D}|>\zeta_1\,|\,\{x_i\}_{i=1}^n)>\zeta_2\big) \le P\big(\mathbb{E}[|T_D-T_{\hat D}|\,|\,\{x_i\}_{i=1}^n]>\zeta_1\zeta_2\big) \le P\big(\hat{\mathcal{E}}_{AB}\{C'\log(2q_n)\}^2 > C_1n^{-c_1}\big) \le Cn^{-c},$$
for large enough $C$ and sufficiently small $c$. Hence, under $H_0$,
$$\sup_{\alpha\in(0,1)}\Big|P\Big(\max_{1\le j\le q_n}\sqrt{N}\,|\hat\theta_j-\tilde\theta_j|>\hat c(\alpha)\Big)-\alpha\Big| \lesssim n^{-\tilde c} + \zeta_1\sqrt{1\vee\log(2q_n/\zeta_1)} + \zeta_2 \lesssim n^{-c''},$$
for some $c''>0$, where $\tilde c = c$ or $c'$ as defined in Theorem 4.2. ♦
Proof of Theorem 5.3. Note that
$$\sqrt{N}\,(\hat\Theta^*-\hat\Theta) = \frac{1}{\sqrt{N}}\sum_{i=1}^{N}\Big\{\mathrm{IF}(v^*_i,F_d) - \frac{1}{N}\sum_{i=1}^{N}\mathrm{IF}(v_i,F_d)\Big\} + \sqrt{N}\,(\mathcal{R}^*_N-\mathcal{R}_N),$$
which implies that
$$J \equiv \Big|\max_{1\le j\le q_n}\sqrt{N}\,|\hat\theta^*_j-\hat\theta_j| - \max_{1\le j\le 2q_n}\frac{1}{\sqrt{N}}\sum_{i=1}^{N}(x^*_{ij}-\bar x_j)\Big| \le \max_{1\le j\le q_n}\sqrt{N}\,|\mathcal{R}^*_{jN}-\mathcal{R}_{jN}|,$$
where $\bar x_j = \sum_{i=1}^{N}x_{ij}/N$. Denote by $\tilde c(\alpha)$ the $(1-\alpha)$ quantile of the distribution of $\max_{1\le j\le 2q_n}\{\sum_{i=1}^{N}(x^*_{ij}-\bar x_j)/\sqrt{N}\}$ conditional on the sample $\{u_i\}$. Let $\zeta_1 = Cn^{-c}/\sqrt{\log(2q_n)}$ and $\zeta_2 = Cn^{-c}$ for $C>C_i$ and $c<c_i$ with $i=1,2,3,4$. Assumption 5.2 and Lemma 3.3 of [16] imply that $P(P(J>\zeta_1\,|\,\{u_i\}_{i=1}^n)>\zeta_2)\le\zeta_2$, and thus
$$P\big(\tilde c(\alpha) \le \hat c^*(\alpha+\zeta_2)+\zeta_1\big) \ge 1-\zeta_2,\qquad P\big(\hat c^*(\alpha) \le \tilde c(\alpha+\zeta_2)+\zeta_1\big) \ge 1-\zeta_2.$$
Then, on the event $\{\hat c^*(\alpha)\le\tilde c(\alpha+\zeta_2)+\zeta_1\}\cap\{\tilde c(\alpha-\zeta_2)\le\hat c^*(\alpha)+\zeta_1\}\cap\{\max_{1\le j\le q_n}\sqrt{N}\,|\mathcal{R}_{jN}|\le\zeta_1\}$, we have
$$\Big|P\Big(\max_{1\le j\le 2q_n}\sum_{i=1}^{N}x_{ij}/\sqrt{N}\le\tilde c(\alpha)\Big) - P\Big(\max_{1\le j\le q_n}\sqrt{N}\,|\hat\theta_j-\tilde\theta_j|\le\hat c^*(\alpha)\Big)\Big|$$
$$\le P\Big(\tilde c(\alpha-\zeta_2)-2\zeta_1 \le \max_{1\le j\le 2q_n}\sum_{i=1}^{N}x_{ij}/\sqrt{N} \le \tilde c(\alpha)\Big) + P\Big(\tilde c(\alpha) \le \max_{1\le j\le 2q_n}\sum_{i=1}^{N}x_{ij}/\sqrt{N} \le \tilde c(\alpha+\zeta_2)+2\zeta_1\Big).$$
The conclusion follows from similar arguments as in the proof of Theorem 4.3. ♦
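A sketch of the non-overlapping block bootstrap of Theorem 5.3 for an approximately linear statistic: whole blocks of $v_i$ are resampled and the statistic is recomputed, so no influence function estimate is needed. The example statistic (vectorized lag-one autocovariance) and all tuning constants are illustrative.

```python
import numpy as np

def nbb_critical_value_linear(v, stat, b_n, n_boot=500, alpha=0.05, seed=0):
    """(1 - alpha) bootstrap quantile of max_j sqrt(N) |theta*_j - theta_hat_j|."""
    rng = np.random.default_rng(seed)
    N = v.shape[0]
    l_n = N // b_n
    v = v[: l_n * b_n]
    theta_hat = stat(v)
    blocks = v.reshape(l_n, b_n, -1)
    draws = np.empty(n_boot)
    for m in range(n_boot):
        idx = rng.integers(0, l_n, size=l_n)          # i.i.d. uniform block labels
        v_star = blocks[idx].reshape(l_n * b_n, -1)
        draws[m] = np.sqrt(N) * np.abs(stat(v_star) - theta_hat).max()
    return np.quantile(draws, 1 - alpha)

# Example: theta_hat = vec of the lag-one autocovariance, with v_i = (u_i', u_{i+1}')'.
rng = np.random.default_rng(2)
u = rng.standard_normal((400, 3))
v = np.hstack([u[:-1], u[1:]])
stat = lambda v: (v[:, :3, None] * v[:, None, 3:]).reshape(len(v), -1).mean(axis=0)
print(nbb_critical_value_linear(v, stat, b_n=20))
```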
S.4. Technical details for Section 2.3.
To justify the validity of the procedure in Section 2.3, we impose the following assumptions, which parallel those in Assumption 5.1.
Assumption S.1. Assume that under $H_0$,
$$P\left(\max_{|j-k|\ge\iota}\left|\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\frac{\sqrt{\gamma_{u,jj}(0)\gamma_{u,kk}(0)}-\sqrt{\hat{\gamma}_{u,jj}(0)\hat{\gamma}_{u,kk}(0)}}{\sqrt{\gamma_{u,jj}(0)\gamma_{u,kk}(0)\hat{\gamma}_{u,jj}(0)\hat{\gamma}_{u,kk}(0)}}\,u_{ij}u_{ik}\right|>C_1n^{-c_1}/\sqrt{\log(p)}\right)<C_2n^{-c_2}$$
and $P(E_{AB}\{\log(p)\}^2>C_3n^{-c_3})\le C_4n^{-c_4}$, where $c_i,C_i>0$ for $i=1,2,3,4$, and
$$E_{AB}=\max_{|j-k|\ge\iota}\left|\frac{1}{n}\sum_{i=1}^{r}\{(\tilde{A}_{i,jk}-\hat{A}_{i,jk})^2+(\tilde{B}_{i,jk}-\hat{B}_{i,jk})^2\}\right|,$$
with $\tilde{A}_{i,jk}=\sum_{l=iN+(i-1)M-N+1}^{iN+(i-1)M}\tilde{u}_{lj}\tilde{u}_{lk}$ and $\tilde{B}_{i,jk}=\sum_{l=i(N+M)-M+1}^{i(N+M)}\tilde{u}_{lj}\tilde{u}_{lk}$.
Below we provide some primitive conditions under which Assumption S.1 holds. To this end, we consider an $M$-dependent stationary sequence $\{x_i\}$, where $M$ is allowed to grow with the sample size.
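The quantity that Assumption S.1 controls is the studentized maximum over off-band index pairs. As a point of reference, the sketch below computes that statistic directly from a residual matrix; `banded_test_stat` is an illustrative name we introduce, not one from the paper.

```python
import numpy as np

def banded_test_stat(u, iota):
    """Studentized maximum over off-band pairs |j - k| >= iota:
    max | n^{-1/2} sum_i u_ij u_ik / sqrt(gammahat_jj(0) gammahat_kk(0)) |."""
    n, p = u.shape
    gamma_hat = (u ** 2).mean(axis=0)         # gammahat_{u,jj}(0)
    S = u.T @ u / np.sqrt(n)                  # n^{-1/2} sum_i u_ij u_ik
    T = np.abs(S) / np.sqrt(np.outer(gamma_hat, gamma_hat))
    j, k = np.indices((p, p))
    return T[np.abs(j - k) >= iota].max()     # requires iota < p

rng = np.random.default_rng(3)
u = rng.standard_normal((400, 25))            # placeholder residual matrix
print(banded_test_stat(u, iota=2))
```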
Lemma S.2. Assumption S.1 holds under the following conditions: $c_0<\min_j\gamma_{u,jj}(0)\le\max_j\gamma_{u,jj}(0)<C_0$ for some $c_0,C_0>0$,
$$\max_{1\le j,k\le p}\{E(A_{1,jk}/\sqrt{N})^2\}^{1/2}\vee\{E(B_{1,jk}/\sqrt{M})^2\}^{1/2}\lesssim n^{s_1},\qquad \max_{1\le j,k\le p}\|A_{1,jk}/\sqrt{N}\|_h\vee\|B_{1,jk}/\sqrt{M}\|_h\lesssim n^{s_2},$$
$$n^{s_1}\sqrt{\log(p)/(rM)}+n^{s_2}\log(rp)\log(p)/(r\sqrt{M})\lesssim n^{-c},$$
$$Nn^{-c}(\log p)^2\lesssim n^{-c'},\qquad \sqrt{n\log p}\;n^{c_1-c/2-c}\lesssim n^{-c''},\qquad n^{-c}\sqrt{N}(\log p)^2\log(rp)\,n^{s_2}\lesssim n^{-c'''}.$$
Proof of Lemma S.2.
Define the block sums $A_{i,jk}=\sum_{l=iN+(i-1)M-N+1}^{iN+(i-1)M}u_{lj}u_{lk}$ and $B_{i,jk}=\sum_{l=i(N+M)-M+1}^{i(N+M)}u_{lj}u_{lk}$. Note that
$$P\Big(\max_{1\le j,k\le p}|\gamma_{u,jk}(0)-\hat{\gamma}_{u,jk}(0)|>\epsilon\Big)\le \frac{1}{\epsilon}E\max_{1\le j,k\le p}|\gamma_{u,jk}(0)-\hat{\gamma}_{u,jk}(0)|$$
$$\le \frac{1}{\epsilon}E\max_{1\le j,k\le p}\left|\sum_{i=1}^{r}(A_{i,jk}-EA_{i,jk})/(Nr)\right|+\frac{1}{\epsilon}E\max_{1\le j,k\le p}\left|\sum_{i=1}^{r}(B_{i,jk}-EB_{i,jk})/(Mr)\right|.$$
By Lemma A.1 in [16] and the assumptions,
$$E\max_{1\le j,k\le p}\left|\sum_{i=1}^{r}(A_{i,jk}-EA_{i,jk})/(Nr)\right|\lesssim \max_{1\le j,k\le p}\{E(A_{1,jk}/N)^2\}^{1/2}\sqrt{\log(p)/r}+E\max_{1\le i\le r}\max_{1\le j,k\le p}|A_{i,jk}/N|\,\log(p)/r$$
$$\lesssim \max_{1\le j,k\le p}\{E(A_{1,jk}/\sqrt{N})^2\}^{1/2}\sqrt{\log(p)/(rN)}+\max_{1\le j,k\le p}\|A_{1,jk}/\sqrt{N}\|_h\,\log(rp)\log(p)/(r\sqrt{N})$$
$$\lesssim n^{s_1}\sqrt{\log(p)/(rN)}+n^{s_2}\log(rp)\log(p)/(r\sqrt{N})\lesssim n^{-c}.$$
With $\epsilon=n^{-c/2}$, we have
$$P\Big(\max_{1\le j,k\le p}|\gamma_{u,jk}(0)-\hat{\gamma}_{u,jk}(0)|>n^{-c/2}\Big)\lesssim n^{-c/2}.$$
On the event $\{\max_{1\le j,k\le p}|\gamma_{u,jk}(0)-\hat{\gamma}_{u,jk}(0)|\le n^{-c/2}\}$, we have $c_0/2\le\hat{\gamma}_{u,jj}(0)\le 2C_0$ uniformly for $1\le j\le p$. Hence we get
$$\left|\frac{\sqrt{\gamma_{u,jj}(0)\gamma_{u,kk}(0)}-\sqrt{\hat{\gamma}_{u,jj}(0)\hat{\gamma}_{u,kk}(0)}}{\sqrt{\gamma_{u,jj}(0)\gamma_{u,kk}(0)\hat{\gamma}_{u,jj}(0)\hat{\gamma}_{u,kk}(0)}}\right|\lesssim\left|\sqrt{\gamma_{u,jj}(0)\gamma_{u,kk}(0)}-\sqrt{\hat{\gamma}_{u,jj}(0)\hat{\gamma}_{u,kk}(0)}\right|$$
$$\lesssim\max_{1\le j\le p}|\gamma_{u,jj}(0)-\hat{\gamma}_{u,jj}(0)|\,\Big|\sqrt{\gamma_{u,jj}(0)}+\sqrt{\hat{\gamma}_{u,jj}(0)}\Big|\lesssim\max_{1\le j\le p}|\gamma_{u,jj}(0)-\hat{\gamma}_{u,jj}(0)|.$$
On the other hand, using similar arguments as above, we have
$$I=P\left(\max_{|j-k|\ge\iota}\left|\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\frac{\sqrt{\gamma_{u,jj}(0)\gamma_{u,kk}(0)}-\sqrt{\hat{\gamma}_{u,jj}(0)\hat{\gamma}_{u,kk}(0)}}{\sqrt{\gamma_{u,jj}(0)\gamma_{u,kk}(0)\hat{\gamma}_{u,jj}(0)\hat{\gamma}_{u,kk}(0)}}\,u_{ij}u_{ik}\right|>C_1n^{-c_1}/\sqrt{\log(p)}\right)$$
$$\lesssim P\left(\max_{|j-k|\ge\iota}\left|\frac{1}{\sqrt{n}}\sum_{i=1}^{n}u_{ij}u_{ik}\right|>\epsilon'\right)+n^{-c/2}\le\frac{1}{\epsilon'}E\max_{|j-k|\ge\iota}\left|\frac{1}{\sqrt{n}}\sum_{i=1}^{n}u_{ij}u_{ik}\right|+n^{-c/2}$$
$$\le\frac{\sqrt{n}}{\epsilon'}E\max_{|j-k|\ge\iota}\left|\sum_{i=1}^{r}A_{i,jk}/(Nr)\right|+\frac{\sqrt{n}}{\epsilon'}E\max_{|j-k|\ge\iota}\left|\sum_{i=1}^{r}B_{i,jk}/(Mr)\right|+n^{-c/2},$$
where $\epsilon'=Cn^{c/2-c_1}/\sqrt{\log p}$. Again by Lemma A.1 in [16],
$$E\max_{|j-k|\ge\iota}\left|\sum_{i=1}^{r}A_{i,jk}/(Nr)\right|\lesssim \max_{|j-k|\ge\iota}\{E(A_{1,jk}/N)^2\}^{1/2}\sqrt{\log(p)/r}+E\max_{1\le i\le r}\max_{|j-k|\ge\iota}|A_{i,jk}/N|\,\log(p)/r$$
$$\lesssim n^{s_1}\sqrt{\log(p)/(rN)}+n^{s_2}\log(rp)\log(p)/(r\sqrt{N})\lesssim n^{-c},$$
which implies that
$$I\lesssim P\left(\max_{|j-k|\ge\iota}\left|\frac{1}{\sqrt{n}}\sum_{i=1}^{n}u_{ij}u_{ik}\right|>\epsilon'\right)+n^{-c/2}\lesssim\sqrt{n\log p}\;n^{c_1-c/2-c}+n^{-c/2}\lesssim n^{-c_2},$$
for properly chosen $c_2$. Next we show that $P(E_{AB}\{\log(p)\}^2>C_3n^{-c_3})\le C_4n^{-c_4}$. Let
$$E_A=\max_{|j-k|\ge\iota}\left|\frac{1}{n}\sum_{i=1}^{r}(\tilde{A}_{i,jk}-\hat{A}_{i,jk})^2\right|,\qquad E_B=\max_{|j-k|\ge\iota}\left|\frac{1}{n}\sum_{i=1}^{r}(\tilde{B}_{i,jk}-\hat{B}_{i,jk})^2\right|.$$
We shall show that $P(E_A\{\log(p)\}^2>Cn^{-c})\le Cn^{-c}$; similar arguments apply to $E_B$. Note that
$$P\Big(\max_{1\le i\le r}\max_{|j-k|\ge\iota}|A_{i,jk}/\sqrt{N}|>a\Big)\le\frac{1}{a}E\max_{1\le i\le r}\max_{|j-k|\ge\iota}|A_{i,jk}/\sqrt{N}|\le\frac{n^{s_2}\log(rp)}{a},$$
where the value of $a$ will be determined later. On the events $\{\max_{1\le i\le r}\max_{|j-k|\ge\iota}|A_{i,jk}/\sqrt{N}|<a\}$ and $\{\max_{1\le j,k\le p}|\gamma_{u,jk}(0)-\hat{\gamma}_{u,jk}(0)|\le n^{-c/2}\}$, we have
$$\frac{1}{n}\sum_{i=1}^{r}\frac{A_{i,jk}^2}{\hat{\gamma}_{u,jj}(0)\hat{\gamma}_{u,kk}(0)}\le\frac{1}{n}\sum_{i=1}^{r}\frac{\sqrt{N}a|A_{i,jk}|}{\hat{\gamma}_{u,jj}(0)\hat{\gamma}_{u,kk}(0)}\le\frac{1}{n}\sum_{i=1}^{n}\frac{\sqrt{N}a|u_{ij}u_{ik}|}{\hat{\gamma}_{u,jj}(0)\hat{\gamma}_{u,kk}(0)}\le\frac{\sqrt{N}a}{\sqrt{\hat{\gamma}_{u,jj}(0)\hat{\gamma}_{u,kk}(0)}}\lesssim\sqrt{N}a.$$
We also note that
$$E_A\lesssim\max_{|j-k|\ge\iota}\frac{1}{n}\sum_{i=1}^{r}\left\{\frac{A_{i,jk}\big(\sqrt{\gamma_{u,jj}(0)\gamma_{u,kk}(0)}-\sqrt{\hat{\gamma}_{u,jj}(0)\hat{\gamma}_{u,kk}(0)}\big)}{\sqrt{\gamma_{u,jj}(0)\gamma_{u,kk}(0)}\,\hat{\gamma}_{u,jj}(0)\hat{\gamma}_{u,kk}(0)}\right\}^2+Nr\max_{|j-k|\ge\iota}|\hat{\gamma}_{jk}|^2/n$$
$$\lesssim n^{-c}\max_{|j-k|\ge\iota}\frac{1}{n}\sum_{i=1}^{r}\frac{A_{i,jk}^2}{\hat{\gamma}_{u,jj}(0)\hat{\gamma}_{u,kk}(0)}+Nn^{-c}\lesssim n^{-c}\sqrt{N}a+Nn^{-c}.$$
The conclusion therefore follows provided that $a=n^{s_2}\log(rp)\,n^{c'''/2}$, $Nn^{-c}(\log p)^2\lesssim n^{-c'}$ and $n^{-c}\sqrt{N}(\log p)^2\log(rp)\,n^{s_2}\lesssim n^{-c'''}$ for some $c',c'''>0$. ♦
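The key concentration step in the proof, $\max_{1\le j,k\le p}|\gamma_{u,jk}(0)-\hat{\gamma}_{u,jk}(0)|\lesssim n^{-c/2}$ with high probability, is easy to check numerically. A toy illustration follows; the normalized moving-average model is an assumption chosen so that $\gamma_u(0)=I_p$ and the maximal error is directly observable.

```python
import numpy as np

rng = np.random.default_rng(4)
M, p = 2, 40
for n in (200, 800, 3200):
    eps = rng.standard_normal((n + M, p))
    # M-dependent series with unit variance and uncorrelated coordinates,
    # so that gamma_u(0) is the identity matrix
    u = sum(eps[k : k + n] for k in range(M + 1)) / np.sqrt(M + 1)
    gamma_hat = u.T @ u / n
    # maximal entrywise error; shrinks roughly like sqrt(log p / n)
    print(n, np.abs(gamma_hat - np.eye(p)).max())
```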
S.5. General functions on vector sum.
In this section, we extend the results in Section 3.1 to general smooth functions $L:\mathbb{R}^p\to\mathbb{R}$ of the high-dimensional vector sum. We impose the following Assumption S.2 regarding the smoothness of $L$. Write $\partial_jL(x)=\partial L(x)/\partial x_j$, $\partial_{jk}L(x)=\partial^2L(x)/\partial x_j\partial x_k$ and $\partial_{jkl}L(x)=\partial^3L(x)/\partial x_j\partial x_k\partial x_l$ for $j,k,l=1,2,\dots,p$, where $x=(x_1,x_2,\dots,x_p)'$.
Assumption S.2. Suppose that
$$\text{(S.19)}\qquad \sum_{j=1}^{p}|\partial_jL(x)|\lesssim L_1(p),\qquad \sum_{j,k=1}^{p}|\partial_{jk}L(x)|\lesssim L_2(p),\qquad \sum_{j,k,l=1}^{p}|\partial_{jkl}L(x)|\lesssim L_3(p),$$
where the constants $L_1(p)$, $L_2(p)$ and $L_3(p)$ do not depend on $x$. Further assume that for any $\omega=(\omega_1,\dots,\omega_p)'\in\mathbb{R}^p$ with $\max_{1\le j\le p}|\omega_j|\in B_p$ for some set $B_p\subset\mathbb{R}$,
$$\partial_jL(x)\lesssim\partial_jL(x+\omega)\lesssim\partial_jL(x),\qquad \partial_{jk}L(x)\lesssim\partial_{jk}L(x+\omega)\lesssim\partial_{jk}L(x),\qquad \partial_{jkl}L(x)\lesssim\partial_{jkl}L(x+\omega)\lesssim\partial_{jkl}L(x),$$
where $1\le j,k,l\le p$. Here, "$\lesssim$" means "$\le$" up to a universal constant.
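A canonical function satisfying (S.19) with dimension-free constants is the smooth max $L(x)=\beta^{-1}\log\sum_{j=1}^{p}\exp(\beta x_j)$ (cf. the smooth max function used in [16]), for which $L_1(p)=1$ and $L_2(p)\le 2\beta$ regardless of $p$. The numerical check below is an illustration we supply, not an example from the paper.

```python
import numpy as np

# Smooth max L(x) = beta^{-1} * log(sum_j exp(beta * x_j)).
# Its gradient is the softmax weight vector pi, and its Hessian is
# beta * (diag(pi) - pi pi'), so sum_j |d_j L| = 1 and
# sum_{j,k} |d_jk L| = 2 * beta * (1 - sum_j pi_j^2) <= 2 * beta.
beta, p = 5.0, 30
x = np.random.default_rng(5).standard_normal(p)
w = np.exp(beta * x - (beta * x).max())     # numerically stabilized exponentials
pi = w / w.sum()                            # softmax weights = gradient of L
hess = beta * (np.diag(pi) - np.outer(pi, pi))
print(np.abs(pi).sum())                     # L_1(p): equals 1 exactly
print(np.abs(hess).sum(), "<=", 2 * beta)   # L_2(p): bounded by 2*beta
```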
Example S.1. Consider $L_\lambda(x)=\sum_{j=1}^{p}g_{j,\lambda}(x_j)/p$, where $x=(x_1,\dots,x_p)'$ and $\lambda$ is a thresholding parameter. Here we assume that $g_{j,\lambda}(x)=0$ for $|x|<\lambda$ and that $g_{j,\lambda}$ satisfies $\sum_{j=1}^{p}|\partial g_{j,\lambda}(x)/\partial x|/p\le C_0$ for some constant $C_0>0$. It is straightforward to verify that $\sum_{j=1}^{p}|\partial_jL_\lambda(x)|\le C_0$, $\partial_j\partial_kL_\lambda(x)=0$ and $\partial_j\partial_k\partial_lL_\lambda(x)=0$ for $1\le j,k,l\le p$. Note that with a proper choice of $g_{j,\lambda}$, $L_\lambda(x)$ provides a smooth approximation to the function $\sum_{j=1}^{p}|x_j|\mathbf{1}\{|x_j|>\lambda\}$, which serves as a building block for the higher criticism test in [43].
Assumption S.2 generalizes the conditions in Lemmas A.5 and A.6 of [16]. Consider the dependency graph in Section 3.1. Parallel to Proposition 3.1, we have the following result. With slight abuse of notation, set $m=g\circ L$ with $g\in C_b^3(\mathbb{R})$.
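One concrete (assumed) choice of $g_{j,\lambda}$ in Example S.1 is a ramp that vanishes below the threshold and agrees with $|x|$ a little above it; the transition width `delta` is a free smoothing parameter we introduce for illustration, and the function names are not from the paper.

```python
import numpy as np

def g_lambda(x, lam, delta):
    """Smooth ramp: zero for |x| < lam, equal to |x| for |x| > lam + delta,
    with a C^1 cubic transition in between (one possible g_{j,lambda})."""
    s = np.clip((np.abs(x) - lam) / delta, 0.0, 1.0)
    return np.abs(x) * s * s * (3 - 2 * s)

def L_lambda(x, lam, delta=0.5):
    """Building block of the higher-criticism-type statistic in Example S.1:
    a smooth approximation of sum_j |x_j| 1{|x_j| > lam}, divided by p."""
    return g_lambda(x, lam, delta).mean()

x = np.linspace(-4, 4, 9)
print(L_lambda(x, lam=2.0))
```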
Proposition S.1. Assume that $\sqrt{2}D_nM_{xy}/\sqrt{n}\in B_p$ with $M_{xy}=\max\{M_x,M_y\}$. Then under Assumption S.2, we have for any $\Delta>0$,
$$|E[m(X)-m(Y)]|\lesssim\{G_1L_2(p)+G_2L_1(p)^2\}\,\phi(M_x,M_y)$$
$$+\{G_1L_3(p)+3G_2L_1(p)L_2(p)+G_3L_1(p)^3\}\frac{D_n}{\sqrt{n}}(\bar{m}_{x,3}+\bar{m}_{y,3})$$
$$+\{G_1L_3(p)+3G_2L_1(p)L_2(p)+G_3L_1(p)^3\}\frac{D_n^2}{\sqrt{n}}(m_{x,3}+m_{y,3})+G_1\Delta+G_0E[1-I_\Delta],\qquad\text{(S.20)}$$
where $G_k=\sup_{z\in\mathbb{R}}|\partial^kg(z)/\partial z^k|$ for $k\ge 0$. In addition, if $\sqrt{2}D_n^2M_{xy}/\sqrt{n}\in B_p$, we can replace $m_{x,3}+m_{y,3}$ by $\bar{m}_{x,3}+\bar{m}_{y,3}$ in the above upper bound.
With the aid of Assumption S.2, Proposition S.1 follows from arguments similar to those in the proof of Proposition 3.1 (the technical details are omitted to conserve space). When specialized to stationary $M$-dependent time series, we have the following result.
Theorem S.1. Suppose $\sqrt{2}(2M+1)M_{xy}/\sqrt{n}\in B_p$ with $M_{xy}=\max\{M_x,M_y\}$, and $M_x>u_x(\gamma)$ and $M_y>u_y(\gamma)$ for some $\gamma\in(0,1)$. Then
$$|E[m(X)-m(Y)]|\lesssim\{G_1L_2(p)+G_2L_1(p)^2\}\,\phi(M_x,M_y)$$
$$+\{G_1L_3(p)+3G_2L_1(p)L_2(p)+G_3L_1(p)^3\}\frac{2M+1}{\sqrt{n}}(\bar{m}_{x,3}+\bar{m}_{y,3})$$
$$+G_1\varphi(M_x,M_y)\max_{1\le j\le p}\sigma_j\sqrt{\log(p/\gamma)}+G_0\gamma.\qquad\text{(S.21)}$$
Under Condition (18), we may set $\phi(M_x,M_y)=C(1/M_x+1/M_y)$ and $\varphi(M_x,M_y)=C'(1/M_x^{1/2}+1/M_y^{1/2})$ for some constants $C,C'>0$ in (S.21).
Remark S.1. Consider $L_\lambda(x)=\sum_{j=1}^{p}g_{j,\lambda}(x_j)/p$ in Example S.1. When $M_x>u_x(\gamma)$ and $M_y>u_y(\gamma)$ for some $\gamma\in(0,1)$, we have
$$|E[m_\lambda(X)-m_\lambda(Y)]|\lesssim G_2\,\phi(M_x,M_y)+G_3\frac{2M+1}{\sqrt{n}}(\bar{m}_{x,3}+\bar{m}_{y,3})+G_1\varphi(M_x,M_y)\max_{1\le j\le p}\sigma_j\sqrt{\log(p/\gamma)}+G_0\gamma,\qquad\text{(S.22)}$$
where $m_\lambda=g\circ L_\lambda$. Under Condition (18),
$$|E[m_\lambda(X)-m_\lambda(Y)]|\lesssim G_2(1/M_x+1/M_y)+G_3\frac{2M+1}{\sqrt{n}}(\bar{m}_{x,3}+\bar{m}_{y,3})+G_1(1/M_x^{1/2}+1/M_y^{1/2})\max_{1\le j\le p}\sigma_j\sqrt{\log(p/\gamma)}+G_0\gamma.\qquad\text{(S.23)}$$
By letting $M_x\to+\infty$, $M_y\to+\infty$, and $\gamma=M/\sqrt{n}$, we deduce that
$$|E[m_\lambda(X)-m_\lambda(Y)]|\lesssim M(\bar{m}_{x,3}+\bar{m}_{y,3})/\sqrt{n}.$$
Note that in this case, $p$ is allowed to grow arbitrarily fast.

X. Zhang
Department of Statistics
University of Missouri-Columbia
Columbia, MO 65211
E-mail: [email protected]