Robust Adaptive Rate-Optimal Testing for the White Noise Hypothesis
Robust Adaptive Rate-Optimal Testing for the White Noise Hypothesis (with supplementary material)

Alain Guay, Emmanuel Guerre, Štěpána Lazarová

This version: 2nd November 2012

This is a revised version of a paper previously entitled 'Adaptive Rate-Optimal Detection of Small Correlation Coefficients'. We would like to thank Richard Davis, Marcelo Fernandes, Liudas Giraitis, Bruce Hansen, George Kapetanios, Remigijus Leipus, Peter Phillips and Aris Spanos for stimulating questions and suggestions. We would also like to thank participants of the Queen Mary Econometric Reading Group, the Economic Seminars at York, Southampton and Warwick Universities, the 2008 Oxbridge Time Series Workshop, the 2008 Vienna Model Selection Workshop, the 2008 North American Econometric Society Conference and the 2008 European Econometric Society Conference, the Journées 2008 de Statistiques de Rennes, the 2009 Bristol Econometrics Workshop and the 2011 Brazilian Time Series and Econometrics School. Last but not least, we would like to thank an anonymous associate editor and two referees whose feedback helped to improve the paper. All remaining errors are ours. The first author gratefully acknowledges financial support of the Fonds de recherche sur la société et la culture (FQRSC) and the Social Sciences and Humanities Research Council of Canada (SSHRC). The last two authors are thankful for the financial support from the School of Economics and Finance of Queen Mary, University of London.

CIRPÉE and CIREQ, Université du Québec à Montréal, e-mail: [email protected]
School of Economics and Finance, Queen Mary, University of London, e-mail: [email protected]
School of Economics and Finance, Queen Mary, University of London, e-mail: [email protected]
Abstract

A new test is proposed for the weak white noise null hypothesis. The test is based on a new automatic selection of the order for a Box-Pierce (1970) test statistic or the test statistic of Hong (1996). The heteroskedasticity and autocorrelation-consistent (HAC) critical values from Lee (2007) are used, allowing for estimation of the error term. The data-driven order selection is tailored to detect a new class of alternatives with autocorrelation coefficients which can be $o(n^{-1/2})$ provided there are sufficiently many such coefficients. A simulation experiment illustrates the good statistical properties of the test both under the weak white noise null and the alternative.

JEL Classification: Primary C12; Secondary C32.
Keywords: Weak white noise hypothesis; HAC inference; Automatic nonparametric tests; Adaptive rate-optimality.

1. Introduction
Testing for white noise is important in many econometric contexts. Ignoring autocorrelation of the error terms in a linear regression model can lead to erroneous confidence intervals and tests. Correlation of residuals from an ARMA model or of the squared residuals from an ARCH model can indicate an improper choice of the order. Investigation of the autocorrelation function is also a popular diagnostic tool in macroeconomics and finance, see e.g. Durlauf (1991) and Campbell, Lo and MacKinlay (1997). The earliest tests of the white noise hypothesis were based on confidence intervals for autocorrelation coefficients as described by Fan and Yao (2005). See also Xiao and Wu (2011), who have recently derived the asymptotic distribution of the maximum standardized sample covariance of a weak white noise, that is, a stationary process which is uncorrelated but possibly dependent. A second approach was established by Grenander and Rosenblatt (1952), who extended goodness-of-fit tests such as the Kolmogorov and Cramér-von Mises tests to tests of the white noise hypothesis. Grenander and Rosenblatt (1952) has been refined by Durlauf (1991), Anderson (1993) and Deo (2000). Delgado, Hidalgo and Velasco (2005) have studied a modified test statistic to be used with residuals. Shao (2011a) has recently extended this setup to cover the weak white noise null hypothesis. A third approach, pioneered by Box and Pierce (1970), is based on the sum of squared sample autocorrelation coefficients up to a given order $p$. Delgado and Velasco (2012), Francq, Roy and Zakoian (2005), Kuan and Lee (2006) and Lobato (2001) have considered the weak white noise hypothesis.
The case where $p$ grows with the sample size $n$ has been considered by Hong (1996) in a strong white noise setup and recently extended to the weak white noise null hypothesis by Shao (2011b) and Xiao and Wu (2011).

This paper contributes to the literature by proposing a data-driven choice $\hat{p}$ of the order $p$ used in a Box-Pierce type statistic for a test of the weak white noise null hypothesis. Under this null, $\hat{p}$ tends to 1 in probability, so that the null limit behavior of the test statistic is driven by the first-order sample autocovariance. It is shown that the test can be implemented using the robust critical values of Lee (2007), who extends the work of Lobato (2001) for the case of observed variables and of Kuan and Lee (2006) for the case of residuals. The general framework of Lee (2007) includes as a specific case standardization using the steep origin kernels proposed by Phillips, Sun and Jin (2006), which can improve the power of the resulting test. Under the alternative, the data-driven $\hat{p}$ can be as large as necessary.

An appealing feature of Cramér-von Mises type tests is the ability to detect Pitman local directional alternatives converging to the null at the parametric rate $n^{-1/2}$. This contrasts with detection results for the Box-Pierce type test of Hong (1996), which is only consistent under slower rates of convergence for local alternatives defined through the spectral density function. The conclusions of Hong (1996) suggest that Cramér-von Mises tests are more powerful than Box-Pierce tests. One of the contributions of the present paper is to point out that this ranking of the two types of tests is not universal and that there exist classes of alternatives against which Box-Pierce tests are more powerful than Cramér-von Mises tests. We illustrate this point using a new class of alternatives defined through the autocovariance function.
The new class of alternatives formalizes the idea that small autocorrelation coefficients of magnitude $\rho_n$ can be detected provided that there are sufficiently many coefficients present at smaller lags. An important finding of the paper is that detection is still possible for very small $\rho_n$, namely for $\rho_n = o(n^{-1/2})$. As described in Section 4, this type of alternatives includes moving average processes with a significant long term multiplier but $o(n^{-1/2})$ impulse response coefficients. Such processes therefore correspond to a macroeconomic scenario where short term policies have no significant effects whereas long term policies may have an impact. For such alternatives, the conditional expectation of the present given the past gives $o(n^{-1/2})$ weights to each lagged observation. Therefore this process is hard to predict since it is very close to a martingale difference process. These alternatives can be of interest in finance, where arbitrage could forbid strong deviations from a martingale difference.

Why such alternatives can be detected by Box-Pierce tests can be intuitively explained as follows. Let $\hat{R}_j$ and $R_j$ be respectively the sample and population covariance at lag $j$. Following Hong (1996), Shao (2011b) and Xiao and Wu (2011), the nonrobust critical region of the Box-Pierce test of order $p_n \to \infty$ is
$$\frac{n \sum_{j=1}^{p_n} \left( \hat{R}_j / \hat{R}_0 \right)^2 - p_n}{(2 p_n)^{1/2}} \geq c_\alpha, \qquad (1.1)$$
where $c_\alpha$ is a standard normal critical value. Arguing as in Shao (2011b, Theorem 2.2) suggests that
$$\frac{n \sum_{j=1}^{p_n} \left( \hat{R}_j / \hat{R}_0 \right)^2 - p_n}{(2 p_n)^{1/2}} = \frac{n \sum_{j=1}^{p_n} R_j^2 / R_0^2}{(2 p_n)^{1/2}} + O_P(1). \qquad (1.2)$$
Equation (1.2) suggests that the Box-Pierce test is consistent provided $\left( n / (2 p_n)^{1/2} \right) \sum_{j=1}^{p_n} R_j^2 / R_0^2$ is large enough.
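The critical region (1.1) is easy to compute. The following minimal sketch (a helper of ours, not code from the paper) evaluates the standardized Box-Pierce statistic on a demeaned series; the null is rejected at level $\alpha$ when the statistic exceeds the standard normal critical value $c_\alpha$:

```python
import numpy as np

def hong_statistic(u, p_n):
    """Left-hand side of (1.1): the standardized Box-Pierce statistic
    (n * sum_{j=1}^{p_n} (R_hat_j / R_hat_0)^2 - p_n) / (2 * p_n)^{1/2}."""
    u = np.asarray(u, dtype=float)
    u = u - u.mean()
    n = len(u)
    R0 = np.dot(u, u) / n                      # sample variance R_hat_0
    bp = sum((np.dot(u[:n - j], u[j:]) / n / R0) ** 2 for j in range(1, p_n + 1))
    return (n * bp - p_n) / np.sqrt(2.0 * p_n)

# Reject the white noise null at level alpha when the statistic exceeds the
# standard normal critical value c_alpha, e.g. 1.645 for alpha = 0.05.
```

For a strongly autocorrelated series the statistic is large; for white noise it is approximately standard normal.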
Let $N_n$ be the number of correlation coefficients with $|R_j / R_0| \geq \rho_n$ for $j \in [1, p_n]$, so that $\left( n / (2 p_n)^{1/2} \right) \sum_{j=1}^{p_n} R_j^2 / R_0^2 \geq n N_n \rho_n^2 / (2 p_n)^{1/2}$. The Box-Pierce test is consistent if
$$n^{1/2} \left( \frac{N_n}{p_n^{1/2}} \right)^{1/2} \rho_n \to \infty, \qquad (1.3)$$
a condition which allows for $\rho_n = o(n^{-1/2})$ provided there are enough correlation coefficients larger than $\rho_n$, that is, $N_n / p_n^{1/2} \to \infty$, which holds in particular when the exact order of $N_n$ is $p_n$. In other words, summing squared sample correlations in the Box-Pierce statistic allows us to detect very small population correlations provided they are not too sparse and are concentrated at lags smaller than $p_n$. As shown in this paper, such alternatives are not detected by Cramér-von Mises tests.

An important limitation of the critical region (1.1) is the use of an ad hoc order $p_n$. Many authors consider a deterministic $p_n$ such that $p_n \to \infty$. This choice of order is inadequate for detecting alternatives with correlations at low lags: taking $p_n = 30$ for instance is unlikely to give a test with power against popular AR(1) or MA(1) alternatives in samples of moderate size. Conversely, taking a fixed $p_n$ is not suitable for detecting higher order alternatives.

The need to properly address the tuning of a smoothing parameter with a role similar to $p_n$ has spurred the development of data-driven approaches for various nonparametric testing problems. The so-called adaptive approach focuses on data-driven tests which detect alternatives in a smoothness class converging to the null at the fastest possible rate given that the smoothness class is unknown to the test user. See in particular Fan (1996), Spokoiny (1996), Horowitz and Spokoiny (2001), Guerre and Lavergne (2005), Guay and Guerre (2006) and Chen and Gao (2007) for various nonparametric models and related null hypotheses of theoretical or practical relevance.
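As a back-of-the-envelope check of the detection condition (1.3), the following sketch (with illustrative numbers of our own choosing) considers the saturated case where $N_n$ is of the exact order $p_n$: correlations of size $\rho_n = n^{-0.55} = o(n^{-1/2})$ still make the detection index diverge:

```python
import math

def detection_index(n, p_n, N_n, rho_n):
    """Left-hand side of (1.3): n^{1/2} * (N_n / p_n^{1/2})^{1/2} * rho_n."""
    return math.sqrt(n) * math.sqrt(N_n / math.sqrt(p_n)) * rho_n

# Saturated alternatives: N_n of the exact order p_n, here p_n = n^{1/2},
# and correlations of magnitude rho_n = n^{-0.55} = o(n^{-1/2}).
for n in (10 ** 3, 10 ** 4, 10 ** 5, 10 ** 6):
    p_n = int(n ** 0.5)
    print(n, detection_index(n, p_n, p_n, n ** -0.55))  # grows with n
```

The index grows with $n$ even though $n^{1/2} \rho_n \to 0$, illustrating how summing over many small correlations compensates for their individual size.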
Golubev, Nussbaum and Zhou (2010) have proved Le Cam equivalence of Gaussian time series with spectral densities in a Besov space with the continuous-time Gaussian white noise model considered in Spokoiny (1996). This result is limited to Gaussian time series and is not useful in practice since it does not deliver ready-to-apply white noise tests. In fact, most of the data-driven choices of $p_n$ proposed in the white noise testing literature are not adaptive rate-optimal. As an exception, Fan and Yao (2005) extend the work of Fan (1996), outlining but not analyzing a data-driven test which is based on the maximum of a set of standardized Box-Pierce statistics, see also Golubev et al. (2010).

A popular data-driven method of choosing the order is the selection procedure proposed by Newey and West (1994) in the context of long run variance estimation. See, among others, the simulation section of Hong and Lee (2005). This selection procedure is however difficult to justify theoretically. The Newey and West selection method, although optimal for long-run variance estimation, does not produce a rate-optimal test because the optimal order for testing differs from the optimal order for estimation, see e.g. Guerre and Lavergne (2002) and the references therein. Escanciano and Lobato (2009) study a data-driven choice of order based on an AIC/BIC criterion which is suitable for estimation but is not adaptive rate-optimal for tests of the white noise hypothesis. This contrasts with the new data-driven tests proposed here.

The paper is organized as follows. Section 2 describes the penalty approach leading to the data-driven order $\hat{p}$ and the construction of the rejection region of the test. Section 3 studies the statistical properties of the test under the general weak white noise null hypothesis and under the new class of alternatives mentioned above. It illustrates the importance of the choice of a suitable penalty both under the null and the alternative.
Section 4 states our adaptive rate-optimality results and compares the new test with the Cramér-von Mises test of Deo (2000), the data-driven test of Escanciano and Lobato (2009) and the maximum test of Xiao and Wu (2011). Section 5 reports a simulation experiment that proposes a calibration of the penalty term and compares our automatic test with other data-driven tests, including the tests of Deo (2000) and Escanciano and Lobato (2009) and a test that uses the Newey and West (1994) plug-in order selection procedure. Section 6 concludes. Proofs can be found in the supplementary material.

2. Construction of the test and choice of the critical values
Consider a variable $u_t$, $t = 1, \ldots, n$, which is either directly observed or defined as the error of a parametric model $m(X_t; \theta) = u_t$ with some observed covariate $X_t$. In the latter case $u_t$ is not observed but can be estimated using the residuals $\hat{u}_t = u_t(\hat{\theta})$, where $\hat{\theta}$ is an estimator of $\theta$. We are interested in testing that $u_t$ is uncorrelated. Suppose $\{u_t\}$ is a stationary process with zero mean and covariance function $R_j = \mathrm{Cov}(u_t, u_{t+j})$. The null and alternative hypotheses are then
$$H_0: R_j = 0 \text{ for all } j \neq 0 \quad \text{versus} \quad H_1: R_j \neq 0 \text{ for some } j \neq 0.$$
A natural estimator of the covariance is $\hat{R}_j = \sum_{t=1}^{n-|j|} \hat{u}_t \hat{u}_{t+|j|} / n$, $j = 0, \pm 1, \ldots, \pm(n-1)$. A kernel estimator of the spectral density is
$$\hat{f}_n(\lambda; p) = \frac{1}{2\pi} \sum_{j=-\infty}^{\infty} K\left( \frac{|j|}{p} \right) \hat{R}_j \exp(-ij\lambda), \quad K(0) = 1 \text{ and } \int_0^\infty K(x)\, dx = 1,$$
where the support of $K$ is $[0, 1]$. The test statistic is
$$\hat{S}_p = n\pi \int_{-\pi}^{\pi} \left| \hat{f}_n(\lambda; p) - \frac{\hat{R}_0}{2\pi} \right|^2 d\lambda = n \sum_{j=1}^{n-1} K^2\left( \frac{j}{p} \right) \hat{R}_j^2. \qquad (2.1)$$
For the uniform kernel $K(t) = I(t \in [0, 1])$ and after division by $\hat{R}_0^2$, $\hat{S}_p$ is the Box-Pierce statistic $\widehat{BP}_p / \hat{R}_0^2 = n \sum_{j=1}^{p} \hat{R}_j^2 / \hat{R}_0^2$. Large values of $\hat{S}_p$ indicate evidence against the null. Under certain weak dependence conditions on the weak white noise $\{u_t\}$ and for $p = p_n \to \infty$ growing at a suitable rate, Shao (2011b) shows that $\left( \left( \hat{S}_p - \hat{S}_1 \right) / \hat{R}_0^2 - E_\Delta(p) \right) / V_\Delta(p)$ converges to a standard normal, where
$$E_\Delta(p) = \sum_{j=1}^{n-1} \left( 1 - \frac{j}{n} \right) \left( K^2\left( \frac{j}{p} \right) - K^2(j) \right), \quad V_\Delta^2(p) = 2 \sum_{j=1}^{n-1} \left( 1 - \frac{j}{n} \right)^2 \left( K^2\left( \frac{j}{p} \right) - K^2(j) \right)^2,$$
and we shall accordingly use $E_\Delta(p)$ and $V_\Delta(p)$ as a standardization for $\left( \hat{S}_p - \hat{S}_1 \right) / \hat{R}_0^2$.
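A minimal numerical sketch of the statistic (2.1) and of the centering and scaling terms $E_\Delta(p)$ and $V_\Delta(p)$, written for the uniform (Box-Pierce) kernel; the helper names are ours. For this kernel the code reproduces the approximations $E_\Delta(p) \approx p - 1$ and $V_\Delta^2(p) \approx 2(p - 1)$ quoted below:

```python
import numpy as np

def K_uniform(x):
    """Uniform kernel K(t) = I(t in [0, 1]), giving the Box-Pierce statistic."""
    return (np.asarray(x) <= 1.0).astype(float)

def S_hat(u, p, K=K_uniform):
    """Statistic (2.1): S_p = n * sum_{j=1}^{n-1} K^2(j/p) * R_hat_j^2."""
    u = np.asarray(u, dtype=float)
    n = len(u)
    j = np.arange(1, n)
    R = np.array([np.dot(u[:n - h], u[h:]) / n for h in j])  # R_hat_j
    return float(n * np.sum(K(j / p) ** 2 * R ** 2))

def E_delta(p, n, K=K_uniform):
    """Centering E_Delta(p) = sum_j (1 - j/n) * (K^2(j/p) - K^2(j))."""
    j = np.arange(1, n)
    return float(np.sum((1 - j / n) * (K(j / p) ** 2 - K(j) ** 2)))

def V_delta(p, n, K=K_uniform):
    """Scaling V_Delta(p), V_Delta^2(p) = 2 sum_j (1 - j/n)^2 (K^2(j/p) - K^2(j))^2."""
    j = np.arange(1, n)
    return float(np.sqrt(2 * np.sum((1 - j / n) ** 2 * (K(j / p) ** 2 - K(j) ** 2) ** 2)))
```

For instance, with $n = 500$ and $p = 10$, `E_delta(10, 500)` is close to $p - 1 = 9$ and `V_delta(10, 500)` is close to $\sqrt{2(p-1)} = \sqrt{18}$.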
In this notation, the subscript "$\Delta$" indicates the difference $\hat{S}_p - \hat{S}_1$. For the Box-Pierce statistic, $E_\Delta(p) = (p - 1)(1 + O(p/n))$ and $V_\Delta^2(p) = 2(p - 1)(1 + O(p/n))$, and these approximations remain valid for other kernels up to multiplicative constants. We propose to select $\hat{p}$ as the smallest integer maximizing the penalized statistic,
$$\hat{p} = \arg\max_{p \in [1, p_n]} \left( \frac{\hat{S}_p}{\hat{R}_0^2} - E(p) - \gamma_n V_\Delta(p) \right) = \arg\max_{p \in [1, p_n]} \left( \frac{\hat{S}_p - \hat{S}_1}{\hat{R}_0^2} - E_\Delta(p) - \gamma_n V_\Delta(p) \right), \qquad (2.2)$$
where $E(p) = \sum_{j=1}^{n-1} (1 - j/n) K^2(j/p)$ and $p_n \leq n - 1$. This penalization procedure is similar to the penalization proposed by Guay and Guerre (2006) or Guerre and Lavergne (2005). It differs from the penalization used in the AIC or BIC procedures, which use a higher penalty term $\gamma_n E(p)$ in place of $E(p) + \gamma_n V_\Delta(p)$. Escanciano and Lobato (2009) similarly use a penalty term $\hat{\gamma}_n E(p)$ for $p$ in a bounded finite set.

The intuition for $\hat{p}$ is as follows. Note first that (2.2) uses the difference $\hat{S}_p - \hat{S}_1$. The idea here is that the test should be based on $\hat{S}_1$ unless $\hat{S}_p - \hat{S}_1$ is large enough for some $p$. Since the criterion maximized in (2.2) is equal to 0 for $p = 1$, $\hat{p}$ differs from 1 whenever there is a $p$ such that $\left( \hat{S}_p - \hat{S}_1 \right) / \hat{R}_0^2 - E_\Delta(p) - \gamma_n V_\Delta(p) > 0$, that is,
$$\frac{\left( \hat{S}_p - \hat{S}_1 \right) / \hat{R}_0^2 - E_\Delta(p)}{V_\Delta(p)} > \gamma_n, \qquad (2.3)$$
an inequality which, in view of the asymptotic normality established by Shao (2011b) under the null, has the flavour of a one-sided significance test using a critical value $\gamma_n$. Such a construction suggests that the data-driven statistic $\hat{S}_{\hat{p}}$ better captures higher order covariances than $\hat{S}_1$. Therefore, rejecting the null when $\hat{S}_{\hat{p}} \geq z$ should give a more powerful test than the test $\hat{S}_1 \geq z$ based on $\hat{S}_1$ and the same critical value $z$ as recommended below. See (3.8) in Theorem 4 for a more formal statement. Why the chosen $\hat{p}$ should have certain optimality properties can be seen by viewing (2.2) as a bias-variance trade-off. Theorem 2.2 in Shao (2011b) suggests that $\left( \hat{S}_p - \hat{S}_1 \right) / \hat{R}_0^2 - E_\Delta(p)$ is an estimator of $n \sum_{j=2}^{\infty} R_j^2$ with a bias $-n \sum_{j=p+1}^{\infty} R_j^2$ and a standard deviation $V_\Delta(p)$.
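For the uniform kernel the selection rule (2.2) reduces to cumulative sums and can be sketched as follows (a minimal sketch of ours, assuming a mean-zero, or demeaned, input series):

```python
import numpy as np

def p_hat(u, p_max, gamma_n):
    """Smallest maximizer over p in [1, p_max] of the penalized criterion (2.2),
    S_p / R_0^2 - E(p) - gamma_n * V_Delta(p), for the uniform kernel.
    The input series is assumed mean zero (demean beforehand if necessary)."""
    u = np.asarray(u, dtype=float)
    n = len(u)
    R = np.array([np.dot(u[:n - j], u[j:]) / n for j in range(p_max + 1)])  # R_hat_j
    j = np.arange(1, p_max + 1)
    S_over_R0sq = n * np.cumsum((R[1:] / R[0]) ** 2)       # S_p / R_0^2 (Box-Pierce)
    E = np.cumsum(1.0 - j / n)                             # E(p) = sum_{j<=p} (1 - j/n)
    # V_Delta^2(p) = 2 * sum_{j=2}^{p} (1 - j/n)^2, so that V_Delta(1) = 0
    V = np.sqrt(np.maximum(2.0 * np.cumsum((1.0 - j / n) ** 2)
                           - 2.0 * (1.0 - 1.0 / n) ** 2, 0.0))
    crit = S_over_R0sq - E - gamma_n * V
    return int(j[np.argmax(crit)])  # argmax returns the first, i.e. smallest, maximizer
```

On a persistent AR(1) series the criterion keeps increasing over the first few lags and $\hat{p} > 1$; under a white noise and a large $\gamma_n$, $\hat{p} = 1$ with high probability.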
Hence (2.2) chooses a $p$ which maximizes $-n \sum_{j=p+1}^{\infty} R_j^2 - \gamma_n V_\Delta(p)$ and therefore achieves the so-called bias-variance trade-off, leading to a data-driven test statistic $\hat{S}_{\hat{p}} = \hat{S}_1 + \hat{S}_{\hat{p}} - \hat{S}_1$ with the best potential to detect an alternative.

Under $H_0$, it is expected that $\hat{p} = 1$ with high probability provided $\gamma_n$ is large enough, since all the $\hat{S}_p - \hat{S}_1$ estimate 0. Since $\hat{S}_{\hat{p}} = \hat{S}_1 + o_P(1)$ under the null, the critical values of the test can be taken to be the same as the critical values of the test based upon the simple statistic $\hat{S}_1$. A HAC-robust standardization of $\hat{S}_1$ is given in Lee (2007). In the case where $u_t$ is observed, an inconsistent "estimator" of the long run variance of $\sum_{t=1}^{n-1} u_t u_{t+1} / (n - 1)$ is, for a kernel $k(\cdot)$, $k_{ij} = k(|i - j| / n)$ and $\varphi_i = \sum_{t=1}^{i-1} \left( u_t u_{t+1} - \hat{R}_1 \right) / n^{1/2}$,
$$\tilde{\Gamma} = \sum_{i=1}^{n-1} \sum_{j=1}^{n-1} \left( (k_{ij} - k_{i,j+1}) - (k_{i+1,j} - k_{i+1,j+1}) \right) \varphi_i \varphi_j.$$
For residuals $\hat{u}_t$, let $\hat{\theta}_i$ be the estimator $\hat{\theta}$ computed with the first $i$ observations and estimate $\varphi_i$ recursively by $\hat{\varphi}_i = \sum_{t=1}^{i-1} \left( u_t(\hat{\theta}_i) u_{t+1}(\hat{\theta}_i) - \hat{R}_1 \right) / n^{1/2}$. Let
$$\hat{\Gamma} = \sum_{i=1}^{n-1} \sum_{j=1}^{n-1} \left( (k_{ij} - k_{i,j+1}) - (k_{i+1,j} - k_{i+1,j+1}) \right) \hat{\varphi}_i \hat{\varphi}_j.$$
It follows from Lee (2007) that the limit distribution of $n \hat{R}_1^2 / \tilde{\Gamma}$ when $u_t$ is observed, and of $n \hat{R}_1^2 / \hat{\Gamma}$ when $u_t$ is estimated by the residuals $\hat{u}_t$, is, assuming that $k(\cdot)$ is twice continuously differentiable,
$$\frac{W^2(1)}{-\int_0^1 \int_0^1 k''(r - s) \left( W(r) - rW(1) \right) \left( W(s) - sW(1) \right) dr \, ds}, \qquad (2.4)$$
where $W$ is a standard Brownian motion.
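The law (2.4) is nonstandard, but its quantiles are straightforward to approximate by Monte Carlo. A sketch, discretizing $W$ on a grid and using $k(x) = (1 + \cos(\pi x))/2$ as a purely illustrative twice continuously differentiable kernel (an assumption of this sketch, not a choice made in the paper):

```python
import numpy as np

def z_L_quantile(alpha=0.05, reps=1000, m=100, seed=0):
    """Monte Carlo (1 - alpha) quantile of the limit law (2.4):
    W(1)^2 / ( -int int k''(r - s) B(r) B(s) dr ds ),  B(r) = W(r) - r W(1),
    with the illustrative kernel k(x) = (1 + cos(pi x)) / 2, for which
    k''(x) = -(pi^2 / 2) cos(pi x)."""
    rng = np.random.default_rng(seed)
    s = np.arange(1, m + 1) / m
    kpp = -(np.pi ** 2 / 2) * np.cos(np.pi * (s[:, None] - s[None, :]))  # k''(r - s)
    draws = np.empty(reps)
    for i in range(reps):
        W = np.cumsum(rng.standard_normal(m)) / np.sqrt(m)  # Brownian motion on the grid
        B = W - s * W[-1]                                   # Brownian bridge W(r) - r W(1)
        denom = -(B @ kpp @ B) / m ** 2                     # Riemann sum of the double integral
        draws[i] = W[-1] ** 2 / denom
    return float(np.quantile(draws, 1 - alpha))
```

With this particular $k$ the denominator is nonnegative, since $-k''(r - s) = (\pi^2/2)\cos(\pi(r - s))$ is a positive semidefinite kernel.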
Let $z_L(\alpha)$ be the $(1 - \alpha)$th quantile of (2.4). The critical values and rejection region of the test are
$$\hat{z}_L(\alpha) = K^2(1) \tilde{\Gamma} z_L(\alpha), \qquad (2.5)$$
$$\hat{z}_{KL}(\alpha) = K^2(1) \hat{\Gamma} z_L(\alpha), \qquad (2.6)$$
$$\hat{S}_{\hat{p}} \geq \hat{z}(\alpha) \quad \text{where} \quad \hat{z}(\alpha) = \begin{cases} \hat{z}_L(\alpha) & \text{for observed } \{u_t\}, \\ \hat{z}_{KL}(\alpha) & \text{for residuals } \{\hat{u}_t\}. \end{cases} \qquad (2.7)$$
We also consider a modified version of the test which employs a standardization of the sample covariances as used by Deo (2000) or Escanciano and Lobato (2009),
$$\hat{S}_p^* = n \sum_{j=1}^{n-1} K^2\left( \frac{j}{p} \right) \left( \frac{\hat{R}_j}{\hat{\tau}_j} \right)^2 \quad \text{where} \quad \hat{\tau}_j^2 = \frac{1}{n - j} \sum_{t=1}^{n-j} \hat{u}_t^2 \hat{u}_{t+j}^2 - \left( \frac{n}{n - j} \hat{R}_j \right)^2. \qquad (2.8)$$
The sample variance $\hat{\tau}_j^2$ is an estimator of $\tau_j^2 = \mathrm{Var}(u_t u_{t+j})$ which, for observed $u_t$, is the asymptotic variance of $n^{1/2} \left( \hat{R}_j - R_j \right)$ in the case of uncorrelated $u_t u_{t+j}$ or for martingale differences. The corresponding data-driven order and critical values are
$$\hat{p}^* = \arg\max_{p \in [1, p_n]} \left( \hat{S}_p^* - E(p) - \gamma_n V_\Delta(p) \right), \qquad (2.9)$$
$$\hat{z}^*(\alpha) = \hat{z}(\alpha) / \hat{\tau}_1^2. \qquad (2.10)$$
While the test (2.7) is studied in Theorems 1 and 2, the test with rejection region $\hat{S}_{\hat{p}^*}^* \geq \hat{z}^*(\alpha)$ is studied in Theorem 3.

Let us now turn to notations and our main assumptions. In what follows, $a_n \asymp b_n$ means that the sequences $\{a_n\}$ and $\{b_n\}$ have the same order, i.e. that $a_n / b_n$ and $b_n / a_n$ are both $O(1)$. For a real random variable $Z$ and a positive real number $a$, $\|Z\|_a = E^{1/a}[|Z|^a]$. Consider first the case of observed $u_t$. When studying the performance of the test under the alternative, we consider a sequence $\{u_{t,n}\}$ of stationary alternatives with autocovariance coefficients $\{R_{j,n}\}$. This means that for each given $n$, the process $\{u_{t,n}, t \in \mathbb{N}\}$ is stationary.
This type of sequence includes for instance local MA($\infty$) alternatives $u_{t,n} = \varepsilon_t + \sum_{i=1}^{\infty} a_{i,n} \varepsilon_{t-i}$, where $a_{i,n} \to 0$ as $n$ grows. Further, for residuals $\hat{u}_t = u_t(\hat{\theta})$, we assume that $\sqrt{n}(\hat{\theta} - \theta_n)$ is asymptotically centered, where $\theta_n$ is a pseudo-true value, and set $u_t(\theta_n) = u_{t,n}$. For the sake of brevity, $\{u_{t,n}\}$ and $\{R_{j,n}\}$ are abbreviated to $\{u_t\}$ and $\{R_j\}$ in the rest of the paper, but we maintain the dependence with respect to $n$ when stating our main assumptions. Under the null and the alternative, we follow Shao (2011b) and Xiao and Wu (2011) and restrict ourselves to stationary processes satisfying a moment contraction condition of Wu (2005). We assume that $u_{t,n} = F_n(\ldots, e_{t-1}, e_t)$ for some measurable $F_n$, where $e_t$, $t = -\infty, \ldots, +\infty$, are i.i.d. (univariate or vector) random variables. Consider an independent copy $\{e'_t\}$ of $\{e_t\}$ and define
$$u_{t,n}^{\tau} = F_n(\ldots, e_{\tau-1}, e'_{\tau}, e_{\tau+1}, \ldots, e_{t-1}, e_t), \quad \tau \leq t \leq n,$$
where $e_\tau$ is changed to $e'_\tau$. Assume that for some $a$ and all $j \geq 0$,
$$\left\| u_{t,n} - u_{t,n}^{t-j} \right\|_a \leq \delta_a(j) \quad \text{where} \quad \delta_a(j) \to 0 \text{ as } j \to \infty,$$
a condition meaning that shocks cannot have a long run impact. A fast decrease of $\delta_a(j)$ also ensures that $u_t = u_{t,n}$ becomes independent of $u_{t-j}$ when $j$ grows, as does the $\alpha$-mixing assumption used in Francq et al. (2005) or Delgado and Velasco (2012). Shao (2011b) assumes that $\delta_a(j)$ decreases at an exponential rate, a condition which is satisfied by many linear and nonlinear time series models, including threshold, stochastic volatility, bilinear or GARCH models, see Shao (2011b), Wu (2005, 2007) and the references therein. Our main assumptions are given below.

Assumption K (Kernel). The kernel function $K(\cdot)$ in (2.1) from $\mathbb{R}^+$ to $[0, \infty)$ is nonincreasing, bounded away from 0 on $[0, 1/2]$ and continuously differentiable over its support $[0, 1]$.
The kernel $k(\cdot)$ used for the critical values is twice continuously differentiable over its compact support.

Assumption R (Regularity). Under $H_0$ and $H_1$, $\sup_t \|u_{t,n}\|_a \leq C R_{0,n}^{1/2}$ for some $a$ and, for some $b > 0$, $\delta_a(j) \leq C j^{-1-b}$. Moreover $1/C \leq R_{0,n} \leq C$, and $\max_{j \in [1, p_n]} R_{0,n}^2 / \mathrm{Var}(u_{t,n} u_{t+j,n}) \leq C$.

Assumption P (Order $p$). The maximal order $p_n$ diverges faster than some power of $n$, with $p_n = o\left( n^{1/(2(1+3/a))} \right)$ as $n \to \infty$, where $a$ is the same constant as in Assumption R above. The penalty sequence $\gamma_n$ satisfies $\gamma_n > 0$, $\gamma_n \to \infty$ and $\gamma_n = o(n^{1/2})$ as $n \to \infty$.

Assumption M (Model). The processes $\{u_{t,n}\}$, the model $m(X_t; \theta) = u_t$ and the estimators $\{\hat{\theta}_t\}$ satisfy the following conditions:

(i) There is a sequence $\{\theta_n\}$, with $\theta_n = \theta$ for all $n$ under $H_0$, such that
$$\left( n^{1/2} \left( \hat{\theta}_{[ns]} - \theta_n \right)', \; n^{-1/2} \sum_{t=1}^{[ns]} \left( u_{t,n} u_{t-1,n} - E[u_{t,n} u_{t-1,n}] \right) \right)', \quad s \in [0, 1], \qquad (2.11)$$
$D[0, 1]$-converges in distribution to a Brownian motion with a full rank covariance matrix.
(ii) The residual function admits a second order expansion $u_t(\theta) = u_{t,n} + (\theta - \theta_n)' u_{t,n}^{(1)} + (\theta - \theta_n)' u_{t,n}^{(2)} (\theta - \theta_n) + r_{t,n}(\theta)$ where, for any $C > 0$,
$$\sup_{t \in [1, n]} \; \sup_{\theta : \|\theta - \theta_n\| \leq C n^{-1/2}} |r_{t,n}(\theta)| = o_P\left( \frac{1}{n} \right) \qquad (2.12)$$
and, for each $n$, $\{u_{t,n}, u_{t,n}^{(1)}, u_{t,n}^{(2)}\}$ is a stationary process with $E^{1/2}\left[ \|a_t\|^2 \right] \leq C$, $\{a_t\}$ being successively $\{u_{t,n}^{(1)}\}$, $\{u_{t,n}^{(2)}\}$, $\{u_{t,n}\}$, $\{u_{t,n} u_{t,n}^{(1)}\}$, $\{u_{t,n}^{(1)} u_{t,n}^{(1)\prime}\}$, $\{u_{t,n} u_{t,n}^{(2)}\}$, and where $\sum_{j=-\infty}^{\infty} E\left[ \left\| u_{t-j,n}^{(1)} u_{t,n} \right\|^2 \right] \leq C$, $\sup_{j \in \mathbb{Z}} E\left[ \left\| n^{-1/2} \sum_{t=j+1}^{n} \left( u_{t-j,n}^{(1)} u_{t,n} - E[u_{t-j,n}^{(1)} u_{t,n}] \right) \right\|^2 \right] \leq C$, $\sup_{j \in \mathbb{Z}} E\left[ \left\| u_{t,n}^{(1)} u_{t,n} u_{t-j,n} \right\|^2 \right] \leq C$, and $\sup_{j \in \mathbb{Z}} E\left[ \left\| n^{-1/2} \sum_{t=j+1}^{n} \left( u_{t,n}^{(1)} u_{t,n} u_{t-j,n} - E[u_{t,n}^{(1)} u_{t,n} u_{t-j,n}] \right) \right\|^2 \right] \leq C$.

The compact sets $[0, 1/2]$ and $[0, 1]$ in Assumption K are somewhat arbitrary and can be replaced by any nested compact intervals. Note however that Assumption K forbids the use of the Daniell kernel $K(x) = \sin(x)/x$ due to the nonincreasing function and bounded support conditions.

Assumption R imposes a polynomial decay on the coefficients $\delta_a(j)$, a condition which is weaker than the exponential rate assumed in Shao (2011b). Note that in Assumption P the order of $p_n$ can come closer to $n^{1/2}$ when $a$ is high, that is, when $u_t$ has finite moments of higher order. Under Assumption R, $\{u_{t,n}\}$ must have finite moments of order twelve at least. This is mostly needed for a proof of Theorem 1 below based on the Lindeberg substitution method, see Pollard (2002, p. 179), which uses moment bounds such as the Cauchy-Schwarz inequality $E\left[ (u_t u_{t+j})^2 \right] \leq E[u_t^4]$. Since implementing the proposed data-driven tests with a large $p_n$ would in principle allow us to detect a wider class of alternatives, Assumption P, which plays an important role under the null in our proofs, may be too restrictive. Our simulation experiments indeed suggest that Assumption P can be weakened when focusing on white noise processes of practical relevance, since the order $p_n \asymp n$ gives good results for various white noise processes of practical interest. On the other hand, choosing a smaller $p_n$ still gives good power, see the comments on Table 5 at the end of the simulation experiments section.

When $\{u_t\}$ is observed, Assumption M is equivalent to Assumption 1 of Lobato (2001) and the FCLT for $n^{-1/2} \sum_{t=1}^{[ns]} (u_t u_{t-1} - E[u_t u_{t-1}])$ is a consequence of Assumption R and the FCLT of Wu (2007). Assumption M is easily verified for simple linear models and OLS estimation, where $u_{t,n}^{(2)}$ and $r_{t,n}$ can be set to 0.
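For intuition about the dependence measure $\delta_a(j)$, a small simulation of our own for a linear AR(1) process with standard normal innovations and $a = 2$: swapping the single innovation $e_{t-j}$ for an independent copy changes $u_t$ by exactly $\phi^j (e_{t-j} - e'_{t-j})$, so $\delta_2(j) = |\phi|^j \sqrt{2}$, consistent with the exponential decay discussed above:

```python
import numpy as np

def delta2_ar1(phi, j, reps=5000, lags=300, seed=0):
    """Monte Carlo estimate of Wu's dependence measure delta_2(j) for the AR(1)
    process u_t = phi * u_{t-1} + e_t with e_t i.i.d. N(0, 1): replace the single
    innovation e_{t-j} by an independent copy and measure || u_t - u_t^{t-j} ||_2.
    Exact value for this linear process: |phi|^j * sqrt(2)."""
    rng = np.random.default_rng(seed)
    w = phi ** np.arange(lags)               # MA(infinity) weights, truncated
    diffs = np.empty(reps)
    for i in range(reps):
        e = rng.standard_normal(lags)        # e[k] plays the role of e_{t-k}
        e2 = e.copy()
        e2[j] = rng.standard_normal()        # replace e_{t-j} by e'_{t-j}
        diffs[i] = np.dot(w, e) - np.dot(w, e2)
    return float(np.sqrt(np.mean(diffs ** 2)))
```

The estimate matches the closed form closely, e.g. `delta2_ar1(0.6, 3)` is near $0.6^3 \sqrt{2} \approx 0.306$.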
Assumption M-(i) is a shortened version of Assumptions B1 and A2 of Kuan and Lee (2006), who employ a standard linear expansion $n^{1/2}(\hat{\theta} - \theta_n) = n^{-1/2} \sum_{t=1}^{n} \ell_t + o_P(1)$ to show that (2.11) satisfies the functional central limit theorem (FCLT) called for in M-(i). The FCLT is mostly used under $H_0$ to show that $P\left( \hat{S}_1 \geq \hat{z}(\alpha) \right) \to \alpha$ and $P\left( \hat{S}_1^* \geq \hat{z}^*(\alpha) \right) \to \alpha$ in the case of residuals. The full-rank FCLT condition in Assumption M-(i) implies certain restrictions. For example, for a correctly specified AR(1) model $X_t - \theta X_{t-1} = u_t$, the case of $\theta = 0$ is ruled out, a value of the parameter which would in principle be excluded when considering such an AR(1) specification. Theorem 4 at the end of the next section explains how to overcome this issue with an alternative choice of critical values when Assumption M-(i) is too restrictive. The next section describes some suitable theoretical requirements for the penalty sequence $\gamma_n$, while the simulation section proposes a calibration of $\gamma_n$ which gives good results for various white noise processes and alternatives.

3. Asymptotic level and consistency
An important issue in the construction of the test (2.7) is the choice of the penalty sequence. Choosing $\gamma_n$ large enough implies that $\hat{p}$ stays close to 1 and so the test statistic $\hat{S}_{\hat{p}}$ remains close to $\hat{S}_1$. Hence, on the one hand, using a large $\gamma_n$ ensures that the level of the test is close to its nominal size. On the other hand, a large $\gamma_n$ may substantially limit the power of the test since the statistic $\hat{S}_{\hat{p}}$ would not differ from $\hat{S}_1$. The trade-off between size and power is addressed by Theorem 1 and Theorem 2.

Consider first the properties of the test under the null hypothesis. The following theorem gives a lower bound for $\gamma_n$ which ensures that $\hat{p} = 1$ asymptotically, so that the test is asymptotically of level $\alpha$.

Theorem 1.
Let Assumptions K, M, P and R hold. If the penalty sequence $\{\gamma_n, n \geq 1\}$ satisfies
$$\gamma_n \geq (1 + \epsilon)(2 \ln \ln n)^{1/2} \text{ for some } \epsilon > 0, \qquad (3.1)$$
then under $H_0$, $\lim_{n \to \infty} P(\hat{p} = 1) = 1$ and the test (2.7) is asymptotically of level $\alpha$.

Under the null hypothesis, the selected order $\hat{p}$ is asymptotically equal to 1. It follows that $\hat{S}_{\hat{p}} = \hat{S}_1 + o_P(1)$ and that the critical values (2.5) or (2.6) guarantee that the test is asymptotically of level $\alpha$. A key result is therefore that $\lim_{n \to \infty} P(\hat{p} = 1) = 1$ holds under various white noise models and for observed $u_t$ or residuals $\hat{u}_t$. That the estimation has no impact asymptotically follows from (3.1), which imposes $\gamma_n \to \infty$. When $\hat{\theta}$ is $\sqrt{n}$-consistent, estimating the residuals gives test statistics satisfying
$$\hat{S}_p = n \sum_{j=1}^{n-1} K^2\left( \frac{j}{p} \right) \left( \frac{1}{n} \sum_{t=1}^{n-j} u_t u_{t+j} \right)^2 + O_P(1)$$
uniformly in $p$. The fact that the remainder term $O_P(1)$ is negligible compared to $\gamma_n$ is a crucial element in showing that the asymptotic behavior of $\hat{p}$ is not affected by the estimation under the null. The divergence of $\gamma_n$ is also important to account for the fact that the standardizations $E_\Delta(p)$ and $V_\Delta(p)$ are only valid when $p \to \infty$: since $\gamma_n \to \infty$, either $\hat{p} = 1$ or $\hat{p}$ diverges, because (2.3) cannot hold for a finite $p > 1$. Compared to the existing adaptive results of Horowitz and Spokoiny (2001), Guerre and Lavergne (2005), Guay and Guerre (2006) or Chen and Gao (2007), an important technical contribution of our paper is that Theorem 1 holds without assuming that the set of admissible $p$ is a power set $\{a^j, j \in \mathbb{N}\}$, $a > 1$.

Another important finding is that the penalty sequence $\gamma_n$ can diverge with the low order $(\ln \ln n)^{1/2}$ allowed by (3.1). This contrasts with the larger order $\ln n$ used in the BIC selection procedure and in the corresponding data-driven tests. In view of the potential negative impact of a large $\gamma_n$ on the power of the test, it is worth asking whether the lower bound (3.1) can be improved, that is, whether $P(\hat{p} = 1) \to 1$ can hold for a smaller $\gamma_n$. The proof suggests that this is not the case. The main argument is based on the expression
$$P(\hat{p} \neq 1) = P\left( \max_{p \in [2, p_n]} \frac{\left( \hat{S}_p - \hat{S}_1 \right) / \hat{R}_0^2 - E_\Delta(p)}{V_\Delta(p)} \geq \gamma_n \right) \qquad (3.2)$$
for the probability of not selecting 1. It can be seen from the proof of Theorem 1 that, for the Box-Pierce version of the test, the right-hand side of (3.2) asymptotically behaves like the maximum of standardized partial sums whose exact order is $(2 \ln \ln n)^{1/2}$, see (B.38) in the Supplementary Material. Hence the bound (3.1) is optimal to achieve $P(\hat{p} = 1) \to 1$.

Consider now the behavior of the test under alternatives whose correlation coefficients $R_j = R_{j,n}$ may go to 0 when $n$ increases. The new class of alternatives is defined similarly to (1.3) in the introduction section. Consider first a sequence $\rho_n \to 0$ and a maximal lag $P_n$. An important indicator for the detection of alternatives is the number of correlations above $\rho_n$,
$$N_n = N_n(P_n, \rho_n) = \#\left\{ j : |R_j / R_0| \geq \rho_n, \; 1 \leq j \leq P_n \right\}. \qquad (3.3)$$
The next theorem gives a detection condition on $N_n$, $P_n$ and $\rho_n$.

Theorem 2.
Suppose Assumptions K, M, R and P hold. There exists a constant $\kappa^* > 0$ such that the test (2.7) is consistent against all alternatives $\{u_t\}$ satisfying, for some $\rho_n > 0$ and $P_n \in [1, p_n/2]$,
$$n^{1/2} \Big( \frac{N_n}{\gamma_n P_n^{1/2}} \Big)^{1/2} \rho_n \ge \kappa^*. \qquad (3.4)$$

Condition (3.4) is similar to the detection condition (1.3) required for consistency of the Box-Pierce test (1.1). However, a key difference between the two conditions is that while in (1.3) the lag order $p_n$ is assumed known and is used in the construction of the test statistic, in (3.4) the lag order $P_n$ is unknown. This illustrates the adaptive capability of the new test. A second important difference between (1.3) and (3.4) is that the latter involves the penalty sequence $\gamma_n$. For given $P_n$ and $N_n$, the detection condition (3.4) admits a rate $\rho_n^*$ satisfying
$$\rho_n^* \asymp \frac{1}{n^{1/2}} \Big( \frac{\gamma_n P_n^{1/2}}{N_n} \Big)^{1/2}. \qquad (3.5)$$
The rate $\rho_n^*$ in (3.5) deteriorates with the penalty sequence. Condition (3.4) thus demonstrates the potential negative impact of the penalty sequence on the power of the test. This impact can also be seen from the proof of Theorem 2, which uses the fact that the test (2.7) rejects the null whenever
$$\frac{\hat{S}_p - \hat{R}_0^2\, E_\Delta(p)}{\hat{R}_0^2\, V_\Delta(p)} \ge \gamma_n + \frac{\hat{z}(\alpha)}{\hat{R}_0^2\, V_\Delta(p)} \quad \text{for some } p \in [2, p_n]. \qquad (3.6)$$
For alternatives for which (3.6) only holds for $p \to \infty$, so that $V_\Delta(p) \to \infty$, (3.6) suggests that $\gamma_n$ may matter more than the critical value $\hat{z}(\alpha)$ for detection.

Two special cases of (3.5) are worth mentioning. First, the situation where $\lim_{n\to\infty} \gamma_n P_n^{1/2}/N_n = 0$ is of special interest, since (3.5) shows that the test can detect correlation coefficients converging to 0 at a rate that is faster than the parametric rate $n^{-1/2}$. The best possible rate in this case is $\rho_n^* \asymp \gamma_n^{1/2}/(n P_n^{1/2})^{1/2}$, which is achieved for "saturated" alternatives with $N_n \asymp P_n$.
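As a back-of-the-envelope illustration of (3.5), the following sketch compares the detection rate $\rho_n^*$ with the parametric rate $n^{-1/2}$ in the saturated case $N_n \asymp P_n$ and in a sparse case; the values of $n$, $P_n$ and $N_n$ below are arbitrary and chosen for illustration only.

```python
import math

def rho_star(n, P, N):
    """Detection rate (3.5): n^{-1/2} * (gamma_n * P^{1/2} / N)^{1/2},
    with the penalty gamma_n = (2 ln ln n)^{1/2} of Theorem 1."""
    gamma_n = math.sqrt(2.0 * math.log(math.log(n)))
    return math.sqrt(gamma_n * math.sqrt(P) / N) / math.sqrt(n)

n = 10_000
parametric = 1.0 / math.sqrt(n)            # the usual n^{-1/2} benchmark
saturated = rho_star(n, P=400, N=400)      # N_n ~ P_n: beats n^{-1/2}
sparse = rho_star(n, P=400, N=5)           # few large correlations: slower rate
```

With these illustrative values the saturated rate falls below $n^{-1/2}$ while the sparse rate lies above it, matching the two cases discussed in the text.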
Second, a less favorable case corresponds to sparser correlation coefficients satisfying $\lim_{n\to\infty} \gamma_n P_n^{1/2}/N_n = \infty$. In this case (3.5) does not allow for correlation coefficients converging to 0 at the rate $n^{-1/2}$. This case has been covered by Donoho and Jin (2004) for a theoretical model where a known number $P_n$ of independent Gaussian variables with mean $n^{1/2}(R_j/R_0)$ and variance 1 is observed. These authors show that in such a setup the best possible detection rate is $\rho_n = (\ln n/n)^{1/2}$, a rate which is achieved by the maximum white noise test of Xiao and Wu (2011). This suggests that our test may not be optimal when $\lim_{n\to\infty} \gamma_n P_n^{1/2}/N_n = \infty$. However, it is shown in Proposition 1 in Section 4 below that the test of Xiao and Wu (2011), unlike our test, does not detect moderately sparse alternatives satisfying (3.5) with $\lim_{n\to\infty} \gamma_n P_n^{1/2}/N_n = 0$ and $\gamma_n \asymp (2\ln\ln n)^{1/2}$.

We conclude this section with two extensions of our main results. The first extension shows that the test derived from (2.8) and (2.9) has similar properties to the test (2.7).

Theorem 3.
Suppose Assumptions K, M, R and P hold. Then $P(\hat{p}^* = 1) \to 1$ under $H_0$ and the test which rejects the null when $\hat{S}^*_{\hat{p}^*} \ge \hat{z}^*(\alpha)$ is asymptotically of level $\alpha$. It also detects the alternatives satisfying (3.4) in Theorem 2 for a large enough $\kappa^*$.

The second extension is useful in the case of residuals, when the full-rank FCLT condition in Assumption M-(i) is too restrictive so that the critical value $\hat{z}_{KL}(\alpha)$ in (2.6) cannot be used. Suppose that an additional test statistic $\hat{T}_n$ with critical values $\hat{t}_n(\alpha)$ satisfying $\lim_{n\to\infty} P(\hat{T}_n \ge \hat{t}_n(\alpha)) = \alpha$ under the null is available. Consider the critical value
$$\hat{c}_n(\alpha) = \hat{S}^*_1 - \hat{T}_n + \hat{t}_n(\alpha). \qquad (3.7)$$

Theorem 4.
Suppose that Assumptions K, R and P hold, as well as Assumption M-(ii) with $\sqrt{n}(\hat{\theta} - \theta_n) = O_P(1)$, where the deterministic sequence $\{\theta_n\}$ is such that $\theta_n = \theta$ for all $n$ under $H_0$. Suppose also that (A0) $\lim_{n\to\infty} P(\hat{T}_n \ge \hat{t}_n(\alpha)) = \alpha$ under $H_0$ and (A1) $\hat{c}_n(\alpha) \le O_P(\gamma_n)$ under the considered alternative. Then the test which rejects the null when $\hat{S}^*_{\hat{p}^*} \ge \hat{c}_n(\alpha)$ is asymptotically of level $\alpha$ and detects the alternatives satisfying the condition (3.4) of Theorem 2 for a sufficiently large $\kappa^*$. Moreover, even if (A1) does not hold, we have under the alternative and for any sample size $n$,
$$P\big( \hat{S}^*_{\hat{p}^*} \ge \hat{c}_n(\alpha) \big) \ge P\big( \hat{T}_n \ge \hat{t}_n(\alpha) \big). \qquad (3.8)$$

Condition (A1), which allows for $\hat{c}_n(\alpha) \xrightarrow{P} -\infty$, means, when $\hat{t}_n(\alpha) = O_P(1)$ as usual, that $\hat{T}_n$ diverges at least as fast as $\hat{S}^*_1$, or that both lack power against the considered alternative and are $O_P(1)$. The bound (3.8) means that the data-driven test is at least as powerful as the test based on $\hat{T}_n$. As a consequence of (3.8), the test $\hat{S}^*_{\hat{p}^*} \ge \hat{z}^*(\alpha)$ is at least as powerful as $\hat{S}^*_1 \ge \hat{z}^*(\alpha)$, with $\hat{z}^*(\alpha)$ as in (2.10). The use of the critical value (3.7) can give a data-driven test whose power properties can be tailored to be optimal against some specific alternatives by a proper choice of a corresponding optimal $\hat{T}_n$. Examples of test statistics $\hat{T}_n$ which do not require Assumption M-(i) can be found in Delgado and Velasco (2012) and Francq, Roy and Zakoian (2005). Delgado and Velasco (2012) propose a Box-Pierce statistic corrected for estimation, with an elegant general approach and some parametric optimality properties under Gaussianity, whereas Francq et al. (2005) is more specific to ARMA specifications.

4. Adaptive rate-optimality and comparisons with other tests
While Theorem 1 gives the lower bound (3.1) of order $(2\ln\ln n)^{1/2}$ for the penalty sequence $\gamma_n$ that is necessary to ensure that the test is asymptotically of level $\alpha$, Theorem 2 suggests that increasing $\gamma_n$ can impair the power of the test. Hence a good compromise for the choice of the penalty sequence, suitable both under $H_0$ and $H_1$, is $\gamma_n \asymp (2\ln\ln n)^{1/2}$. Once this choice is made, one may ask whether the resulting test is the best possible, in the sense that there is no other test that can detect alternatives satisfying a condition less restrictive than (3.4), with $\kappa^* = \kappa_n \to 0$ and $\lim_{n\to\infty} \gamma_n P_n^{1/2}/N_n = 0$.

Theorem 5.
Let $u_t$ be observed. For any sequence $\kappa_n \to 0$, there exists a sequence of alternatives $\{u_t\}$ such that, for some $P_n \in [1, p_n]$ and $\rho_n > 0$ with
$$\rho_n \ge \kappa_n\, \frac{1}{n^{1/2}} \Big( \frac{(2\ln\ln n)^{1/2} P_n^{1/2}}{N_n} \Big)^{1/2}, \qquad \lim_{n\to\infty} \frac{(2\ln\ln n)^{1/2} P_n^{1/2}}{N_n} = 0,$$
the other assumptions of Theorem 2 are satisfied, but the alternatives cannot be detected by any possible asymptotically $\alpha$-level test. (As discussed when introducing the approximation (3.5), the test (2.7) is not optimal for detection of sparse alternatives with $\lim_{n\to\infty} \gamma_n P_n^{1/2}/N_n = \infty$, which are not considered here.)

Hence, when $\gamma_n \asymp (2\ln\ln n)^{1/2}$, it is not possible to improve on the detection condition (3.4), and the rate $\rho_n^*$ in (3.5) is optimal. We now give an example of alternatives which are detected by the test (2.7) but not by other popular tests. Consider the following high-order moving average process,
$$u_t = u_{t,n} = \varepsilon_t + \frac{\nu\,\gamma_n^{1/2}}{n^{1/2} P_n^{1/4}} \sum_{k=1}^{P_n} \psi_k \varepsilon_{t-k}, \qquad \sum_{k=1}^{P_n} \psi_k^2 = O(P_n), \qquad \lim_{n\to\infty} P_n = \infty, \qquad (4.1)$$
where $\{\varepsilon_t\}$ is a strong white noise with variance $\sigma^2$, $\nu$ is a scaling constant and $\gamma_n \asymp (2\ln\ln n)^{1/2}$. This alternative has moving average coefficients of order $\gamma_n^{1/2}/(n^{1/2} P_n^{1/4}) = o(n^{-1/2})$ provided $P_n$ diverges at a polynomial rate. Hence short term shocks have statistically negligible impact. However, when $\psi_k = 1$ for all $k$, the long term multiplier of (4.1) is equal to $\nu(\gamma_n P_n^{3/2}/n)^{1/2}$, which is of larger order than $n^{-1/2}$. The following lemma describes the covariance function and conditional expectation of the alternative (4.1).

Lemma 1. If $P_n = o((n/\gamma_n)^{1/2})$ and $\lim_{n\to\infty} \gamma_n/n = 0$, then the alternative $\{u_t\}$ in (4.1) satisfies $R_0 = \sigma^2\big(1 + O(\gamma_n P_n^{1/2}/n)\big)$ and, uniformly in $j \in [1, P_n]$,
$$R_j = \frac{\nu\,\gamma_n^{1/2}}{n^{1/2} P_n^{1/4}}\, \psi_j \sigma^2 + o\Big( \frac{\gamma_n^{1/2}}{n^{1/2} P_n^{1/4}} \Big).$$
Moreover,
$$E[u_t \mid u_{t-k}, k \ge 1] = \frac{\nu\,\gamma_n^{1/2}}{n^{1/2} P_n^{1/4}} \sum_{k=1}^{P_n} \psi_k u_{t-k} + O_P\Big( \frac{\gamma_n P_n}{n} \Big).$$

Hence a distinctive feature of the alternative (4.1) when $\max_{1\le k\le P_n} |\psi_k| = O(1)$ is that $\max_{j\ge 1} |R_j| = o(n^{-1/2})$ provided $P_n/\gamma_n^2 \to \infty$. The expression of $E[u_t \mid u_{t-k}, k \ge 1]$ reveals that $u_t$ can be very difficult to forecast, since the coefficients of the lagged variables are all $o(n^{-1/2})$ provided $P_n = o(n^{1/2}/\gamma_n)$. This suggests that such a process will be seen in practice as a martingale difference when using standard statistical tools. This may be a relevant example of alternatives in economic or financial contexts where arbitrage occurs. We show in Proposition 1 below that the new tests detect these alternatives but that this is not the case for three tests based on the following test statistics,
$$W_n = b_n \Big( n^{1/2} \max_{j \in [1, J_n]} \Big| \frac{\hat{R}_j}{\hat{\tau}_j} \Big| - b_n \Big), \quad \text{where } b_n = \big( 2\ln J_n - \ln\ln J_n - \ln(4\pi) \big)^{1/2}, \qquad (4.2)$$
$$CvM_n = \frac{n}{\pi^2} \sum_{j=1}^{J_n} \frac{\hat{R}_j^2}{j^2 \hat{\tau}_j^2}, \qquad (4.3)$$
$$EL_n = \widehat{BP}^*_{\hat{p}^*_{EL}}, \qquad \hat{p}^*_{EL} = \arg\max_{p \in [1, J_n]} \big\{ \widehat{BP}^*_p - \hat{\gamma}^*_{EL}\, p \big\}, \qquad (4.4)$$
where $\hat{\gamma}^*_{EL} = \ln n$ if $n^{1/2} \max_{j \in [1, J_n]} |\hat{R}_j/\hat{\tau}_j| \le (2.4 \ln n)^{1/2}$ and $\hat{\gamma}^*_{EL} = 2$ otherwise. The statistic $W_n$ in (4.2) is studied in Xiao and Wu (2011), who show that $W_n$ asymptotically has an extreme value distribution. The statistic $CvM_n$ in (4.3), due to Deo (2000) for observed $u_t$, is a version of the Cramér-von Mises test of Durlauf (1991) partially corrected for heteroskedasticity. The test statistic $EL_n$ has been introduced by Escanciano and Lobato (2009) for observed $u_t$ and a fixed $J_n$. As in our test, the order $\hat{p}^*_{EL}$ selected by Escanciano and Lobato (2009) is asymptotically equal to 1 under $H_0$ and similar critical values can be used.
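To fix ideas, a simplified sketch of the maximum statistic (4.2) and of the Escanciano-Lobato penalty switch in (4.4) follows. It is a sketch under two assumptions: $\hat{R}_j/\hat{\tau}_j$ is replaced by the plain sample autocorrelation (a homoskedastic simplification), and the switching constant 2.4 follows the reconstruction given above.

```python
import numpy as np

def autocorr(u, max_lag):
    """Sample autocorrelations rho_hat_1, ..., rho_hat_max_lag."""
    n = len(u)
    u = u - u.mean()
    r0 = np.dot(u, u) / n
    return np.array([np.dot(u[:n - j], u[j:]) / n / r0 for j in range(1, max_lag + 1)])

def W_stat(u, J):
    """Extreme-value standardization of sqrt(n)*max_j |rho_hat_j|, as in (4.2)."""
    n = len(u)
    b = np.sqrt(2 * np.log(J) - np.log(np.log(J)) - np.log(4 * np.pi))
    return b * (np.sqrt(n) * np.max(np.abs(autocorr(u, J))) - b)

def el_penalty(u, J):
    """Escanciano-Lobato switch: keep the BIC penalty ln(n) unless some
    standardized autocorrelation exceeds (2.4 ln n)^{1/2}, then use 2."""
    n = len(u)
    big = np.sqrt(n) * np.max(np.abs(autocorr(u, J))) > np.sqrt(2.4 * np.log(n))
    return 2.0 if big else np.log(n)

rng = np.random.default_rng(1)
e = rng.standard_normal(2001)
noise, ma1 = e[1:], e[1:] + 0.5 * e[:-1]       # white noise vs a strong MA(1)
pen_noise, pen_ma1 = el_penalty(noise, 20), el_penalty(ma1, 20)
w_noise = W_stat(noise, 20)
```

Under white noise the switch keeps the conservative $\ln n$ penalty, while a strongly correlated MA(1) triggers the light penalty 2, letting $EL_n$ pick larger orders.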
To show that tests (4.2)–(4.4) do not detect alternatives with small correlation coefficients, it is sufficient to consider a Gaussian null hypothesis $G_0$, under which $\{u_t\}$ is a Gaussian white noise process $\{\varepsilon_t\}$ with variance $\sigma^2$, against an alternative $G_1$ under which $\{u_t\}$ is given by (4.1) with Gaussian i.i.d. $\{\varepsilon_t\}$, $\sum_{k=1}^{P_n} \psi_k^2 = O(P_n)$, $\max_{1\le k\le P_n} |\psi_k| = O(1)$, $\min_{1\le k\le P_n} |\psi_k \sigma^2| \ge \nu_0 > 0$, and $\gamma_n$ and $P_n \to \infty$ with $\gamma_n/P_n^{1/2} = o(1/\ln n)$ and $P_n = O((n/\gamma_n)^{1/2}) \le p_n/2$, where $\gamma_n \asymp (2\ln\ln n)^{1/2}$ satisfies (3.1). We assume that $J_n = O(n^{1/2})$.

Proposition 1.
Let $u_t$ be observed. Suppose that Assumptions K and P hold. For $\nu$ large enough, the alternative $G_1$ as above satisfies (3.4) and (i) the test (2.7) and its $\hat{S}^*_{\hat{p}^*}$ version consistently detect $G_1$. By contrast, (ii) the statistics $W_n$, $CvM_n$ and $EL_n$ have the same asymptotic distribution under $G_0$ and $G_1$, and the corresponding tests are therefore not consistent.

Proposition 1-(ii) implies that tests based on $W_n$, $CvM_n$ or $EL_n$ are not adaptive rate-optimal. Let $\hat{R}_{0,j}/\hat{\tau}_{0,j}$ and $\hat{R}_{1,j}/\hat{\tau}_{1,j}$ be the standardized sample covariances computed under $G_0$ and $G_1$ respectively. It is established in the proof of Proposition 1 that
$$\max_{j \in [1, J_n]} \Big| \frac{\hat{R}_{1,j}}{\hat{\tau}_{1,j}} - \frac{\hat{R}_{0,j}}{\hat{\tau}_{0,j}} \Big| = o_P\Big( \frac{1}{(n \log n)^{1/2}} \Big), \qquad (4.5)$$
which implies that the tests based on $W_n$ and $CvM_n$ are not consistent. The case of the $EL_n$ test is a bit more involved but, due to its penalty scheme, this test statistic is asymptotically equal to $\widehat{BP}^*_1$ under the null and the alternative, so that it cannot detect $G_1$ by (4.5).

5. Simulation experiments
Our simulation experiments aim to calibrate a valid penalty sequence $\gamma_n$ and to assess the test under various strong and weak white noise processes and under various alternatives. Since preliminary experiments have shown that the test statistic $\hat{S}_{\hat{p}}$ may yield an oversized test for some practically relevant white noise processes, we consider the test based on $\hat{S}^*_{\hat{p}^*}$ as in (2.8) and (2.9). To investigate later on the impact of choosing a large $p_n$, we allow for all possible orders, setting $p_n = n - 1$. We consider two kernels. The first is $K(t) = I(t \in [0,1])$, which gives the Box-Pierce statistic; the corresponding tests are labelled $BP$. The second uses the Parzen kernel
$$k(t) = 1 - 6t^2 + 6|t|^3 \ \text{ for } |t| \le 1/2, \qquad k(t) = 2(1-|t|)^3 \ \text{ for } 1/2 < |t| \le 1, \qquad k(t) = 0 \ \text{ otherwise}.$$
Since $k(1) = 0$ would give a meaningless $\hat{S}^*_1 = 0$, we change $k(t)$ into $K(t) = k(t/2)/k(1/2)$ and label the corresponding tests as $Parz$. The critical values $\hat{z}^*(\alpha)$ of (2.10), see also (2.5) and (2.6), use the power Parzen kernel $k^{32}(t)$, where the exponent 32 has been proposed by Lee (2007), whose simulations show that such a choice ensures that the test with rejection region $n\hat{R}_1^2 \ge \hat{z}^*(\alpha)$ has good power properties. We consider 10%, 5% and 1% significance levels. A preliminary simulation experiment with 100,000 replications gives the corresponding quantiles $z_L(\alpha)$ of (2.4) used in $\hat{z}^*(\alpha)$, which are in line with the critical values tabulated by Phillips et al. (2006, Table 6).

The first experiment analyzes the sensitivity of the test to the penalty term and aims to calibrate the proportionality constant for the penalty sequence. The experiment investigates the behavior of the test under the null for $\gamma_n = \gamma\,(2\ln\ln(n-1))^{1/2}$, where the proportionality coefficient $\gamma$ takes values between 2 and 3.8. The process $u_t$ is a white noise with the standard normal distribution. Table 1 reports the simulated levels for 50,000 replications and the percentage $\%\{\hat{p}^* \ne 1\}$ of simulation draws for which $\hat{p}^* \ne 1$, an important indicator in deciding whether a difference between nominal and observed levels is due to a too small $\gamma_n$ or to improper critical values. In Table 1, '*' indicates an oversized test, i.e. such that the null of a level smaller than the nominal size is rejected at the 1% level by the one-sided test using the simulated level. [INSERT TABLE 1 HERE] A threshold value for the $BP$ test is $\gamma = 3.4$: for smaller $\gamma$ the test is oversized, especially for $n = 1{,}000$, while the $Parz$ test is slightly less oversized. Both tests have very similar values of $\%\{\hat{p}^* \ne 1\}$, well below 1% for $\gamma = 3.4$. In the remaining simulation experiments $\gamma = 3.4$ is used.

We now compare the $BP$ and $Parz$ tests with the data-driven test $EL$ based on the statistic $EL_n$ in (4.4) with $J_n = n - 1$, with the test $IMSE$ based on the bandwidth $\hat{p}_{IMSE}$ used by Hong and Lee (2005),
$$\hat{p}_{IMSE} = \big(1 \vee \hat{C}^{1/5}(f)\big)\, n^{1/5}, \quad \text{where } \hat{C}(f) = 144\, \frac{\Big(\sum_{j=-(n-1)}^{n-1} k(j/\tilde{p})\, j^2\, \hat{R}_j/\hat{\tau}_j\Big)^2}{\Big(\sum_{j=-(n-1)}^{n-1} k(j/\tilde{p})\, \hat{R}_j/\hat{\tau}_j\Big)^2},$$
and the test statistic
$$IMSE = \frac{\sum_{j=1}^{\hat{p}_{IMSE}} k^2(j/\hat{p}_{IMSE}) \big\{ n\hat{R}_j^2/\hat{\tau}_j^2 - (1 - j/n) \big\}}{\Big( 2\sum_{j=1}^{\hat{p}_{IMSE}} k^4(j/\hat{p}_{IMSE}) (1 - j/n)^2 \Big)^{1/2}},$$
where $k(\cdot)$ is the Parzen kernel and $\hat{\tau}_j$ is defined as in (2.8). In the definition of $\hat{p}_{IMSE}$, $\tilde{p}$ is a pilot bandwidth of the Newey and West (1994) form $4(n/100)^a$ for a small exponent $a > 0$. Note that $\hat{C}(f)$ remains potentially stochastic under the null, so that the null limit distribution of $IMSE$ may differ from the standard normal distribution valid for deterministic $p_n \to \infty$. We however follow common practice and use standard normal critical values for the $IMSE$ test. The last benchmark test,
$CvM$, is based on Deo's (2000) Cramér-von Mises statistic
$CvM_n$ in (4.3) and uses the critical values tabulated by Anderson and Darling (1952).

The first comparison under $H_0$ is based on i.i.d. $\{u_t\}$ with the following distributions: standard normal ('Nor' in Table 2), Student with three degrees of freedom ('Stud'), and centered chi-square with one degree of freedom ('Chi'). The Student distribution is used to test the sensitivity of our test to the lack of higher-order moments, while the chi-square distribution can reveal sensitivity to skewness. [INSERT TABLE 2 HERE] As in Table 1, the size of the
$Parz$ test is slightly better than the size of the $BP$ test, but both perform well here, although $BP$ is slightly oversized under the 'Chi' white noise. The $EL$ and $IMSE$ tests are generally oversized, with strong size distortions for 'Chi'. The
$CvM$ test performs well except for the 'Chi' experiment.

The next experiment considers observed weak white noise $u_t$ or residuals $\hat{u}_t$. Two conditionally heteroskedastic martingale difference processes are examined. The first is a GARCH(1,1) process with $u_t = s_t\zeta_t$ and $s_t^2 = 0.001 + 0.\,s_{t-1}^2 + 0.\,u_{t-1}^2$, where the $\zeta_t$ are i.i.d. standard normal innovations. The second process is an ARCH(1) process with $u_t = s_t\zeta_t$ and $s_t^2 = 0.\, + 0.\,u_{t-1}^2$. Due to an ARCH coefficient larger than $(1/3)^{1/2} \approx 0.58$, $E[u_t^4] = \infty$ and the tests are, in principle, not expected to behave well in this experiment. The next three processes are uncorrelated but are not martingale differences, so that the $CvM$ test is not expected to have a correct size and is only reported here as a benchmark. The first, labelled 'Bilinear' in Table 3 below, is a bilinear model $u_t = \zeta_t + 0.\,\zeta_{t-1}u_{t-2}$. The second, labelled 'No-MDS', is given by $u_t = \zeta_{t-1}\zeta_{t-2}(1 + \zeta_{t-2} + \zeta_t)$ and has been examined by Lobato (2001). The third, 'All-Pass', is an all-pass ARMA(1,1) process examined by Lobato, Nankervis and Savin (2002), $u_t - 0.5\,u_{t-1} = \zeta_t - \zeta_{t-1}/0.5$, where the $\zeta_t$ are i.i.d. with the Student distribution with 9 degrees of freedom. Since the root of the MA part is the inverse of the AR root, the resulting process is uncorrelated, but the $u_t$ are dependent due to the non-Gaussian $\zeta_t$. Finally, experiment 'ARRes' examines residuals from the AR(1) model $y_t = 0.\,y_{t-1} + u_t$, with $\hat{u}_t = y_t - \hat{\theta}y_{t-1}$ and $\hat{\theta} = \sum_{t=0}^{n-1} y_t y_{t+1} / \sum_{t=0}^{n-1} y_t^2$. The $BP$, $Parz$ and $EL$ tests are all adapted to the estimation effect thanks to the use of the critical values $\hat{z}^*(\alpha)$ of (2.10). The critical values of the $IMSE$ and
$CvM$ tests do not account for the estimation of residuals, and the corresponding tests should not be expected to have a correct level under 'ARRes'. [INSERT TABLE 3 HERE]
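The weak white noise designs above can be generated along the following lines. This is a sketch under stated assumptions: the GARCH and bilinear coefficients below are illustrative placeholders (this copy does not preserve the paper's exact values), while the all-pass recursion uses the AR root 0.5 pinned down by the text.

```python
import numpy as np

def garch11(n, omega=0.001, alpha=0.05, beta=0.90, rng=None):
    """GARCH(1,1): u_t = s_t*zeta_t, s_t^2 = omega + beta*s_{t-1}^2 + alpha*u_{t-1}^2.
    alpha and beta are illustrative placeholders, not the paper's exact values."""
    rng = rng if rng is not None else np.random.default_rng()
    z = rng.standard_normal(n)
    u = np.empty(n)
    s2 = omega / (1.0 - alpha - beta)       # start at the unconditional variance
    for t in range(n):
        u[t] = np.sqrt(s2) * z[t]
        s2 = omega + beta * s2 + alpha * u[t] ** 2
    return u

def bilinear(n, c=0.5, rng=None):
    """Bilinear model u_t = zeta_t + c*zeta_{t-1}*u_{t-2}: uncorrelated but not
    a martingale difference. c is an illustrative placeholder."""
    rng = rng if rng is not None else np.random.default_rng()
    z = rng.standard_normal(n + 2)
    u = np.zeros(n + 2)
    for t in range(2, n + 2):
        u[t] = z[t] + c * z[t - 1] * u[t - 2]
    return u[2:]

def all_pass(n, phi=0.5, rng=None):
    """All-pass ARMA(1,1): u_t - phi*u_{t-1} = zeta_t - zeta_{t-1}/phi, with
    Student-t(9) innovations; the MA root is the inverse of the AR root."""
    rng = rng if rng is not None else np.random.default_rng()
    z = rng.standard_t(9, size=n + 1)
    u = np.zeros(n + 1)
    for t in range(1, n + 1):
        u[t] = phi * u[t - 1] + z[t] - z[t - 1] / phi
    return u[1:]

rng = np.random.default_rng(3)
u = bilinear(20000, rng=rng)
uc = u - u.mean()
rho1 = np.dot(uc[:-1], uc[1:]) / np.dot(uc, uc)   # lag-1 autocorrelation, near 0
```

The bilinear draw illustrates the point of the experiment: its sample autocorrelations are close to zero even though the process is strongly dependent through its squares.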
The performance of the $BP$ and $Parz$ tests is very good, with levels that are not oversized in general. However, the $BP$ and $Parz$ tests can be undersized, see the case of 'ARCH(1)'. But even in this case the value of $\%\{\hat{p}^* \ne 1\}$ remains very small, suggesting that the size distortion is due to the critical values of Lee (2007). (This is confirmed by an unreported simulation experiment which shows that using standard chi-squared critical values gives good results.) The behavior of the $EL$ test is more erratic, with levels that can be either oversized, as in the case of 'GARCH(1,1)', 'All Pass' and 'ARRes', or undersized. The $IMSE$ test can also be severely oversized. The $CvM$ test behaves well for 'GARCH(1,1)' and 'ARCH(1)' but, as expected, is severely size distorted in the other cases.

We now consider $H_1$. In what follows, the critical values of the $EL$ and $IMSE$ tests are adjusted to achieve the desired level under normality. A first set of fixed alternatives is considered:
'MA1' $u_t = \varepsilon_t + 0.\,\varepsilon_{t-1}$, 'AR1' $u_t = 0.\,u_{t-1} + \varepsilon_t$, 'MA6' $u_t = \varepsilon_t + 0.\,\varepsilon_{t-6}$ and 'AR6' $u_t = 0.\,u_{t-6} + \varepsilon_t$, with i.i.d. standard normal innovations $\varepsilon_t$; sample sizes $n = 200$ and $n = 1{,}000$ are considered. The $CvM$ test is expected to perform better for these alternatives, especially 'AR1' and 'MA1'. In Table 4, $\bar{\hat{p}}^*$ and $s_{\hat{p}^*}$ are the simulation mean and standard deviation of $\hat{p}^*$. These statistics are useful for assessing the impact of $p_n$ on the power, since a large $\bar{\hat{p}}^*$ or $s_{\hat{p}^*}$ suggests that decreasing $p_n$ can decrease the power. [INSERT TABLE 4 HERE]

The low-lag 'AR1' and 'MA1' experiments have very similar characteristics, with powers of the tests for $\alpha = 10\%$ increasing from 17%-18% for $n = 200$ to 43%-47% for $n = 1{,}000$, and with small values of $\bar{\hat{p}}^*$ and $s_{\hat{p}^*}$. The $BP$, $Parz$ and $EL$ tests seem to be outperformed by the $IMSE$ and $CvM$ tests. For the higher-order experiments 'MA6' and 'AR6' and $n = 1{,}000$, the $BP$, $Parz$ and $EL$ tests clearly outperform their competitors, with power close or equal to 100%. For $n = 200$, the $EL$ test outperforms its competitors, with $BP$ as a second-best. The high values of $\bar{\hat{p}}^*$ and $s_{\hat{p}^*}$ for the $BP$ and $Parz$ tests illustrate the fact that $\hat{p}^*$ is suitable for testing but not as an estimator of the order of an AR or MA process.

The second experiment under $H_1$ examines, for $n = 200$, the power of the 5% level $BP$ and $Parz$ tests against $H_\rho$: $u_t = v_t - \rho v_{t-1}$, $\rho \in [0, 1/n^{1/2}]$, where the $v_t$ are generated as in the white noise experiments above. For instance, under 'GARCH(1,1)', $v_t = s_t\zeta_t$ and $s_t^2 = 0.001 + 0.\,s_{t-1}^2 + 0.\,v_{t-1}^2$, where the $\zeta_t$ are i.i.d. standard normal innovations, while, under 'ARRes', the $v_t$ are i.i.d. $N(0,1)$ and $u_t = v_t - \rho v_{t-1}$ is estimated from the AR(1) model $X_t = 0.\,X_{t-1} + u_t$. We do not consider the other tests, to avoid undesirable size correction effects, but we compare $BP$ and $Parz$ with the $\widetilde{MEP}_n$ test of Lee (2007), which rejects the null when $n\hat{R}_1^2 \ge \hat{z}(\alpha)$, where $\hat{z}(\alpha)$ is defined in (2.7), and with an $\alpha$-level test which rejects the null when $n\hat{R}_1^2 \ge c(\alpha)$, where the infeasible $c(\alpha)$, dependent on the white noise process under consideration, is computed from 10,000 preliminary replications. Since the latter is locally optimal under Gaussianity, it is labelled $LOT$. Figure 1 reports the nine power graphs corresponding to each white noise experiment. [INSERT FIGURE 1 HERE]
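A minimal sketch of the data-driven mechanism behind the $BP$ test compared in Figure 1 follows. It is a simplified sketch: plain Box-Pierce sums replace $\hat{S}^*_p$, and the chi-square moments $p - 1$ and $(2(p-1))^{1/2}$ stand in for the paper's exact standardization $E_\Delta(p)$ and $V_\Delta(p)$, with the penalty of (3.1).

```python
import numpy as np

def box_pierce(u, p):
    """n times the sum of the first p squared sample autocorrelations."""
    n = len(u)
    u = u - u.mean()
    r0 = np.dot(u, u) / n
    rho = np.array([np.dot(u[:n - j], u[j:]) / n / r0 for j in range(1, p + 1)])
    return n * np.sum(rho ** 2)

def select_order(u, p_max, eps=0.1):
    """Return p_hat: 1 unless some standardized increment S_p - S_1 beats the
    penalty gamma_n = (1 + eps)(2 ln ln n)^{1/2}, in the spirit of (3.1)-(3.2)."""
    n = len(u)
    gamma_n = (1.0 + eps) * np.sqrt(2.0 * np.log(np.log(n)))
    s = [box_pierce(u, p) for p in range(1, p_max + 1)]
    best_p, best = 1, -np.inf
    for p in range(2, p_max + 1):
        stat = (s[p - 1] - s[0] - (p - 1)) / np.sqrt(2.0 * (p - 1))
        if stat > best:
            best_p, best = p, stat
    return best_p if best >= gamma_n else 1

rng = np.random.default_rng(0)
eps_t = rng.standard_normal(1006)
ma6 = eps_t[6:] + 0.5 * eps_t[:-6]            # correlation at lag 6 only
p_hat_ma6 = select_order(ma6, p_max=30)
p_hat_wn = select_order(rng.standard_normal(1000), p_max=30)
```

On the MA(6) series the standardized increment at $p = 6$ is large, so $\hat{p} > 1$ and the statistic aggregates up to the relevant lag, while on white noise the penalty keeps $\hat{p} = 1$ with high probability.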
Except for white noise processes such as 'NoMDS', for which the new tests are undersized, the powers of the four tests are quite similar in the vicinity of $\rho = 0$, suggesting that our data-driven tests are, for processes close to Gaussianity, not far from being locally optimal, as is $LOT$. The global performance of all tests deteriorates for nonlinear white noise processes such as 'ARCH(1)', for which $LOT$ has a very low power compared to its competitors $BP$, $Parz$ and $\widetilde{MEP}_n$. $Parz$ dominates its competitors for such white noise processes. As expected from (3.8), $Parz$ and $BP$ perform as well as or better than $\widetilde{MEP}_n$, which is less powerful than $Parz$ for heteroskedastic noises such as 'Bilinear', 'ARCH(1)', 'GARCH(1,1)' or 'NoMDS'.

The third experiment under $H_1$ considers a second set of alternatives given by randomized "small correlation" processes defined as in (4.1),
$$u_t = \varepsilon_t + \frac{(2.\,\times\gamma_n)^{1/2}}{n^{1/2} P^{1/4}} \sum_{k=1}^{P} \psi_{k,b}\, \varepsilon_{t-k}, \qquad \psi_{k,b} \ \text{i.i.d.} \sim N(0,1). \qquad (5.1)$$
In this setting $b$ is the simulation index, and new coefficients $\{\psi_{k,b}\}$ are drawn for each simulation draw. Randomizing the moving average coefficients allows us to explore various shapes of the correlation function. The noise $\{\varepsilon_t\}$ is independent of the moving average coefficients $\{\psi_{k,b}\}$ and is drawn randomly from the standard normal distribution. Since $\sum_{k=1}^{P} \psi_{k,b}^2 = P(1 + o_P(1))$ when $P$ tends to infinity, the covariances of (5.1) can be $o(n^{-1/2})$, as shown in Lemma 1. We consider two scenarios. In the experiment 'LOW', $P$ is set to 15 for $n = 200$ and to 75 when $n = 1{,}000$. The experiment 'HIGH' doubles $P$, so $P = 30$ for $n = 200$ and $P = 150$ for $n = 1{,}000$. [INSERT TABLE 5 HERE]

The $BP$ test outperforms its competitors and $Parz$ comes as a second-best. The $EL$ test achieves power similar to that of the $BP$ test only in the 'LOW' experiment when $P = 15$ and $n = 200$. The power of the $IMSE$ and $CvM$ tests decreases with the sample size while the power of the other tests increases, showing the importance of a proper data-driven choice of the order. The high values of $\bar{\hat{p}}^*_{Parz}$ may suggest that the $Parz$ test would be negatively affected by choosing a lower value of $p_n$. However, setting a much smaller $p_n$ instead of $p_n = n - 1$ leaves the results close to those of the $BP$ test.

6. Concluding remarks
The paper proposes an automatic test for the weak white noise null hypothesis for observed variables or residuals from a parametric model. The test is based on a new data-driven order selection procedure applied to the Box-Pierce (1970) test statistic. The critical region uses the robust critical values of Lee (2007), which can account for the estimation of residuals. An important theoretical finding is that the new test can detect alternatives with small autocorrelation coefficients of order $\rho_n = o(n^{-1/2})$, where $n$ is the sample size, provided that the number of autocorrelation coefficients at moderate lags is large enough. The proposed test is shown to be adaptive rate-optimal against this class of alternatives. The paper gives examples of moving average alternatives with small autocorrelation coefficients of order $o(n^{-1/2})$ which are detected by the new test but not by tests previously proposed by Deo (2000), Escanciano and Lobato (2009) or Xiao and Wu (2011). These alternatives correspond to a plausible macroeconomic scenario where a temporary shock has no significant impact whereas permanent shocks may cause significant changes. They can also be of interest in finance, where arbitrage should rule out strong deviations from the martingale difference hypothesis, since these alternatives generate a conditional expectation given the past of order $o_P(n^{-1/2})$. A simulation experiment has shown that the new test can cope with various types of weak white noise processes, including the ARCH and GARCH processes popular in empirical finance. The simulation experiment has also confirmed the good power properties of the test regarding detection of standard AR(1) and MA(1) alternatives when the noise is highly nonlinear, for instance in the case of the ARCH(1) process considered in the experiment.
ReferencesAnderson, T.W. (1993). Goodness of Fit Tests for Spectral Distributions.
The Annalsof Statistics , 830–847. Anderson, T.W. and
D.A. Darling (1952). Asymptotic Theory of Certain “Goodnessof Fit” Criteria Based on Stochastic Processes.
Annals of Mathematical Statistics , 193–212. Box, G. and
D. Pierce (1970). Distribution of Residual Autocorrelations in Autoregressive-Integrated Moving Average Time Series Models.
Journal of American Statistical Association , 1509–1526. Campbell, J.Y., A.W. Lo and
A.C. Craig MacKinlay (1997).
The Econometricsof Financial Markets . Second Edition, Princeton University Press.
Chen, S.X. and
J. Gao (2007). An adaptive Empirical Likelihood Test for ParametricTime Series Regression Models.
Journal of Econometrics , 950–972. Delgado, M.A. and
C. Velasco (2012). An Asymptotically Pivotal Transform ofthe Residuals Sample Autocorrelations with Application to Model Checking.
Journal ofAmerican Statistical Association , 646—958.
Delgado, M.A., J. Hidalgo and
C. Velasco (2005). Distribution Free Goodness-of-Fit Tests for Linear Processes.
The Annals of Statistics , 2568-2609. Deo, R.S. (2000). Spectral Tests of the Martingale Hypothesis under Conditional Het-eroscedasticity.
Journal of Econometrics , 291-315. Donoho, D. and
J. Jin (2004). Higher Criticism for Detecting Sparse HeterogeneousMixtures.
The Annals of Statistics , 962–994. Durlauf, S.N. (1991). Spectral Based Testing of the Martingale Hypothesis.
Journal ofEconometrics , 355-376. Escanciano, J.C. and
I.N. Lobato (2009). An Automatic Portmanteau Test for SerialCorrelation.
Journal of Econometrics , 140–149.
Fan, J. (1996). Test of Significance Based on Wavelet Thresholding and Neyman’s Trun-cation.
Journal of the American Statistical Association , 674–688. Fan, J. and
Q. Yao (2005).
Nonlinear Time Series: Nonparametric and ParametricMethods . Springer.
Francq, C. , R. Roy and
J.M. Zakoian (2005). Diagnostic Checking in ARMA ModelsWith Uncorrelated Errors.
Journal of the American Statistical Association , 532–544.
Golubev, G.K. , M. Nussbaum and
H.H. Zhou (2010). Asymptotic Equivalence ofSpectrum Density Estimation and Gaussian White Noise.
The Annals of Statistics ,181–214. Grenander, U. and
M. Rosenblatt (1952). On Spectral Analysis of Stationary Time-series.
Proceedings of the National Academy of Sciences U.S.A.
Guay, A. and
E. Guerre (2006). A Data-Driven Nonparametric Specification Test forDynamic Regression Models.
Econometric Theory , 543–586. Guerre, E. and
P. Lavergne (2002). Optimal Minimax Rates for Nonparametric Spec-ification Testing in Regression Models.
Econometric Theory , 1139–1171. Guerre, E. and
P. Lavergne (2005). Rate-Optimal Data-Driven Specification Testingfor Regression Models.
The Annals of Statistics , 840–870. Hong, Y . (1996). Consistent Testing for Serial Correlation of Unknown Form.
Economet-rica , 837–864. Hong, Y . and
Y.J. Lee . (2005). Generalized Spectral Tests for Conditional Mean Modelsin Time Series with Conditional Heteroscedasticity of Unknown Form.
Review of EconomicStudies , 499–541. Horowitz, J.L. and
V.G. Spokoiny (2001). An Adaptive, Rate-Optimal Test of aParametric Mean-Regression Model Against a Nonparametric Alternative.
Econometrica , 599–631. Kuan, C.M. and
W.M. Lee (2006). Robust M Tests without Consistent Estimation ofthe Asymptotic Covariance Matrix.
Journal of the American Statistical Association ,1264–1275.
Lee, W.M. (2007). Robust M Tests Using Kernel-based Estimators with Bandwidth Equalto Sample Size.
Economics Letters , 295–300. Lobato, I.N. (2001). Testing That a Dependent Process Is Uncorrelated.
Journal of theAmerican Statistical Association , 1066–1076. Lobato, I.N. , J.C. Nankervis and
N.E. Savin (2002). Testing for Zero Autocorrelationin the Presence of Statistical Dependence.
Econometric Theory , 730–743. Newey, W.K. and
K. West (1994). Automatic Lag Selection in Covariance MatrixEstimation.
Review of Economic Studies , 631–653. Phillips, P.C.B, Y. Sun & S. Jin (2006). Spectral Density Estimation and RobustHypothesis Testing Using Steep Origin Kernels Without Truncation.
International EconomicReview , 837–894. Pollard, D. (2002).
A User’s Guide to Measure Theoretic Probability . Cambridge Uni-versity Press.
Shao, X. (2011a). A Bootstrap-assisted Spectral Test of White Noise under UnknownDependence.
Journal of Econometrics , 213–224.
Shao, X. (2011b). Testing for White Noise under Unknown Dependence and its Applica-tions to Goodness-of-Fit for Time Series Models.
Econometric Theory , 312–343. Spokoiny, V.G. (1996). Adaptive Hypothesis Testing Using Wavelets.
The Annals ofStatistics , 2477–2498. Xiao, H. and
W.B. Wu (2011). Asymptotic Inference of Autocovariances of StationaryProcesses. University of Chicago, arXiv:11053423v1.
Wu, W.B. (2005). Nonlinear System Theory: Another Look at Dependence.
Proceedingsof the National Academy of Sciences of the United States of America , 14150–14154.
Wu, W.B. (2007). Strong Invariance Principles for Dependent Random Variables.
TheAnnals of Probability , 2294–2320. F i g u r e . E m p i r i c a l r e j e c t i o np r o b a b ili t i e s o f L O T ( b l a c k ‘ + ’li n e ) , P a r z (r e d s o li d li n e ) , B P (r e d d o tt e d li n e ) a nd L ee ( ) (cid:102) M E P t e s t( b l u e ‘ x ’li n e ) . T h e l e v e l o f t h e s e t e s t s i s % . T h e a l t e r n a t i v e i s a n M A ( ) w i t h a m o v i n ga v e r ag ec o e ffi c i e n tr a n g i n g f r o m t o1 / ndd i s t u r b a n ce s a s i n T a b l e s - . T h e s a m p l e s i ze i s n = nd t h e nu m b e r o f r e p li c a t i o n s i s , . (cid:13) . . . . . . n , , , , , . (cid:11) B P = % . (cid:3) . (cid:3) . (cid:3) . (cid:3) . (cid:3) . . . . . . . (cid:11) B P = % . (cid:3) . (cid:3) . (cid:3) . (cid:3) . (cid:3) . (cid:3) . (cid:3) . . . . . (cid:11) B P = % . (cid:3) . (cid:3) . (cid:3) . (cid:3) . (cid:3) . . (cid:3) . . (cid:3) . . . % f b p (cid:3) B P = g . . . . . . . . . . . . (cid:11) P a r z = % . . . . . . . . . . . . (cid:11) P a r z = % . (cid:3) . . . . . . . . . . . (cid:11) P a r z = % . (cid:3) . (cid:3) . (cid:3) . (cid:3) . (cid:3) . . (cid:3) . . . . . % f b p (cid:3) P a r z = g . . . . . . . . . . . . T a b l e . P e n a l t y s e q u e n c e i m p a c t o n l e v e l s ( (cid:13) n = (cid:13) ( l n l n ( n (cid:0) )) = , , r e p li c a t i o n s ) . A (cid:147) * (cid:148) i nd i c a t e s a n o v e r s i ze d t e s t a tt h e % l e v e l. T e s t s B PP a r z E L I M S E C v M f u t g n , , , , , N o r (cid:11) = % . . . . . (cid:3) . . (cid:3) . (cid:3) . . (cid:11) = % . . . . . (cid:3) . (cid:3) . (cid:3) . (cid:3) . . (cid:11) = % . . . . . (cid:3) . (cid:3) . (cid:3) . (cid:3) . . % f b p (cid:3) = g . . . . . . S t ud (cid:11) = % . . . . . (cid:3) . . (cid:3) . (cid:3) . . (cid:11) = % . . . . . (cid:3) . . (cid:3) . (cid:3) . . (cid:11) = % . . . . . (cid:3) . . (cid:3) . (cid:3) . . % f b p (cid:3) = g . . . . . . C h i (cid:11) = % . (cid:3) . (cid:3) . . . (cid:3) . (cid:3) . 
[Table: i.i.d. distributions (Normal, Student and Chi-squared designs); a '*' indicates an oversized test.]
[Table: Weak white noise and estimated residuals (GARCH(1,1), ARCH, Bilinear, non-MDS, All-Pass and AR-residual designs); a '*' indicates an oversized test.]
[Table: AR-MA alternatives (MA and AR designs); 'esc' indicates empirical size-corrected power.]
[Table: Small correlations alternatives (LOW and HIGH designs); 'esc' indicates empirical size-corrected power.]
Robust Adaptive Rate-Optimal Testing for the White Noise Hypothesis: Supplementary Material

Alain Guay (CIRPÉE and CIREQ, Université du Québec à Montréal, e-mail: [email protected])
Emmanuel Guerre (School of Economics and Finance, Queen Mary, University of London, e-mail: [email protected])
Štěpána Lazarová (School of Economics and Finance, Queen Mary, University of London, e-mail: [email protected])

This version: 2nd November 2012
Supplementary Material A: proofs of main results
This section contains the proofs of the results of Section 3. In what follows, a tilde, as in
\[
\widetilde S_p = n \sum_{j=1}^{p} K^2\!\left(\frac{j}{p}\right) \widetilde R_j^2 \quad\text{where}\quad \widetilde R_j = \frac{1}{n} \sum_{t=1}^{n-|j|} u_t u_{t+|j|}, \tag{A.1}
\]
indicates that the variables $u_t$ are observed. This also leads to define
\[
\widetilde \tau_j = \frac{1}{n} \sum_{t=1}^{n-|j|} u_t^2 u_{t+|j|}^2, \qquad \widetilde z_L(\alpha) = \widehat z_L(\alpha), \qquad \widetilde z^*_L(\alpha) = \widehat z^*_L(\alpha),
\]
but we keep the notation $\widehat p$. $C$ and $C'$ are constants that may vary from line to line but only depend on the constants of the assumptions. The notation $[\,\cdot\,]$ is used for the integer part of a real number, and $a \vee b = \max(a,b)$, $a \wedge b = \min(a,b)$. Let $u_t^{t-j} = u_{t,n}^{t-j}$ be a copy of $u_t = F_n(\ldots, e_{t-1}, e_t)$ obtained by changing $e_{t-j}, e_{t-j-1}, \ldots$ into $e'_{t-j}, e'_{t-j-1}, \ldots$. Then the condition $\| u_t - u_t^{t-j} \|_a \le \delta_a(j)$ ensures that $\| u_t - u_t^{t-j} \|_a \le \Theta_a(j)$, where
\[
\Theta_a(j) = \sum_{i=j}^{\infty} \delta_a(i). \tag{A.2}
\]
We first state some intermediary results that are used in the proofs of our main results. These intermediary results are proven in Supplementary Material B. Lemma A.2 gives the order of the standardization terms $E(p)$, $E_\Delta(p)$ and $V_\Delta(p)$. Propositions A.1 and A.2 deal with the impact of the estimation of $\theta$. Proposition A.3 is used to study the asymptotic null behavior of the test and to show that $P(\widehat p = 1) \to 1$.

Lemma A.2.
Suppose Assumption K holds and that $p_n/n \le 1/2$. (i) There exists a constant $C > 0$ such that, for $q = 1, 2$ and for any $1 \le p \le p_n$,
\[
\frac{p}{C} \le \sum_{j=1}^{n-1} \left(1 - \frac{j}{n}\right)^q K^q\!\left(\frac{j}{p}\right) \le Cp, \qquad \frac{p}{C} \le \sum_{j=1}^{n-1} K^q\!\left(\frac{j}{p}\right) \le Cp, \qquad V_\Delta(p) \le C p^{1/2},
\]
and $E_\Delta(p) \le \sum_{j=1}^{n-1} \big( K^2(j/p) - K^2(j) \big) \le Cp$; (ii) Under Assumption P, for all $n$ and all $p \in [1, p_n]$, $V_\Delta(p) \ge (p-1)^{1/2}/C$ and $E_\Delta(p) \ge 0$.

Lemma A.3.
Suppose Assumptions K, M and R hold. Then the rejection regions $\widetilde S_1 \ge \widetilde z_L(\alpha)$, $\widetilde S^*_1 \ge \widetilde z^*_L(\alpha)$, $\widehat S_1 \ge \widehat z_{KL}(\alpha)$ and $\widehat S^*_1 \ge \widehat z^*_{KL}(\alpha)$ are asymptotically of level $\alpha$. Moreover, under $H_0$, $\widetilde z_L(\alpha)$, $\widetilde z^*_L(\alpha)$, $\widehat z_{KL}(\alpha)$ and $\widehat z^*_{KL}(\alpha)$ are all $O_P(1)$.

Lemma A.4.
Under Assumption R, $\sup_{0 \le j \le n-1} \operatorname{Var}\big( \widetilde R_j \big) \le C/n$.

Proposition A.1.
Suppose Assumptions M, P and R hold. Then
\[
\max_{j \in [0, p_n]} \big| \widehat R_j - \widetilde R_j \big| = O_P\big(n^{-1/2}\big), \qquad \max_{p \in [0, n-1]} n \sum_{j=1}^{p} \big( \widehat R_j - \widetilde R_j \big)^2 = O_P(1),
\]
and
\[
\max_{j \in [0, n-1]} \left| \widetilde R_j - \Big(1 - \frac{j}{n}\Big) R_{j,n} \right| = O_P\!\left( \Big(\frac{\log n}{n}\Big)^{1/2} \right), \qquad \max_{j \in [0, p_n]} \big| \widehat R_j - R_{j,n} \big| = O_P\!\left( \Big(\frac{\log n}{n}\Big)^{1/2} \right),
\]
\[
\max_{j \in [0, n-1]} \Big(1 - \frac{j}{n}\Big) \big| \widetilde \tau_j - \tau_{j,n} \big| = O_P\!\left( \Big(\frac{\log n}{n}\Big)^{1/2} \right), \qquad \max_{j \in [0, p_n]} \big| \widehat \tau_j - \tau_{j,n} \big| = O_P\!\left( \Big(\frac{\log n}{n}\Big)^{1/2} \right).
\]

Proposition A.2.
Let Assumptions K, M, P and R hold. Let $\widetilde S_p$ be as in (A.1). Then
\[
\max_{p \in [2, p_n]} \frac{ \big| ( \widehat S_p - \widehat S_1 ) - ( \widetilde S_p - \widetilde S_1 ) \big| }{ \big( 1 + n \sum_{j=1}^{p} R_{j,n}^2 \big)^{1/2} } = O_P(1)
\]
and, for any $p_n = O(n^{1/2})$, $\widehat S_{p_n} - \widetilde S_{p_n} = O_P\Big( \big( n \sum_{j=1}^{p_n} R_{j,n}^2 \big)^{1/2} \Big)$.

Proposition A.3.
Suppose Assumptions K, M, P and R hold and that $H_0$ is true. Then (3.1) ensures that
\[
\lim_{n \to \infty} P\left( \max_{p \in [2, p_n]} \frac{ ( \widehat S_p - \widehat S_1 ) / \widehat R_0^2 - E_\Delta(p) }{ V_\Delta(p) } \ge \gamma_n \right) = 0.
\]

Proposition A.4.
Under Assumptions K, P and R, there are some $C, C' > 0$ such that for $n$ large enough and uniformly in $p \in [1, p_n]$,
\[
E\big[ \widetilde S_p \big] - R_{0,n}^2 E(p) \ge C n \sum_{j=1}^{p/2} R_{j,n}^2 - C' R_{0,n}^2, \qquad
E\left[ n \sum_{j=1}^{n-1} K^2\!\Big(\frac{j}{p}\Big) \frac{ \widetilde R_j^2 }{ \tau_{j,n} } \right] - E(p) \ge C n \sum_{j=1}^{p/2} \left( \frac{ R_{j,n} }{ R_{0,n} } \right)^2 - C'.
\]

Proposition A.5.
Under Assumptions K, P and R, there is a constant $C > 0$ such that for $n$ large enough and uniformly in $p \in [1, p_n]$,
\[
\operatorname{Var}\big( \widetilde S_p \big) \le C \left( n \sum_{j=1}^{p} R_{j,n}^2 + p \right), \qquad
\operatorname{Var}\left( n \sum_{j=1}^{n-1} K^2\!\Big(\frac{j}{p}\Big) \frac{ \widetilde R_j^2 }{ \tau_{j,n} } \right) \le C \left( n \sum_{j=1}^{p} \frac{ R_{j,n}^2 }{ R_{0,n}^2 } + p \right).
\]

A.1.
Proof of Theorem 1. (3.2), (3.1) and Proposition A.3 give that $\lim_{n\to\infty} P( \widehat p \ne 1 ) = 0$. Hence $\widehat S_{\widehat p} = \widehat S_1 + o_P(1)$, and Lemma A.3, which ensures that the retained critical value satisfies $P\big( \widehat S_1 \ge \widehat z(\alpha) \big) \to \alpha$, yields that the test (2.7) is asymptotically of level $\alpha$. □

A.2.
Proof of Theorem 2.
The definition (2.2) of $\widehat p$ gives, for any $p \in [1, p_n]$,
\[
\widehat S_{\widehat p} = \max_{p \in [1,p_n]} \Big\{ \widehat S_p - \widehat R_0^2 E(p) - \gamma_n \widehat R_0^2 V_\Delta(p) \Big\} + \widehat R_0^2 E(\widehat p) + \gamma_n \widehat R_0^2 V_\Delta(\widehat p) \ge \widehat S_p - \widehat R_0^2 E(p) - \gamma_n \widehat R_0^2 V_\Delta(p).
\]
Note that this bound implies (3.6). Since the critical value $\widehat z(\alpha)$ in (2.7) is bounded under $H_0$ by Lemma A.3, it is sufficient to find a $\bar p_n \in [1, p_n]$ such that $\widehat S_{\bar p_n} - \widehat R_0^2 E(\bar p_n) - \gamma_n \widehat R_0^2 V_\Delta(\bar p_n) \xrightarrow{P} +\infty$. Let $\bar p_n = 2 P_n$, where $P_n$ is as in (3.4), and set
\[
\mathcal R_n = \sum_{j=1}^{P_n} \left( \frac{R_{j,n}}{R_{0,n}} \right)^2.
\]
The detection condition (3.4) gives
\[
n \mathcal R_n \ge n \rho_n^2 \sum_{j=1}^{P_n} I\left\{ \left( \frac{R_{j,n}}{R_{0,n}} \right)^2 \ge \rho_n^2 \right\} = n N_n \rho_n^2 \ge \kappa^{*2} \gamma_n \bar p_n^{1/2} \to \infty, \tag{A.3}
\]
with a constant $\kappa^*$ which can be chosen as large as needed. Lemmas A.2 and A.4, Assumption P, which ensures $P_n = o(n^{1/2})$ and $\gamma_n = o(n^{1/2})$, and Proposition A.1 for the case of residuals yield that
\[
\widehat S_{\bar p_n} - \widehat R_0^2 E(\bar p_n) - \gamma_n \widehat R_0^2 V_\Delta(\bar p_n)
= \widetilde S_{\bar p_n} + O_P\big( n^{1/2} R_{0,n}^2 \mathcal R_n^{1/2} \big) - R_{0,n}^2 E(\bar p_n) - \gamma_n R_{0,n}^2 V_\Delta(\bar p_n) + O_P\!\left( \frac{\bar p_n + \gamma_n \bar p_n^{1/2}}{n^{1/2}} \right)
\]
\[
\ge \widetilde S_{\bar p_n} + O_P\big( n^{1/2} R_{0,n}^2 \mathcal R_n^{1/2} \big) - R_{0,n}^2 E(\bar p_n) - C \gamma_n R_{0,n}^2 \bar p_n^{1/2}.
\]
Now the Chebyshev inequality and Propositions A.4 and A.5 give
\[
\widetilde S_{\bar p_n} = E\big[ \widetilde S_{\bar p_n} \big] + O_P\Big( \operatorname{Var}^{1/2}\big( \widetilde S_{\bar p_n} \big) \Big) \ge R_{0,n}^2 E(\bar p_n) + C' R_{0,n}^2\, n \mathcal R_n + O_P\big( \bar p_n^{1/2} + n^{1/2} \mathcal R_n^{1/2} \big).
\]
Hence substituting gives, since $n \mathcal R_n \to \infty$ by (A.3),
\[
\widehat S_{\bar p_n} - \widehat R_0^2 E(\bar p_n) - \gamma_n \widehat R_0^2 V_\Delta(\bar p_n) \ge C' R_{0,n}^2\, n \mathcal R_n\, (1 + o_P(1)) - C \gamma_n R_{0,n}^2\, \bar p_n^{1/2}\, (1 + o_P(1)).
\]
Since Assumption R ensures that $R_{0,n}$ stays bounded away from 0, (A.3) gives that $\widehat S_{\bar p_n} - \widehat R_0^2 E(\bar p_n) - \gamma_n \widehat R_0^2 V_\Delta(\bar p_n) \xrightarrow{P} +\infty$ as requested, provided $\kappa^{*2} > C/C'$. □

A.3.
Proof of Theorem 3.
Consider first the null hypothesis. As seen from the proof of Theorem 1, it suffices to show that
\[
\lim_{n\to\infty} P\left( \max_{p\in[2,p_n]} \frac{(\widehat S^*_p - \widehat S^*_1) - E_\Delta(p)}{V_\Delta(p)} \ge \gamma_n \right) = 0,
\]
a statement which implies that $\widehat p^* = 1 + o_P(1)$, so that Lemma A.3 implies that the conclusion of Theorem 1 holds for the test based upon $\widehat S^*_{\widehat p^*}$. Since $|R_{j,n}| \le \|u_{t,n}\|_2\, \|u_{t,n} - u_{t,n}^{t-j}\|_2$ and
\[
E\big[u_{t-j,n}^2 u_{t,n}^2\big] = E\big[(u_{t,n}^{t-j})^2 u_{t-j,n}^2\big] + E\big[\big(u_{t,n}^2 - (u_{t,n}^{t-j})^2\big) u_{t-j,n}^2\big] = R_{0,n}^2 + E\big[\big(u_{t,n} - u_{t,n}^{t-j}\big)\big(u_{t,n} + u_{t,n}^{t-j}\big) u_{t-j,n}^2\big],
\]
(A.2) shows that
\[
\big|\tau_{j,n} - R_{0,n}^2\big| \le C \|u_{t,n}\|_6^3\, \Theta_6(j) \le C j^{-3/2} \tag{A.4}
\]
for all $j \ge 1$. Now Lemmas A.2 and A.4, Assumptions K, P and R, and Proposition A.1 give
\[
\max_{p\in[2,p_n]} \frac{\big|(\widehat S^*_p - \widehat S^*_1) - (\widehat S_p - \widehat S_1)/\widehat R_0^2\big|}{V_\Delta(p)}
\le C \max_{p\in[1,p_n]} \frac{\big|\widehat S^*_p - \widehat S_p/\widehat R_0^2\big|}{p^{1/2}}
\le C \max_{p\in[1,p_n]} \frac{n}{p^{1/2}} \sum_{j=1}^{p} \left(\frac{\widehat R_j}{\widehat R_0}\right)^2 \left\{ \left|\frac{\widehat\tau_j}{\widehat R_0^2} - \frac{\tau_{j,n}}{R_{0,n}^2}\right| + \left|\frac{\tau_{j,n}}{R_{0,n}^2} - 1\right| \right\}
\]
\[
\le C n p_n^{1/2}\, O_P\!\left(\left(\frac{\log n}{n}\right)^{3/2}\right) + O_P(1)\, n \sum_{j=1}^{p_n} \frac{\widehat R_j^2}{j^{3/2}}
= o_P(1) + O_P\!\left( \sum_{j=1}^{p_n} \frac{\operatorname{Var}\big(n^{1/2}\widehat R_j\big)}{j^{3/2}} \right) = O_P(1).
\]
Hence (3.1) and Proposition A.3 give
\[
P\left( \max_{p\in[2,p_n]} \frac{(\widehat S^*_p - \widehat S^*_1) - E_\Delta(p)}{V_\Delta(p)} \ge \gamma_n \right)
= P\left( \max_{p\in[2,p_n]} \frac{(\widehat S_p - \widehat S_1)/\widehat R_0^2 - E_\Delta(p)}{V_\Delta(p)} + O_P(1) \ge \gamma_n \right)
\le P\left( \max_{p\in[2,p_n]} \frac{(\widehat S_p - \widehat S_1)/\widehat R_0^2 - E_\Delta(p)}{V_\Delta(p)} \ge \Big(1+\frac{\epsilon}{2}\Big) (2\ln\ln n)^{1/2} \right) + o(1) = o(1),
\]
which gives the desired result under $H_0$. Consider now Theorem 2 and $H_1$. Define
\[
\widehat S^\sharp_p = n \sum_{j=1}^{p} K^2\!\left(\frac{j}{p}\right) \frac{\widehat R_j^2}{\tau_{j,n}}, \qquad \widetilde S^\sharp_p = n \sum_{j=1}^{p} K^2\!\left(\frac{j}{p}\right) \frac{\widetilde R_j^2}{\tau_{j,n}}.
\]
Let $P_n$ be as in (3.4) and define $\bar p_n = 2P_n$ and $\mathcal R_n$ as in the proof of Theorem 2.
Then Assumptions K and R and Propositions A.1 and A.2 give
\[
\big| \widehat S^*_{\bar p_n} - \widehat S^\sharp_{\bar p_n} \big| \le C n \sum_{j=1}^{\bar p_n} \frac{\widehat R_j^2}{\tau_{j,n}} \left| \frac{\tau_{j,n}}{\widehat \tau_j} - 1 \right| = O_P\!\left( \left(\frac{\log n}{n}\right)^{1/2} \right) \widehat S^\sharp_{\bar p_n}, \qquad
\big| \widehat S^\sharp_{\bar p_n} - \widetilde S^\sharp_{\bar p_n} \big| \le C \big| \widehat S_{\bar p_n} - \widetilde S_{\bar p_n} \big| = O_P\big( n^{1/2} \mathcal R_n^{1/2} \big).
\]
Hence, for observed variables or residuals,
\[
\widehat S^*_{\bar p_n} = \left( 1 + O_P\!\left( \left(\frac{\log n}{n}\right)^{1/2} \right) \right) \widetilde S^\sharp_{\bar p_n} + O_P\big( n^{1/2} \mathcal R_n^{1/2} \big).
\]
The proof now follows the steps of the one of Theorem 2, based on the order above, Propositions A.4 and A.5, and Lemma A.4, which gives $E\big[ \widetilde S^\sharp_{\bar p_n} \big] \le C ( \bar p_n + n \mathcal R_n )$. Hence, since $\bar p_n = o\big( (n/\log n)^{1/2} \big)$,
\[
\widehat S^*_{\widehat p^*} = \max_{p\in[1,p_n]} \Big\{ \widehat S^*_p - E(p) - \gamma_n V_\Delta(p) \Big\} + E(\widehat p^*) + \gamma_n V_\Delta(\widehat p^*) \ge \widehat S^*_{\bar p_n} - E(\bar p_n) - C \gamma_n \bar p_n^{1/2}
\]
\[
= \left( 1 + O_P\!\left( \left(\frac{\log n}{n}\right)^{1/2} \right) \right) \Big( E\big[ \widetilde S^\sharp_{\bar p_n} \big] + O_P\Big( \operatorname{Var}^{1/2}\big( \widetilde S^\sharp_{\bar p_n} \big) \Big) \Big) - E(\bar p_n) - C \gamma_n \bar p_n^{1/2}
\]
\[
= C' n \mathcal R_n - C \gamma_n \bar p_n^{1/2} + O_P\!\left( \bar p_n^{1/2} + n^{1/2} \mathcal R_n^{1/2} + \Big(\frac{\log n}{n}\Big)^{1/2} \big( \bar p_n + n \mathcal R_n \big) \right) = C' n \mathcal R_n (1 + o_P(1)) - C \gamma_n \bar p_n^{1/2} (1 + o_P(1)) \xrightarrow{P} +\infty
\]
provided $\kappa^*$ is large enough. □

A.4.
Proof of Theorem 4.
Since $P(\widehat p^* = 1) \to 1$ under $H_0$, condition (A0) and (3.7) give
\[
\lim_{n\to\infty} P\big( \widehat S^*_{\widehat p^*} \ge \widehat c^*_n(\alpha) \big) = \lim_{n\to\infty} P\big( \widehat S^*_1 \ge \widehat c^*_n(\alpha) \big) = \lim_{n\to\infty} P\big( \widehat S^*_1 \ge \widehat S^*_1 - \widehat T_n + \widehat t_n(\alpha) \big) = \lim_{n\to\infty} P\big( \widehat T_n \ge \widehat t_n(\alpha) \big) = \alpha,
\]
so that the test of interest is asymptotically of level $\alpha$. Let us now consider the alternative. Arguing as in the proofs of Theorems 2 and 3 under condition (A1) shows that the test with critical value $\widehat c_n(\alpha)$ detects the alternatives (3.4) provided $\kappa^*$ is taken large enough. Consider now (3.8). The definition (2.9) gives, since $E_\Delta(\widehat p^*) + \gamma_n V_\Delta(\widehat p^*) \ge 0$,
\[
\widehat S^*_{\widehat p^*} = \max_{p\in[1,p_n]} \big( \widehat S^*_p - E_\Delta(p) - \gamma_n V_\Delta(p) \big) + E_\Delta(\widehat p^*) + \gamma_n V_\Delta(\widehat p^*) \ge \widehat S^*_1 - E_\Delta(1) - \gamma_n V_\Delta(1) = \widehat S^*_1.
\]
Hence, by (3.7),
\[
P\big( \widehat S^*_{\widehat p^*} \ge \widehat c_n(\alpha) \big) \ge P\big( \widehat S^*_1 \ge \widehat c_n(\alpha) \big) = P\big( \widehat S^*_1 \ge \widehat S^*_1 - \widehat T_n + \widehat t_n(\alpha) \big) = P\big( \widehat T_n \ge \widehat t_n(\alpha) \big),
\]
which is (3.8). □

A.5.
Proof of Theorem 5.
We first introduce a set of alternatives. Let $f(\cdot)$ denote the spectral density of a centered Gaussian stationary process $\{u_t\}$ with covariance coefficients $R_j$. Define a Hölder class of processes as
\[
\text{Hölder}(L) = \left\{ \{u_t\} : \ \frac{1}{3} \le \inf_{\lambda\in[-\pi,\pi]} 2\pi f(\lambda) \le \sup_{\lambda\in[-\pi,\pi]} 2\pi f(\lambda) \le 3, \ \sup_{\lambda\in[-\pi,\pi]} |f'(\lambda)| \le L, \ \sum_{j=0}^{\infty} |R_j| \le L \right\}.
\]
The next Lemma describes a family of alternatives which satisfies Assumption R uniformly for prescribed constants and a given $\delta_a(j)$.

Lemma A.5.
Consider a centered stationary Gaussian process $\{u_t\}$ with spectral density function $f(\lambda) = \exp(g(\lambda))/(2\pi)$, where
\[
g(\lambda) = 2\rho \sum_{k=1}^{p} b_k \cos(k\lambda), \qquad b_k \in \{-1, 0, 1\}. \tag{A.5}
\]
If $p \ge 1$ and $\rho \ge 0$ are such that $2p\rho \le \epsilon \le 1/6$, then there is some constant $L > 0$, independent of $\epsilon$, $p$, $\rho$ and $b = (b_k, k \in [1,p])$, such that (i) $|R_0 - 1| \le 3\rho\epsilon$ and $|R_j - \rho b_j| \le 3\rho\epsilon$ for $j \in [1, p]$; (ii) $|R_j| \le 3\rho (2\epsilon)^\ell$ for all $j$ in $[\ell p + 1, (\ell+1)p)$ and all $\ell \ge 1$; (iii) $\{u_t\}$ is in Hölder$(L)$; (iv) Suppose that $\rho_n = \rho_n(p) = 2\kappa_n \big( (2\log\log n)^{1/2} / (n p^{1/2}) \big)^{1/2}$ for some $\kappa_n > 0$ bounded away from infinity, and that $p \in [1, P_n]$ with $P_n = o\big( \big( n/(\kappa_n^2 (\log\log n)^{1/2}) \big)^{2/3} \big)$. Then the associated family of processes $\{ u_t(b,p);\ b \in \{-1,0,1\}^p,\ p \in [1, P_n] \}$ satisfies Assumption R for any $a > 0$ and a $\delta_a(j) = O\big( j^{-5/2} \big)$.

Proof of Lemma A.5.
Rewrite $g$ as $g(\lambda) = \rho \sum_{k=-p}^{p} b_k \exp(ik\lambda)$, $b_0 = 0$, $b_k = b_{-k} = b_{|k|}$. Since $\exp(x) = \sum_{m=0}^{\infty} x^m/m!$ uniformly over any compact set and $\max_\lambda |g(\lambda)| \le 2p\rho \le \epsilon \le 1/3$, we have
\[
R_j = \int_{-\pi}^{\pi} \exp(-ij\lambda) f(\lambda)\, d\lambda = \frac{1}{2\pi} \sum_{m=0}^{\infty} \frac{1}{m!} \int_{-\pi}^{\pi} \exp(-ij\lambda) (g(\lambda))^m\, d\lambda. \tag{A.6}
\]
For $m > 0$, since $\int_{-\pi}^{\pi} \exp(-ij\lambda)\, d\lambda = 2\pi$ if $j = 0$ and $0$ if $j \ne 0$,
\[
\frac{1}{2\pi} \int_{-\pi}^{\pi} \exp(-ij\lambda) (g(\lambda))^m\, d\lambda = \frac{\rho^m}{2\pi} \sum_{(k_1,\ldots,k_m) \in \mathcal K_m} b_{k_1} \times \cdots \times b_{k_m} \int_{-\pi}^{\pi} \exp\big( i (k_1 + \ldots + k_m - j) \lambda \big)\, d\lambda = \rho^m \sum_{(k_1,\ldots,k_m) \in \mathcal K_m(j)} b_{k_1} \times \cdots \times b_{k_m}, \tag{A.7}
\]
where $\mathcal K_m$ is the set of $m$-tuples with entries in $[-p, p] \setminus \{0\}$, so that $\#\mathcal K_m = (2p)^m$, and $\mathcal K_m(j)$ contains the $m$-tuples of $\mathcal K_m$ for which $k_1 + \cdots + k_m = j$, so that $\#\mathcal K_m(j) \le (2p)^{m-1}$.

Proof of (i).
Part (i) is a consequence of (A.6), (A.7) and the inequality $2p\rho \le \epsilon < 1$: for $j \in [0, p]$,
\[
\big| R_j - I(j=0) - \rho b_j \big| \le \rho \sum_{m=2}^{\infty} \frac{(2p\rho)^{m-1}}{m!} \le 2 p \rho^2 \sum_{m=0}^{\infty} \frac{1}{m!} \le e \rho \epsilon < 3 \rho \epsilon.
\]

Proof of (ii).
Let $\ell p + 1 \le j < (\ell+1)p$. Observe that $\mathcal K_m(j)$ is an empty set when $m \le \ell$. Hence it follows from (A.6) and (A.7) that
\[
|R_j| \le \left| \frac{1}{2\pi} \sum_{m=\ell+1}^{\infty} \frac{1}{m!} \int_{-\pi}^{\pi} \exp(-ij\lambda) (g(\lambda))^m\, d\lambda \right| \le \rho \sum_{m=\ell+1}^{\infty} \frac{(2p\rho)^{m-1}}{m!} \le \rho (2\epsilon)^\ell e \le 3 \rho (2\epsilon)^\ell.
\]

Proof of (iii).
Observe that $|g(\lambda)| \le 2\rho p \le \epsilon \le 1/6$, so that
\[
\frac{1}{3} < \exp(-1/6) \le 2\pi f(\lambda) = \exp(g(\lambda)) \le \exp(1/6) \le e \le 3 \quad \text{for all } \lambda \in [-\pi, \pi].
\]
Parts (i), (ii) and $0 \le \rho \le \epsilon < 1$, $2p\rho \le 1$ give, for $L$ large enough,
\[
\sum_{j=0}^{\infty} |R_j| \le R_0 + \sum_{j=1}^{p} |R_j| + \sum_{\ell=1}^{\infty} \sum_{j=\ell p+1}^{(\ell+1)p} |R_j| \le 1 + 3\rho\epsilon + (1 + 6\epsilon) p\rho + 3 \sum_{\ell=1}^{\infty} (\ell+1) p \rho (2\epsilon)^\ell \le 2 + 3 \sum_{\ell=1}^{\infty} (\ell+1) (2\epsilon)^\ell \le L.
\]
Since $f'(\lambda) = g'(\lambda) f(\lambda)$ with $g'(\lambda) = -2\rho \sum_{k=1}^{p} b_k k \sin(k\lambda)$, we have $\sup_{\lambda\in[-\pi,\pi]} |f'(\lambda)| \le 3 \times 2 p^2 \rho \le L$.

Proof of (iv).
Let $u_t = \varepsilon_t + \sum_{j=1}^{\infty} \psi_j \varepsilon_{t-j}$ be the Wold decomposition of the process. Brillinger (2001) and $\int_{-\pi}^{\pi} \log f(\lambda) \exp(ij\lambda)\, d\lambda / (2\pi) = \rho b_j$ give
\[
\psi_j = \frac{ \int_{-\pi}^{\pi} \exp\big( \rho \sum_{k=1}^{p} b_k \exp(-ik\lambda) \big) \exp(ij\lambda)\, d\lambda }{ \int_{-\pi}^{\pi} \exp\big( \rho \sum_{k=1}^{p} b_k \exp(-ik\lambda) \big)\, d\lambda }, \qquad
\operatorname{Var}(\varepsilon_t) = \left| \frac{1}{2\pi} \int_{-\pi}^{\pi} \exp\left( \rho \sum_{k=1}^{p} b_k \exp(-ik\lambda) \right) d\lambda \right|^2.
\]
Arguing as in (i) and (ii) with an expansion as in (A.6) gives $\operatorname{Var}(\varepsilon_t) = 1$, $|\psi_j - \rho b_j| \le C\rho\epsilon$ for $j \in [1, p]$ and $|\psi_j| \le C\rho(2\epsilon)^\ell$ for all $j \in [\ell p + 1, (\ell+1)p)$ and all $\ell \ge 1$. Gaussianity, the choice of $\rho$ in (iv) with the restriction on $P_n$, and Wu (2005) give, for any $a > 0$, $\delta_a(j) \le C_a |\psi_j| \le C_a j^{-5/2}$. That the other conditions of Assumption R hold uniformly in $p \in [1, P_n]$ follows from (i) and (ii). □

We will now define a family $\mathcal F_n$ of correlated Gaussian alternatives. We first introduce some notation. Consider
\[
\widetilde\gamma_n = (2 \ln\ln n)^{1/2} \quad\text{and}\quad \mathcal P' = \big\{ 2^j,\ j = 1, \ldots, J_n \big\}, \qquad 2^{J_n} = P_n = o\Big( p_n \wedge (n/\widetilde\gamma_n)^{2/3} \Big),
\]
so that $\mathcal P' \subset [1, p_n]$ for $n$ large enough. Define also
\[
\rho_n(p) = 2\kappa_n \left( \frac{\widetilde\gamma_n}{n p^{1/2}} \right)^{1/2}, \qquad \widetilde\rho_n(p) = 2\rho_n(p), \qquad \epsilon_n = P_n \rho_n(P_n) = \frac{ 2 \kappa_n \widetilde\gamma_n^{1/2} P_n^{3/4} }{ n^{1/2} } = o(1). \tag{A.8}
\]
Since $p \rho_n(p) \le \epsilon_n$ for all $p \in \mathcal P'$, $\epsilon_n$ plays the role of the real number $\epsilon$ of Lemma A.5, and we assume from now on that $n$ is so large that $\epsilon_n \le 1/6$. Consider the following log-spectral density functions:
\[
g(\lambda; b, p) = 2 \widetilde\rho_n(p) \sum_{k \in [p/2,\, p)} b_k \cos(k\lambda), \qquad b = (b_1, \ldots, b_{P_n}) \in \{-1, 1\}^{P_n}, \quad p \in \mathcal P'.
\]
The functions $g$ are of the form specified in (A.5). Let $W$ be a symmetric standard Brownian motion process. Consider the centered stationary Gaussian processes
\[
u_{t,n}(b, p) = \frac{1}{(2\pi)^{1/2}} \int_{-\pi}^{\pi} \exp\left( \frac{g(\lambda; b, p)}{2} \right) \exp(it\lambda)\, dW(\lambda).
\]
Observe that $u_{t,n}(0, p)$ does not depend on $p$ and is a Gaussian white noise process with variance 1. Let $\{R_{j,n}(b,p)\}$ denote the covariance function of $u_{t,n}(b,p)$. The family $\mathcal F_n$ of Gaussian processes can now be defined as
\[
\mathcal F_n = \Big\{ \{ u_{t,n}(b,p) \},\ b \in \{-1,1\}^{P_n},\ p \in \mathcal P' \Big\}.
\]
Lemma A.5 implies that all the sequences $\{u_{t,n}\}$ in $\mathcal F_n$ satisfy Assumption R and that $\mathcal F_n \subset$ Hölder$(L)$. We now study the asymptotic behavior of the covariance sequences $\{R_{j,n}(b,p)\}$. Let $N_n(b,p)$ be as in (3.3), that is,
\[
N_n(b,p) = N_n\big( \{u_{t,n}(b,p)\}, p, \rho_n(p) \big) = \#\left\{ \left| \frac{ R_{j,n}(b,p) }{ R_{0,n}(b,p) } \right| \ge \rho_n(p),\ j \in [1, p] \right\}.
\]
Lemma A.5-(i,ii) and (A.8) give that $N_n(b,p) = p/2$ for $n$ large enough and uniformly in $p = 2^j \in \mathcal P'$, so that $\rho_n(p) = 2\kappa_n \big( \widetilde\gamma_n / (np^{1/2}) \big)^{1/2} = \kappa_n \big( 2 \widetilde\gamma_n p^{1/2} / (n N_n(b,p)) \big)^{1/2}$. Hence the sequences $\{u_{t,n}\}$ in $\mathcal F_n$ satisfy condition (i) in Theorem 5. Therefore the Theorem will be proved if we show that $\sup_{T_n} \min_{\{u_{t,n}\}\in\mathcal F_n} P(T_n = 1) \le \alpha + o(1)$, where $\sup_{T_n}$ is a supremum over asymptotically $\alpha$-level tests. Since the equivalence result of Golubev et al. (2010) holds over $\mathcal F_n \subset$ Hölder$(L)$, this is equivalent to showing that $\sup_{T_n} \min_{\{U_n\}\in\mathcal F_n} Q(T_n = 1) \le \alpha + o(1)$, $Q$ being the distribution of the continuous-time regression model
\[
dU_n(\lambda; b, p) = g(\lambda; b, p)\, d\lambda + 2\pi^{1/2}\, \frac{ dW(\lambda) }{ n^{1/2} }, \qquad \lambda \in [-\pi, \pi],
\]
where $W(\cdot)$ is a Brownian motion over $[-\pi, \pi]$. This can be done as in Spokoiny (1996, Proof of Theorem 2.3) by bounding $\sup_{T_n} \min_{\{U_n\}\in\mathcal F_n} Q(T_n = 1)$ with a Bayes risk, based on the choice of a uniform distribution for $p$ and a Bernoulli one for $b$. □

A.6.
Proof of Lemma 1.
The first approximation $R_{0,n} = \sigma^2 \big( 1 + O\big( \gamma_n P_n^{1/2}/n \big) \big)$ follows easily from the definition (4.1) of the alternative. To show that the second approximation is valid, note that for $j = 1, \ldots, P_n$,
\[
R_{j,n} = \frac{ \nu \gamma_n^{1/2} }{ n^{1/2} P_n^{1/4} }\, \psi_j \sigma^2 + \left( \frac{ \nu \gamma_n^{1/2} }{ n^{1/2} P_n^{1/4} } \right)^2 \big( \psi_{j+1} \psi_1 + \cdots + \psi_{P_n} \psi_{P_n - j} \big)\, \sigma^2.
\]
By the Cauchy–Schwarz inequality, $|\psi_{j+1}\psi_1 + \cdots + \psi_{P_n}\psi_{P_n-j}| \le \sum_{k=1}^{P_n} \psi_k^2 = O(P_n)$ for all $j = 1,\ldots,P_n$; hence, uniformly in $j = 1,\ldots,P_n$,
\[
R_{j,n} = \frac{\nu\gamma_n^{1/2}}{n^{1/2}P_n^{1/4}}\, \psi_j \sigma^2 + O\!\left( \frac{\gamma_n P_n^{1/2}}{n} \right) = \frac{\nu\gamma_n^{1/2}}{n^{1/2}P_n^{1/4}}\, \psi_j \sigma^2 + o\!\left( \frac{\gamma_n^{1/2}}{n^{1/2}P_n^{1/4}} \right)
\]
since $P_n = o\big( (n/\gamma_n)^{2/3} \big)$. For the expression of $E[u_t \mid u_{t-k}, k \ge 1]$, observe that for $n$ large enough,
\[
E[u_t \mid u_{t-k}, k \ge 1] = \frac{\nu\gamma_n^{1/2}}{n^{1/2}P_n^{1/4}} \sum_{k=1}^{P_n} \psi_k \varepsilon_{t-k}
= \frac{\nu\gamma_n^{1/2}}{n^{1/2}P_n^{1/4}} \sum_{k=1}^{P_n} \psi_k \left( u_{t-k} - \frac{\nu\gamma_n^{1/2}}{n^{1/2}P_n^{1/4}} \sum_{j=1}^{P_n} \psi_j \varepsilon_{t-k-j} \right)
= \frac{\nu\gamma_n^{1/2}}{n^{1/2}P_n^{1/4}} \sum_{k=1}^{P_n} \psi_k u_{t-k} - \frac{\nu^2\gamma_n}{n P_n^{1/2}} \sum_{k=1}^{P_n} \psi_k \sum_{j=1}^{P_n} \psi_j \varepsilon_{t-k-j}.
\]
Now, since $\{\varepsilon_t\}$ is a strong white noise and $\sum_{k=1}^{P_n} \psi_k^2 = O(P_n)$,
\[
\frac{\nu^2\gamma_n}{n P_n^{1/2}} \sum_{k=1}^{P_n} \psi_k \sum_{j=1}^{P_n} \psi_j \varepsilon_{t-k-j}
= \frac{\nu^2\gamma_n}{n P_n^{1/2}} \sum_{\ell=2}^{2P_n} \left( \sum_{k=1}^{\min(P_n, \ell-1)} \psi_k \psi_{\ell-k} \right) \varepsilon_{t-\ell}
= O_P\!\left( \frac{\gamma_n}{n P_n^{1/2}} \left( \sum_{\ell=2}^{2P_n} \left( \sum_{k=1}^{\min(P_n,\ell-1)} \psi_k \psi_{\ell-k} \right)^{2} \right)^{1/2} \right)
= O_P\!\left( \frac{ \gamma_n \sum_{k=1}^{P_n} \psi_k^2 }{ n P_n^{1/2} } \right) = O_P\!\left( \frac{\gamma_n P_n^{1/2}}{n} \right),
\]
which ends the proof of the Lemma. □

A.7.
Proof of Proposition 1.
Let us now check the consistency of the test (2.7) under the assumption that $\min_{k\in[1,P_n]} |\psi_k \sigma^2| \ge 1$. Define $\rho_n = (\nu/2)\, \gamma_n^{1/2} / \big( n^{1/2} P_n^{1/4} \big)$. Lemma 1 implies that $N_n = P_n (1 + o(1))$ for such a $\rho_n$, which therefore satisfies
\[
\rho_n = (1 + o(1))\, \frac{\nu}{2} \left( \frac{\gamma_n P_n^{1/2}}{N_n} \right)^{1/2} \frac{1}{n^{1/2}},
\]
so that (3.4) asymptotically holds provided $\nu \ge 2\kappa^*$, and the test is consistent if $1 \le P_n \le p_n/2$. Gaussianity and Wu (2005) give, for any $a > 0$,
\[
\delta_a(j) \le C_a\, \frac{\nu \gamma_n^{1/2}}{n^{1/2} P_n^{1/4}}\, |\sigma \psi_j| \ \text{ for all } j \in [1, P_n], \qquad \delta_a(j) = 0 \ \text{ for all } j > P_n.
\]
Hence the restriction on $P_n$ gives that $\delta_a(j) \le C j^{-5/2}$, since the $|\sigma\psi_j|$ are bounded away from infinity. Moreover, Gaussianity ensures that
\[
\| u_{t,n} - \varepsilon_t \|_a \le C_a \sigma \left( \frac{\nu^2 \gamma_n}{n P_n^{1/2}} \sum_{k=1}^{P_n} \psi_k^2 \right)^{1/2} = O\!\left( \frac{\nu \gamma_n^{1/2} P_n^{1/4}}{n^{1/2}} \right) = o(1),
\]
which gives $\operatorname{Var}(u_{t,n}) = \sigma^2 + o(1)$ and $\max_{j\in[1,n]} \operatorname{Var}^2(u_{t,n}) / \operatorname{Var}(u_{t,n} u_{t+j,n}) = 1 + o(1)$, so that Assumption R holds. This ends the proof of Proposition 1-(i).

Consider now the other tests in Proposition 1-(ii). Define
\[
\widetilde R_{1,j} = \frac{1}{n}\sum_{t=1}^{n-j} u_{t,n} u_{t+j,n}, \quad \widetilde R_{0,j} = \frac{1}{n}\sum_{t=1}^{n-j} \varepsilon_t \varepsilon_{t+j}, \quad
\widetilde\tau_{1,j} = \frac{1}{n-j}\sum_{t=1}^{n-j} u_{t,n}^2 u_{t+j,n}^2 - \frac{n^2 \widetilde R_{1,j}^2}{(n-j)^2}, \quad \widetilde\tau_{0,j} = \frac{1}{n-j}\sum_{t=1}^{n-j} \varepsilon_t^2 \varepsilon_{t+j}^2 - \frac{n^2 \widetilde R_{0,j}^2}{(n-j)^2}.
\]
Define also $\eta_t = \eta_{t,n} = \nu \sum_{k=1}^{\infty} \psi_k \varepsilon_{t-k}$, setting $\psi_k = 0$ for $k > P_n$, so that $u_{t,n} = \varepsilon_t + \gamma_n^{1/2} \eta_t / \big( n^{1/2} P_n^{1/4} \big)$. We have
\[
\big| \widetilde R_{1,j} - \widetilde R_{0,j} \big| \le \frac{\gamma_n^{1/2}}{n^{3/2} P_n^{1/4}} \left| \sum_{t=1}^{n-j} \eta_t \varepsilon_{t+j} \right| + \frac{\gamma_n^{1/2}}{n^{3/2} P_n^{1/4}} \left| \sum_{t=1}^{n-j} \varepsilon_t \eta_{t+j} \right| + \frac{\gamma_n}{n^2 P_n^{1/2}} \left| \sum_{t=1}^{n-j} \eta_t \eta_{t+j} \right|.
\]
The Burkholder inequality gives, for any $a > 2$,
\[
\left\| \frac{\gamma_n^{1/2}}{n^{3/2} P_n^{1/4}} \sum_{t=1}^{n-j} \eta_t \varepsilon_{t+j} \right\|_a \le C\, \frac{\gamma_n^{1/2} (n-j)^{1/2}}{n^{3/2} P_n^{1/4}}\, \|\eta_t\|_a \le C\, \frac{\gamma_n^{1/2} P_n^{1/4}}{n},
\]
\[
\left\| \frac{\gamma_n^{1/2}}{n^{3/2} P_n^{1/4}} \sum_{t=1}^{n-j} \big( \varepsilon_t \eta_{t+j} - \nu \psi_j \varepsilon_t^2 \big) \right\|_a
\le \left\| \frac{\nu\gamma_n^{1/2}}{n^{3/2} P_n^{1/4}} \sum_{t=1}^{n-j} \varepsilon_t \sum_{k=0}^{j-1} \psi_k \varepsilon_{t+j-k} \right\|_a + \left\| \frac{\nu\gamma_n^{1/2}}{n^{3/2} P_n^{1/4}} \sum_{t=1}^{n-j} \varepsilon_t \sum_{k=j+1}^{\infty} \psi_k \varepsilon_{t+j-k} \right\|_a \le C\, \frac{\gamma_n^{1/2} P_n^{1/4}}{n},
\]
\[
\left\| \frac{\gamma_n^{1/2}}{n^{3/2} P_n^{1/4}} \sum_{t=1}^{n-j} \big( \varepsilon_t^2 - \sigma^2 \big) \right\|_a \le C\, \frac{\gamma_n^{1/2}}{n P_n^{1/4}}, \qquad
\left\| \frac{\gamma_n}{n^2 P_n^{1/2}} \sum_{t=1}^{n} \eta_t^2 \right\|_a \le \frac{\gamma_n \|\eta_t\|_{2a}^2}{n P_n^{1/2}} \le C\, \frac{\gamma_n P_n^{1/2}}{n},
\]
for all $j$.
Note also that (cid:12)(cid:12)(cid:12)(cid:80) n − jt =1 η t η t + j (cid:12)(cid:12)(cid:12) ≤ (cid:80) nt =1 η t and the Markov inequality give for a largeenough, since γ n P / n = o ( n / )max j ∈ [1 ,n ] (cid:12)(cid:12)(cid:12) (cid:101) R ,j − (cid:101) R ,j (cid:12)(cid:12)(cid:12) a = O P (cid:18) max j ∈ [1 ,n ] (cid:12)(cid:12)(cid:12) (cid:101) R ,j − (cid:101) R ,j (cid:12)(cid:12)(cid:12) a (cid:19) = O P (cid:32) n (cid:88) j =1 (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) γ / n n / P / n n − j (cid:88) t =1 η t ε t + j + n − j (cid:88) t =1 ε t η t + j (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) aa + (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) γ n n P / n n (cid:88) t =1 η t (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) aa (cid:33) = O P (cid:32) n (cid:32) γ / n P / n n (cid:33) a + (cid:32) γ n P / n n (cid:33) a (cid:33) = o P (cid:18) n a/ − + 1 n a/ (cid:19) = o P (cid:32) n log n ) a/ (cid:33) . Hence max j ∈ [1 ,n ] (cid:12)(cid:12)(cid:12) (cid:101) R ,j − (cid:101) R ,j (cid:12)(cid:12)(cid:12) = o P (cid:32) n log n ) / (cid:33) . (A.9)Arguing similarly for the (cid:101) τ k,j give, since J n = O (cid:0) n / (cid:1) max j ∈ [1 ,J n ] (cid:12)(cid:12)(cid:101) τ ,j − (cid:101) τ ,j (cid:12)(cid:12) = o P (cid:32) n log n ) / (cid:33) , max j ∈ [1 ,J n ] (cid:12)(cid:12)(cid:101) τ ,j − σ (cid:12)(cid:12) = O P (cid:32) log / nn / (cid:33) , (A.10)where the latter is from Proposition A.1. Note that (A.9) and (A.10) gives (4.5). Let W k,n , CvM k,n , EL k,n be the statistic computed under G k , k = 0 ,
1, i.e. with (cid:101) R ,j / (cid:101) τ ,j and (cid:101) R ,j / (cid:101) τ ,j .Note that (A.9) and (A.10) gives W ,n = W ,n + o P (1). (4.5) and Proposition A.1 give | CvM ,n − CvM ,n | ≤ π J n (cid:88) j =1 n (cid:12)(cid:12)(cid:12)(cid:16) (cid:101) R ,j / (cid:101) τ ,j + (cid:101) R ,j / (cid:101) τ ,j (cid:17) (cid:16) (cid:101) R ,j / (cid:101) τ ,j − (cid:101) R ,j / (cid:101) τ ,j (cid:17)(cid:12)(cid:12)(cid:12) j ≤ j ∈ [1 ,J n ] (cid:12)(cid:12)(cid:12) n / (cid:101) R ,j (cid:12)(cid:12)(cid:12)(cid:101) τ ,j × max j ∈ [1 ,J n ] (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n / (cid:32) (cid:101) R ,j (cid:101) τ ,j − (cid:101) R ,j (cid:101) τ ,j (cid:33)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) π J n (cid:88) j =1 j + max j ∈ [1 ,J n ] n (cid:32) (cid:101) R ,j (cid:101) τ ,j − (cid:101) R ,j (cid:101) τ ,j (cid:33) π J n (cid:88) j =1 j = n / O P (cid:32)(cid:18) log nn (cid:19) / (cid:33) n / o P (cid:32) n log n ) / (cid:33) + no P (cid:18) n log n (cid:19) = o P (1) , Hence
CvM ,n = CvM ,n + o P (1). For EL n , W ,n = W ,n + o P (1) and Xiao and Wu (2011)gives that max j ∈ [1 ,J n ] (cid:12)(cid:12)(cid:12) (cid:101) R k,j / (cid:101) τ k,j (cid:12)(cid:12)(cid:12) ≤ (2 ln n ) / (1 + o P (1)) for k = 0 , P ( (cid:98) γ ∗ EL = ln n ) → G and G .We now show that P ( (cid:98) p ∗ EL = 1) → G . Propositions A.4 and A.5, (A.10) give P (cid:0)(cid:101) p ∗ ,EL (cid:54) = 1 (cid:1) = P (cid:32) max p ∈ [2 ,J n ] (cid:103) BP ∗ ,p − (cid:103) BP ∗ , p − > ln n (cid:33) + o (1)= P (cid:32) (1 + o P (1)) max p ∈ [2 ,J n ] n (cid:80) pj =2 (cid:101) R ,j /σ p − > ln n (cid:33) + o (1)= P (cid:32) n (cid:80) pj =2 (cid:101) R ,j /σ p − >
12 ln n for some p ∈ [2 , J n ] (cid:33) + o (1) ≤ J n (cid:88) p =2 P n (cid:80) pj =2 (cid:16) (cid:101) R ,j /σ − E (cid:104) (cid:101) R ,j /σ (cid:105)(cid:17) p − >
12 ln n − n (cid:80) pj =2 E (cid:104) (cid:101) R ,j /σ (cid:105) p − + o (1) ≤ J n (cid:88) p =2 Var (cid:18) n (cid:80) pj =2 ( (cid:101) R ,j /σ − E [ (cid:101) R ,j /σ ]) p − (cid:19)(cid:16) ln n − p − (cid:80) pj =2 (1 − j/n ) (cid:17) + o (1) ≤ C log n J n (cid:88) p =2 p − o (1) = O (cid:18) n (cid:19) + o (1) = o (1) . Now, observe that Proposition A.1 and (4.5) givemax p ∈ [2 ,J n ] (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:103) BP ∗ ,p − (cid:103) BP ∗ , p − − (cid:103) BP ∗ ,p − (cid:103) BP ∗ , p − (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ max p ∈ [2 ,J n ] (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n (cid:80) pj =2 (cid:16) (cid:101) R ,j / (cid:101) τ ,j − (cid:101) R ,j / (cid:101) τ ,j (cid:17) p − (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ p ∈ [2 ,J n ] (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n / (cid:101) R ,j (cid:101) τ ,j (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) × max p ∈ [2 ,J n ] (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n / (cid:32) (cid:101) R ,j (cid:101) τ ,j − (cid:101) R ,j (cid:101) τ ,j (cid:33)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) + (cid:32) max p ∈ [2 ,J n ] (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n / (cid:32) (cid:101) R ,j (cid:101) τ ,j − (cid:101) R ,j (cid:101) τ ,j (cid:33)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:33) = n / O P (cid:32)(cid:18) log nn (cid:19) / (cid:33) n / o P (cid:32) n log n ) / (cid:33) + no P (cid:18) n log n (cid:19) = o P (1) . This, since arguing as in the bound above gives max p ∈ [2 ,J n ] (cid:12)(cid:12)(cid:12)(cid:16)(cid:103) BP ∗ ,p − (cid:103) BP ∗ , (cid:17) / ( p − (cid:12)(cid:12)(cid:12) = O P (cid:16) log / n (cid:17) , implies that max p ∈ [2 ,J n ] (cid:12)(cid:12)(cid:12)(cid:16)(cid:103) BP ∗ ,p − (cid:103) BP ∗ , (cid:17) / ( p − (cid:12)(cid:12)(cid:12) ≤ log n with a probabilitytending to 1 and then P ( (cid:98) p ∗ EL = 1) → G . 
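The argument above shows that, under the null, the penalized selection rule keeps p̂ = 1 with probability tending to one, because every averaged Box-Pierce increment (BP_p − BP_1)/(p − 1) stays below the ln n threshold. The following is a minimal numerical sketch in this spirit; the function `select_order` and the exact form of the rule are our own simplification for illustration, not the paper's definition.

```python
import numpy as np

def select_order(u, J):
    """Sketch of a penalized order selection: keep p = 1 unless some averaged
    Box-Pierce increment (BP_p - BP_1)/(p - 1) exceeds the ln(n) threshold.
    (Illustrative simplification, not the paper's exact rule.)"""
    n = len(u)
    u = u - u.mean()
    # sample autocorrelations rho_1, ..., rho_J
    rho = np.array([u[:n - j] @ u[j:] for j in range(1, J + 1)]) / (u @ u)
    bp = n * np.cumsum(rho ** 2)               # BP_p = n * sum_{j<=p} rho_j^2
    incr = (bp[1:] - bp[0]) / np.arange(1, J)  # (BP_p - BP_1)/(p-1), p = 2..J
    if incr.max() <= np.log(n):
        return 1
    return 2 + int(np.argmax(incr))
```

Under the null the averaged increments concentrate around one, far below ln n for moderate samples, so the rule typically selects p = 1; under a correlated alternative the increments grow with n and a larger order is picked.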
Hence (4.5) gives that EL_{0,n} = BP̃*_{0,1} + o_P(1) = BP̃*_{1,1} + o_P(1) = EL_{1,n} + o_P(1), so that EL_n converges in distribution to a chi-square distribution with one degree of freedom under G_0 and G_1. □

Supplementary Material B: Proofs of intermediary results
The proofs also use the notion of cumulants, see for example Brillinger (2001, p. 19) orXiao and Wu (2011) for a definition. LetCum (cid:0) u t ,n , . . . , u t q,n (cid:1) = Γ n ( t , . . . , t q )stands for the q th cumulants of { u t,n } . The next theorem on cumulant summability isTheorem 21 in Xiao and Wu (2011). These authors do not formally consider sequences { u t,n } but the following result is a straightforward extension of Xiao and Wu (2011). Theorem B.1 (Xiao and Wu (2011)) . Suppose { u t,n } is stationary for each n , with sup n (cid:107) u t,n (cid:107) q +1 < ∞ and sup n (cid:13)(cid:13) u t,n − u t − jt,n (cid:13)(cid:13) q ≤ δ q ( j ) where ∞ (cid:88) j =0 j q − δ q ( j ) < ∞ . Then there is a C which only depends on sup n (cid:107) u t,n (cid:107) q +1 and (cid:80) ∞ j =0 j q − δ q ( j ) such that ∞ (cid:88) t ,...,t q = −∞ | Γ n (0 , t , . . . , t q ) | ≤ C . In what follows, we drop subscript n in expressions like u t,n , R j,n , Γ n ( · ) and θ n when thereis no ambiguity. We denote K jp = K (cid:18) jp (cid:19) − K ( j ) and K n ( p ) = n − (cid:88) j =1 K jp . (B.1)B.1. Proof of Lemma A.2. (i) The first three bounds of the lemma follow directly fromAssumption K which implies that K ( j/p ) ≥ K ( j ) for all j and I ( x ∈ [0 , / /C ≤ K q ( x ) ≤ C I ( x ∈ [0 , C >
0. The Cauchy-Schwarz inequality implies that for any p ∈ [1 , n/ E ∆ ( p ) = (cid:80) n − j =1 (cid:0) − jn (cid:1) K jp ≤ K n ( p ) ≤ p / (cid:16)(cid:80) n − j =1 k j ( p ) (cid:17) / ≤ Cp / V ∆ ( p ),which is the last bound in (i). (ii) Write p = 1 + ν . Since p ≤ p n ≤ n/
2, the support of K ( · ) is [0 ,
1] and K ( · ) is a decreasing function, we have V ( p ) ≥ × p (cid:88) j =2 K (cid:18) jp (cid:19) ≥ ν (cid:88) j =1 K (cid:18) j ν (cid:19) ≥ ν (cid:88) j =1 (cid:90) j +1 j K (cid:18) x ν (cid:19) dx = (cid:90) ν +11 K (cid:18) x ν (cid:19) dx = ν (cid:90) K (cid:18) zν ν (cid:19) dz. The map ν (cid:55)−→ (2 + zν ) / (1 + ν ), z ∈ [0 , ν ≥ V ( p ) ≥ ν (cid:82) / K (cid:0) z (cid:1) dz ≥ C ( p − V (2) ≥ (cid:0) K (cid:0) (cid:1) − K (1) (cid:1) > V ∆ ( p ). Since K is nonincreasing, p (cid:55)−→ E ∆ ( p ) is non decreasing and E ∆ ( p ) ≥ p ∈ P . (cid:3) B.2.
Proof of Lemma A.3.
Under H_0, the proof repeats the steps of Lee (2007), Lobato (2001) and Kuan and Lee (2006), using the joint FCLT of Assumption M. The joint FCLT of Assumption M gives that the critical values are O_P(1) under H_0. □ B.3.
Proof of Lemma A.4.
Equation (5.3.21) in Priestley (1981) and Theorem B.1 give, uniformly in j,

Var(R̃_j) = (1/n) Σ_{m=−n+j+1}^{n−j−1} (1 − (|m|+j)/n) (R_m² + R_{m+j} R_{m−j} + Γ(0, j, m, m+j))
 ≤ (2/n) Σ_{m=−n}^{n} R_m² + (1/n) Σ_{m₁,m₂,m₃=−∞}^{+∞} |Γ(0, m₁, m₂, m₃)|
 ≤ (4/n) Σ_{m=0}^{∞} R_m² + (1/n) Σ_{m₁,m₂,m₃=−∞}^{+∞} |Γ(0, m₁, m₂, m₃)| ≤ C/n. □ B.4.
Proof of Proposition A.1.
For the sake of brevity we assume that θ is unidimen-sional. That max j ∈ [0 ,n − (cid:12)(cid:12)(cid:12)(cid:12) (cid:101) R j − (cid:18) − jn (cid:19) R j,n (cid:12)(cid:12)(cid:12)(cid:12) = O P (cid:32)(cid:18) log nn (cid:19) / (cid:33) , max j ∈ [0 ,n − (cid:18) − jn (cid:19) (cid:12)(cid:12)(cid:101) τ j − τ j,n (cid:12)(cid:12) = O P (cid:32)(cid:18) log nn (cid:19) / (cid:33) , follow from Xiao and Wu (2011, Theorem 2). Note that these authors do not considerstationary sequences { u t,n } but their arguments carry over under Assumption R. Hence itsuffices to study max j ∈ [0 ,p n ] (cid:12)(cid:12)(cid:12) (cid:98) R j − (cid:101) R j (cid:12)(cid:12)(cid:12) and max j ∈ [0 ,p n ] (cid:12)(cid:12)(cid:98) τ j − (cid:101) τ j (cid:12)(cid:12) since p n /n = o (cid:0) n − / (cid:1) underAssumption P. We then now show that max j ∈ [0 ,p n ] (cid:12)(cid:12)(cid:12) (cid:98) R j − (cid:101) R j (cid:12)(cid:12)(cid:12) = O P (cid:0) n − / (cid:1) . Let e t = (cid:98) u t − u t ,so that (cid:98) R j = 1 n n − j (cid:88) t =1 ( u t + e t ) ( u t + j + e t + j ) = (cid:101) R j + 1 n n − j (cid:88) t =1 ( u t e t + j + e t u t + j ) + 1 n n − j (cid:88) t =1 e t e t + j with, by the Cauchy-Schwarz inequality, (cid:12)(cid:12)(cid:12)(cid:80) n − jt =1 e t e t + j (cid:12)(cid:12)(cid:12) /n ≤ (cid:80) nt =1 e t /n and, under Assump-tion M, for (cid:98) r t = r t (cid:16)(cid:98) θ (cid:17) ,1 n n − j (cid:88) t =1 u t e t + j = (cid:16)(cid:98) θ − θ (cid:17) n n − j (cid:88) t =1 u t u (1) t + j + 12 (cid:16)(cid:98) θ − θ (cid:17) n n − j (cid:88) t =1 u t u (2) t + j + 1 n n − j (cid:88) t =1 u t (cid:98) r t + j . 
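The effect of replacing the errors u_t by residuals û_t is what Proposition A.1 controls: once θ̂ − θ is root-n consistent, residual-based and error-based autocovariances agree up to a uniformly small term. The simulation below is our own illustration (AR(1) data-generating process and OLS estimation are our choices, not the paper's setup); it shows the maximal gap shrinking with the sample size.

```python
import numpy as np

def max_acov_gap(n, phi=0.5, seed=0):
    # Simulate y_t = phi * y_{t-1} + u_t, estimate phi by OLS, then compare
    # autocovariances of the residuals u-hat_t with those of the true errors u_t.
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(n)
    y = np.zeros(n)
    for t in range(1, n):
        y[t] = phi * y[t - 1] + u[t]
    phi_hat = (y[:-1] @ y[1:]) / (y[:-1] @ y[:-1])
    e, uu = y[1:] - phi_hat * y[:-1], u[1:]

    def acov(x, j):
        return x[: len(x) - j] @ x[j:] / len(x)

    return max(abs(acov(e, j) - acov(uu, j)) for j in range(10))

# the maximal gap over the first lags shrinks as the sample grows
gaps = [np.mean([max_acov_gap(n, seed=s) for s in range(5)])
        for n in (500, 5_000, 50_000)]
```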
Now, observe that Assumption M gives (cid:98) θ − θ = O P (cid:0) n − / (cid:1) , max t ∈ [1 ,n ] | (cid:98) r t | = o P (1 /n ) and1 n n (cid:88) t =1 e t ≤ (cid:16)(cid:98) θ − θ (cid:17) n n (cid:88) t =1 (cid:16) u (1) t (cid:17) + 34 (cid:16)(cid:98) θ − θ (cid:17) n n (cid:88) t =1 (cid:16) u (1) t (cid:17) + 3 n n (cid:88) t =1 | (cid:98) r t | = O P (cid:18) n (cid:19) , max j ∈ [1 ,n ] (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n n − j (cid:88) t =1 ( u t (cid:98) r t + j + u t + j (cid:98) r t ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ t ∈ [1 ,n ] | (cid:98) r t | n n − j (cid:88) t =1 | u t | = o P (cid:18) n (cid:19) . This gives, uniformly in j ∈ [1 , n ] (cid:12)(cid:12)(cid:12) (cid:98) R j − (cid:101) R j (cid:12)(cid:12)(cid:12) ≤ (cid:12)(cid:12)(cid:12)(cid:98) θ − θ (cid:12)(cid:12)(cid:12) (cid:12)(cid:12)(cid:12) E (cid:104) u t u (1) t + j + u t + j u (1) t (cid:105)(cid:12)(cid:12)(cid:12) + (cid:12)(cid:12)(cid:12)(cid:98) θ − θ (cid:12)(cid:12)(cid:12) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n n − j (cid:88) t =1 (cid:16) u t u (1) t + j + u t + j u (1) t − E (cid:104) u t u (1) t + j + u t + j u (1) t (cid:105)(cid:17)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) + O P (cid:18) n (cid:19) . 
(B.2)It also follows from Assumption M and p n = o (cid:0) n / (cid:1) that (cid:12)(cid:12)(cid:12)(cid:98) θ − θ (cid:12)(cid:12)(cid:12) max j ∈ [1 ,n ] (cid:12)(cid:12)(cid:12) E (cid:104) u t u (1) t + j + u t + j u (1) t (cid:105)(cid:12)(cid:12)(cid:12) = O P (cid:0) /n / (cid:1) , n (cid:16)(cid:98) θ − θ (cid:17) (cid:80) ∞ j =0 E (cid:104) u t u (1) t + j + u t + j u (1) t (cid:105) = O P (1), and for A t ( j ) = u t u (1) t + j + u t + j u (1) t − E (cid:104) u t u (1) t + j + u t + j u (1) t (cid:105)(cid:12)(cid:12)(cid:12)(cid:98) θ − θ (cid:12)(cid:12)(cid:12) max j ∈ [0 ,p n ] (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n n − j (cid:88) t =1 A t ( j ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ O P (cid:18) n / (cid:19) p n (cid:88) j =0 (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n n − j (cid:88) t =1 A t ( j ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) = O P (cid:18) n (cid:19) O P p n (cid:88) j =0 E / (cid:32) n / n − j (cid:88) t =1 A t ( j ) (cid:33) = O P (cid:18) n (cid:19) O P p n max j ∈ [0 ,p n ] (cid:32) n / n − j (cid:88) t =1 A t ( j ) (cid:33) = O P (cid:18) n / (cid:19) ,n n − (cid:88) j =0 (cid:16)(cid:98) θ − θ (cid:17) (cid:32) n n − j (cid:88) t =1 A t ( j ) (cid:33) = O P (1) 1 n O P n − (cid:88) j =0 E (cid:32) n / n − j (cid:88) t =1 A t ( j ) (cid:33) = O P (1) 1 n O P n max j ∈ [0 ,n ] E (cid:32) n / n − j (cid:88) t =1 A t ( j ) (cid:33) = O P (1) . This gives max j ∈ [0 ,p n ] (cid:12)(cid:12)(cid:12) (cid:98) R j − (cid:101) R j (cid:12)(cid:12)(cid:12) = O P (cid:0) n − / (cid:1) and max p ∈ [0 ,n − n (cid:80) pj =1 (cid:16) (cid:98) R j − (cid:101) R j (cid:17) = O P (1).The study of max j ∈ [0 ,p n ] (cid:12)(cid:12)(cid:98) τ j − (cid:101) τ j (cid:12)(cid:12) is similar. (cid:3) B.5.
Proof of Proposition A.2.
For the sake of brevity we assume that θ is unidimensional. Since R̂_j² − R̃_j² = (R̂_j − R̃_j)² + 2 R̃_j (R̂_j − R̃_j), Proposition A.2 is a direct consequence of Proposition A.1 and Lemma B.1 below. Lemma B.1.
Assume that Assumptions K, M, P and R hold. Then max p ∈ [2 ,p n ] (cid:12)(cid:12)(cid:12) n (cid:80) n − j =1 ( K ( j/p ) − K ( j )) (cid:101) R j (cid:16) (cid:98) R j − (cid:101) R j (cid:17)(cid:12)(cid:12)(cid:12)(cid:16) n (cid:80) pj =1 R j (cid:17) / = O P (1) and n (cid:80) n − j =1 K ( j/p n ) (cid:101) R j (cid:16) (cid:98) R j − (cid:101) R j (cid:17) = O P (cid:18)(cid:16) n (cid:80) p n j =1 R j (cid:17) / (cid:19) for any p n = O ( n / ) . Proof of Lemma B.1.
We just prove the first equality since the proof of the second is very similar. Define R̄_j = E[R̃_j] = (1 − j/n) R_j. We have

|n Σ_{j=1}^{n−1} K_{jp} R̃_j (R̂_j − R̃_j)| ≤ C_n(p) + D_n(p),

where

C_n(p) = |n Σ_{j=1}^{n−1} K_{jp} R̄_j (R̂_j − R̃_j)|,
D_n(p) = |n Σ_{j=1}^{n−1} K_{jp} (R̃_j − R̄_j) (R̂_j − R̃_j)|.

The Cauchy-Schwarz inequality and Assumption K give

C_n(p) ≤ C (n Σ_{j=1}^{p} R_j²)^{1/2} (n Σ_{j=1}^{p} (R̂_j − R̃_j)²)^{1/2}.

Hence Proposition A.1 yields that max_{p∈[2,p_n]} |C_n(p) / (n Σ_{j=1}^{p} R_j²)^{1/2}| = O_P(1).
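The kernel weights K_{jp} above come from a nonincreasing kernel supported on [0, 1], and weighted sums of squared kernel weights of this type grow linearly in the bandwidth p, which is the content of Lemma A.2. A quick numerical check, with the Bartlett kernel as an illustrative stand-in (not necessarily the kernel used in the paper):

```python
import numpy as np

def K(x):
    # Bartlett kernel as an illustrative stand-in: decreasing on [0, 1], zero beyond
    return np.maximum(1.0 - np.abs(x), 0.0)

def V2(p, n):
    # weighted sum of squared kernel weights, as in the variance-type term of Lemma A.2
    j = np.arange(1, n)
    return np.sum((1 - j / n) * K(j / p) ** 2)

# the sum grows linearly in the bandwidth p: V2(p, n)/p is roughly constant
ratios = [V2(p, 10_000) / p for p in (10, 100, 1000)]
```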
For D n ( p ),Assumptions K, M, (B.2) and (cid:98) r t = r t (cid:16)(cid:98) θ (cid:17) givemax p ∈ [2 ,p n ] D n ( p ) ≤ O P ( n − / ) (cid:18) max p ∈ [2 ,p n ] D n ( p ) + max p ∈ [2 ,p n ] D n ( p ) (cid:19) + O P ( n − ) max p ∈ [2 ,p n ] D n ( p )+ (cid:32) n n (cid:88) t =1 e t + 2 max t ∈ [1 ,n ] | r t | n n (cid:88) t =1 | u t | (cid:33) max p ∈ [2 ,p n ] D n ( p ) , where D n ( p ) = n (cid:80) pj =1 (cid:12)(cid:12)(cid:12) (cid:101) R j − R j (cid:12)(cid:12)(cid:12) (cid:12)(cid:12)(cid:12) E (cid:104) u t u (1) t + j + u t + j u (1) t (cid:105)(cid:12)(cid:12)(cid:12) , D n ( p ) = n p (cid:88) j =1 (cid:12)(cid:12)(cid:12) (cid:101) R j − R j (cid:12)(cid:12)(cid:12) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n n − j (cid:88) t =1 (cid:16) u t u (1) t + j + u t + j u (1) t − E (cid:104) u t u (1) t + j + u t + j u (1) t (cid:105)(cid:17)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ,D n ( p ) = n p (cid:88) j =1 (cid:12)(cid:12)(cid:12) (cid:101) R j − R j (cid:12)(cid:12)(cid:12) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n n − j (cid:88) t =1 (cid:16) u t u (2) t + j + u t + j u (2) t (cid:17)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ,D n ( p ) = n p (cid:88) j =1 (cid:12)(cid:12)(cid:12) (cid:101) R j − R j (cid:12)(cid:12)(cid:12) . 
By Assumption K and M and by Lemma A.4, we have E (cid:20) max p ∈ [2 ,p n ] D n ( p ) (cid:21) ≤ Cn p n (cid:88) j =1 Var / (cid:16) (cid:101) R j (cid:17) (cid:12)(cid:12)(cid:12) E (cid:104) u t u (1) t + j + u t + j u (1) t (cid:105)(cid:12)(cid:12)(cid:12) ≤ Cn / , E (cid:20) max p ∈ [2 ,p n ] D n ( p ) (cid:21) ≤ Cn / p n (cid:88) j =1 Var / (cid:16) (cid:101) R j (cid:17) × E / (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n / n (cid:88) t =1 (cid:16) u t u (1) t + j + u t + j u (1) t − E (cid:104) u t u (1) t + j + u t + j u (1) t (cid:105)(cid:17)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ Cp n , E (cid:20) max p ∈ [2 ,p n ] D n ( p ) (cid:21) ≤ Cn p n (cid:88) j =1 Var / (cid:16) (cid:101) R j (cid:17) E / (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n n (cid:88) t =1 (cid:16) u t u (2) t + j + u t + j u (2) t (cid:17)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ Cp n n / , E (cid:20) max p ∈ [2 ,p n ] D n ( p ) (cid:21) ≤ Cn p n (cid:88) j =1 E (cid:104)(cid:12)(cid:12)(cid:12) (cid:101) R j − R j (cid:12)(cid:12)(cid:12)(cid:105) ≤ Cn p n (cid:88) j =1 Var / (cid:16) (cid:101) R j (cid:17) ≤ Cn / p n . The Markov inequality gives us the stochastic orders of magnitude of the four maxima in thebound for max p ∈ [2 ,p n ] D n ( p ). Since p n = O (cid:0) n / (cid:1) by Assumption P, max t ∈ [1 ,n ] | (cid:98) r t | = o P (1 /n )and n − (cid:80) nt =1 e t = O P ( n − ) by Assumption M, we have max p ∈ [2 ,p n ] | D n ( p ) | = O P (cid:16) p n n / (cid:17) = O P (1). This together with max p ∈ [2 ,p n ] | C n ( p ) / (cid:16) n (cid:80) pj =1 R j (cid:17) / | = O P (1) shows that theLemma is proved. (cid:3) B.6.
Proof of Proposition A.3.
The proof of Proposition A.3 is long and divided in threesteps. In the two first steps, we focus on observed variables. In the first step, we approximatethe sample covariance (cid:101) R j by a martingale counterpart (cid:80) nt =1 D jt /n , j ∈ [1 , p n ], as in Shao(2011b), see the notations below and Lemmas B.2, B.3. and B.4. The second step dealswith the deviation probability of n (cid:80) pj =1 (cid:16) n (cid:80) nt = j +1 D jt (cid:17) ( K ( j/p ) − K (1)) − σ E ∆ ( p ) σ V ∆ ( p ) which is approximated with some Gaussian counterparts through the Lindeberg technique,see Lemma B.5. The third step concludes and explicitly deals with the case of residualsthanks to Propositions A.1 and A.2.Let us now introduce additional notations. Let F k be the sigma field generated by e k , e k − , . . . .Define P t [ Z ] = E [ Z |F t ] − E [ Z |F t − ]. Wu (2007, Proposition 3) establishes that (cid:107) P t [ u t + k ] (cid:107) a ≤ δ a ( k ) and Shao (2011b) has shown that (cid:107) P [ u k u k − j ] (cid:107) a ≤ (cid:107) u k (cid:107) a ( δ a ( k ) + δ a ( k − j ) I ( j ≤ k )) , (B.3)which is smaller than 4 (cid:107) u k (cid:107) a δ a ( k − j ) when j ≤ k . Define now the vector of martingaledifference D t = (cid:2) D t , . . . , D p n t (cid:3) (cid:48) with D jt = ∞ (cid:88) k = t P t [ u k u k − j ]which converges a.s. and satisfies E [ D jt |F t − ] = 0, max j E [ | D jt | a ] < ∞ , provided (cid:107) u t (cid:107) a < ∞ and (cid:80) ∞ k =0 δ a ( k ) < ∞ . Consider the martingale M j = M jn = (cid:80) nt = j +1 D jt which is anapproximation of (cid:101) R j . Shao (Lemma A.1, 2011b) gives under Assumption R and for any a ∈ [1 , a ], (cid:32) E a (cid:34)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n (cid:88) t = j +1 u t u t − j − M j (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) a (cid:35)(cid:33) ≤ C. (B.4)We shall also use a p -dependent version of D t , denoted D t − p +1 t , with entries D t − p +1 jt = E [ D jt | e t , . . . 
, e t − p +1 ] = ∞ (cid:88) k = t P (cid:48) t [ u k u k − j ] , where (B.5) P (cid:48) t [ Z ] = P t − p +1 t [ Z ] = E [ Z | e t , . . . , e t − p +1 ] − E [ Z | e t − , . . . , e t − p +1 ] . Arguing as in Shao (2011b, Lemma A.2-(iii)) gives (cid:13)(cid:13) D jt − D t − p +1 jt (cid:13)(cid:13) a ≤ C (cid:107) u t (cid:107) a Θ a ( p − j ) , for all j ∈ [1 , p ] . (B.6) B.6.1.
Martingale approximation and preliminary lemmas.
An important property of D t and D t − p +1 t is as follows. Lemma B.2.
Suppose Assumption K and R hold. Let K jp be as in (B.1). Then for any p ≤ p , t , and any s ≤ t − p , (cid:13)(cid:13)(cid:13)(cid:80) pj =1 K jp D js D t − p +1 jt (cid:13)(cid:13)(cid:13) a ≤ Cp / . Proof of Lemma B.2 . We have (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) p (cid:88) j =1 K jp D js D t − p +1 jt (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) a = (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) p (cid:88) j =1 K jp ∞ (cid:88) k =0 P s [ u s + k u s + k − j ] ∞ (cid:88) k =0 P (cid:48) t [ u t + k u t + k − j ] (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) a ≤ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) p (cid:88) j =1 K jp j − (cid:88) k =0 P s [ u s + k u s + k − j ] j − (cid:88) k =0 P (cid:48) t [ u t + k u t + k − j ] (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) a (B.7)+ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) p (cid:88) j =1 K jp j − (cid:88) k =0 P s [ u s + k u s + k − j ] ∞ (cid:88) k = j P (cid:48) t [ u t + k u t + k − j ] (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) a (B.8)+ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) p (cid:88) j =1 K jp ∞ (cid:88) k = j P s [ u s + k u s + k − j ] j − (cid:88) k =0 P (cid:48) t [ u t + k u t + k − j ] (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) a (B.9)+ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) p (cid:88) j =1 K jp ∞ (cid:88) k = j P s [ u s + k u s + k − j ] ∞ (cid:88) k = j P (cid:48) t [ u t + k u t + k − j ] (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) a . 
(B.10)We have for (B.7)(B.7) = (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) p (cid:88) j =1 K jp p − (cid:88) k =0 I ( k < j ) u s + k − j P s [ u s + k ] p − (cid:88) k =0 I ( k < j ) u t + k − j P (cid:48) t [ u t + k ] (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) a = (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) p − (cid:88) k =0 p − (cid:88) k =0 (cid:32) p − (cid:88) j = k ∨ k K jp u s + k − j u t + k − j (cid:33) P s [ u s + k ] P (cid:48) t [ u t + k ] (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) a ≤ p − (cid:88) k =0 p − (cid:88) k =0 (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) p − (cid:88) j = k ∨ k K jp u s + k − j u t + k − j (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) a δ a ( k ) δ a ( k ) , using (cid:107) P (cid:48) t [ u t + k ] (cid:107) a ≤ (cid:107) P t [ u t + k ] (cid:107) a = δ a ( k ). Now (B.4) and the Burkholder inequalitygive (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) p − (cid:88) j = k ∨ k K jp u s + k − j u t + k − j (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) a ≤ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) p − (cid:88) j = k ∨ k K jp D t + k − j,t − s + k − k (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) a + (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) p − (cid:88) j = k ∨ k K jp ( u s + k − j u t + k − j − D t + k − j,t − s + k − k ) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) a ≤ Cp / . Hence (B.7) is smaller than Cp / . For (B.8), we have since { u s + k − j , j ∈ [1 , k ] } and { P (cid:48) t [ u t + k u t + k − j ] , j ∈ [1 , k ] , k ≥ } are independent,(B.8) = (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) p − (cid:88) k =0 ∞ (cid:88) k =0 (cid:32) p − (cid:88) j = k K jp u s + k − j P (cid:48) t [ u t + k + j u t + k ] (cid:33) P s [ u s + k ] (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) a ≤ p − (cid:88) k =0 ∞ (cid:88) k =0 (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) p − (cid:88) j = k K jp u s + k − j P (cid:48) t [ u t + k + j u t + k ] (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) a δ a ( k ) . 
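The repeated appeals to the Burkholder inequality above exploit the orthogonality of martingale differences; for second moments the bound reduces to additivity of variances, which a short simulation makes visible. The toy martingale difference d_t = e_t e_{t−1} below is our own choice for illustration.

```python
import numpy as np

# d_t = e_t * e_{t-1} is a martingale difference: E[d_t | past] = e_{t-1} E[e_t] = 0.
# For second moments the Burkholder-type bound is just additivity of variances.
rng = np.random.default_rng(1)
n, reps = 200, 20_000
e = rng.standard_normal((reps, n))
d = e[:, 1:] * e[:, :-1]
S = d.sum(axis=1)
lhs = S.var()                # Var(sum_t d_t)
rhs = d.var(axis=0).sum()    # sum_t Var(d_t): equal up to simulation noise
```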
Let d t = (cid:80) ∞ k = t P t [ u k ] be the martingale difference approximation of u t , see Wu (2007).Now, since { u s + k − j , d s + k − j, j ∈ [1 , k ] } and { P (cid:48) t [ u t + k u t + k − j ] , j ∈ [1 , k ] , k ≥ } are inde-pendent, arguing as in the proof of Theorem 1 in Wu (2007), (B.4) and the Burkholderinequality give (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) p − (cid:88) j = k K jp u s + k − j P (cid:48) t [ u t + k + j u t + k ] (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) a ≤ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) p − (cid:88) j = k K jp d s + k − j P (cid:48) t [ u t + k + j u t + k ] (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) a + 2 (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) p − (cid:88) j = k K jp ( u s + k − j − d t ) P (cid:48) t [ u t + k + j u t + k ] (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) a ≤ C (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) p − (cid:88) j = k K jp d s + k − j ( P (cid:48) t [ u t + k + j u t + k ]) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) a + C (cid:107) P (cid:48) t [ u t + k + j u t + k ] (cid:107) a ≤ Ck δ a ( k ) . Hence Assumption R gives (B.8) ≤ (cid:80) p − k =0 (cid:80) ∞ k =0 k δ a ( k ) δ a ( k ) ≤ C . For (B.9), observe first that (B.4) gives(B.9) = (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) ∞ (cid:88) k =0 p − (cid:88) k =0 p (cid:88) j =1 K jp I ( j ≤ k ) P s [ u s + k u s + k − j ] I ( k < j ) P (cid:48) t [ u t + k u t + k − j ] (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) a ≤ ∞ (cid:88) k =0 p − (cid:88) k =0 p (cid:88) j = k I ( j ≤ k ) δ a ( k − j ) (cid:107) P (cid:48) t [ u t + k u t + k − j ] (cid:107) a ≤ (cid:32) ∞ (cid:88) k =0 δ a ( k ) (cid:33) × p − (cid:88) k =0 p (cid:88) j = k (cid:107) P (cid:48) t [ u t + k u t + k − j ] (cid:107) a . Since u tt + k − j is independent of e t , . . . , e t − p +1 and P t [ u t + k ], (cid:107) P (cid:48) t [ u t + k u t + k − j ] (cid:107) a ≤ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) E (cid:2) u tt + k − j P t [ u t + k ] | e t , . . . 
, e t − p +1 (cid:3)(cid:124) (cid:123)(cid:122) (cid:125) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) a + (cid:13)(cid:13) E (cid:2)(cid:0) u t + k − j − u tt + k − j (cid:1) P t [ u t + k ] | e t , . . . , e t − p +1 (cid:3)(cid:13)(cid:13) a ≤ (cid:13)(cid:13) u t + k − j − u tt + k − j (cid:13)(cid:13) a (cid:107) P t [ u t + k ] (cid:107) a ≤ Θ a ( k − j ) δ a ( k ) . (B.11)Substituting gives that (B.9) ≤ C (cid:80) p − k =0 (cid:80) pj = k Θ a ( k − j ) δ a ( k ) ≤ C .For (B.10), (B.3) and (B.11) give(B.10) ≤ C p (cid:88) j =1 (cid:32) ∞ (cid:88) k = j (cid:107) P s [ u s + k u s + k − j ] (cid:107) a (cid:33) ∞ (cid:88) k = j (cid:107) P (cid:48) t [ u t + k u t + k − j ] (cid:107) a ≤ C p (cid:88) j =1 (cid:32) ∞ (cid:88) k = j δ a ( k − j ) (cid:33) ∞ (cid:88) k = j Θ a ( k − j ) δ a ( k ) ≤ C. Hence substituting gives (cid:13)(cid:13)(cid:13)(cid:80) pj =1 K jp D js D t − p +1 jt (cid:13)(cid:13)(cid:13) a ≤ Cp / . (cid:3) We now define a suitable sequence of Gaussian vector. Let 2 p n ≤ (cid:96) ≤ p n be an integernumber. Consider a sequence of independent centered Gaussian vectors η t = (cid:2) η t , . . . , η p n t (cid:3) (cid:48) with E [ η j t η j t ] = E (cid:2) D t − (cid:96) +1 j t D t − (cid:96) +1 j t (cid:3) . (B.12)We shall also assume that { η t } and { e t } are independent. Lemma B.3.
Let { η t } be as in (B.12) and suppose Assumption R holds. Then for all p ∈ [1 , p n ] and t, s ∈ [1 , n ] , (cid:88) j (cid:54) = j ∈ [1 ,p n ] | Cov ( η j t , η j t ) | ≤ C and p n (cid:88) j =1 (cid:12)(cid:12) Var ( η jt ) − σ (cid:12)(cid:12) ≤ C, (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) p (cid:88) j =1 (cid:18) − jn (cid:19) K jp (cid:0) Var ( η jt ) − σ (cid:1)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ C, (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:32) p (cid:88) j =1 (cid:18) − jn (cid:19) K jp Var ( η jt ) (cid:33) / − σ V ∆ ( p ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ C, Var (cid:32) p / p (cid:88) j =1 K jp D js η jt | D s (cid:33) ≤ Cp p (cid:88) j =1 K jp D js . Proof of Lemma B.3. (B.4) gives for all j , j ,Cov ( D j t , D j t ) = lim n →∞ Cov (cid:32) (cid:80) nt = j +1 u t u t − j ( n − j ) / , (cid:80) nt = j +1 u t u t − j ( n − j ) / (cid:33) = ∞ (cid:88) k = −∞ E [ u u j u k u k + j ] , see also Lemma A.2 in Shao (2011b), provided (cid:80) ∞ k = −∞ | E [ u u j u k u k + j ] | < ∞ as shownbelow. (B.6) and (B.12) givemax j ,j ∈ [0 ,p n ] (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) Cov ( η j t , η j t ) − ∞ (cid:88) k = −∞ E [ u u j u k u k + j ] (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ C Θ a ( p n ) . (B.13)Now relation between cumulants and moments in Brillinger (2001) and Theorem B.1 givesabsolute summability of the 4th moments. Hence Θ a ( p n ) = O ( p − n ) gives the first boundof the Lemma. For the second and the third bound, observe that under the null (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ∞ (cid:88) k = −∞ E [ u u j u k u k + j ] − σ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ (cid:12)(cid:12) E (cid:2) u u j (cid:3) − E (cid:2) u (cid:3) E (cid:2) u j (cid:3)(cid:12)(cid:12) + 2 (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ∞ (cid:88) k =1 E [ u u j u k u k + j ] (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) . 
(cid:12)(cid:12) E (cid:2) u u j (cid:3) − E [ u ] E (cid:2) u j (cid:3)(cid:12)(cid:12) ≤ C Θ a ( j ) = O ( j − ) and absolute summability of the 4th mo-ments gives the second bound. This also gives the fourth one since (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:32) p (cid:88) j =1 (cid:18) − jn (cid:19) K jp Var ( η jt ) (cid:33) / − σ V ∆ ( p ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ (cid:32) p (cid:88) j =1 (cid:18) − jn (cid:19) K jp (cid:0) Var ( η jt ) − σ (cid:1) (cid:33) / ≤ / (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) p (cid:88) j =1 (cid:18) − jn (cid:19) K jp (cid:0) Var ( η jt ) − σ (cid:1)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ C. For the last one, observe first that (cid:88) ≤ j The deviation probability of the maximum of Proposition A.3. The proof is based on asmooth approximation of the maximum of real numbers x , . . . , x p n . Consider an increasingand three times continuously differentiable real function f withlim x →−∞ f ( x ) = 1 , f ( x ) = x for x ≥ , max i =1 , , sup x (cid:12)(cid:12) f ( i ) ( x ) (cid:12)(cid:12) < ∞ . (B.14) Let e = e n → ∞ with ln ( p n ) /e = o (1). Then max p ∈ [1 ,p n ] { f ( x p ) } ≤ (cid:16)(cid:80) p n p =1 f e ( x p ) (cid:17) /e ≤ p /en max p ∈ [1 ,p n ] { f ( x p ) } gives that (cid:32) p n (cid:88) p =1 f e ( x p ) (cid:33) /e = (cid:18) O (cid:18) ln p n e (cid:19)(cid:19) max p ∈ [1 ,p n ] { f ( x p ) } . (B.15)We will first find a suitable approximation for the distribution of M = (cid:32) p n (cid:88) p =1 f e (ˇ s p ) (cid:33) /e where ˇ S p = n p (cid:88) j =1 K jp (cid:18) M jn n (cid:19) , ˇ s p = ˇ S p − σ E ∆ ( p ) σ V ∆ ( p ) . (B.16)Define, for η = (cid:2) η , . . . 
, η p n (cid:3) (cid:48) and x ∈ [0 , M jt ( x ; η ) = t − (cid:88) s = j +1 D js + xη j + n (cid:88) s = t +1 η js , R jt ( x ; η ) = M jt ( x ; η ) n ˇ s pt ( x ; η ) = n (cid:80) pj =1 K jp R jt ( x ; η ) − σ E ∆ ( p ) σ V ∆ ( p ) , Σ t ( x ; η ) = f (ˇ s pt ( x ; η )) , M t ( x ; η ) = (cid:32) p n (cid:88) p =1 Σ et ( x ; η ) (cid:33) e , M t ( η ) = M t (1; η ) , (B.17)and ˇ s (1) pt ( x ; η ) = d ˇ s pt ( x ; η ) dx = 2 (cid:80) pj =1 K jp (cid:16)(cid:80) t − s = j +1 D js + xη j + (cid:80) ns = t +1 η js (cid:17) η j nσ V ∆ ( p ) , ˇ s (2) pt ( x ; η ) = d pt ˇ s ( x ; η ) dx = 2 (cid:80) pj =1 K jp η j nσ V ∆ ( p ) , Σ (1) pt ( x ; η ) = f (1) (ˇ s pt ( x ; η )) ˇ s (1) pt ( x ; η ) , Σ (2) pt ( x ; η ) = f (2) (ˇ s pt ( x ; η )) (cid:16) ˇ s (1) pt ( x ; η ) (cid:17) + f (1) (ˇ s pt ( x ; η )) ˇ s (2) pt ( x ; η ) , Σ (3) pt ( x ; η ) = f (3) (ˇ s pt ( x ; η )) (cid:16) ˇ s (1) pt ( x ; η ) (cid:17) + 3 f (2) (ˇ s pt ( x ; η )) ˇ s (1) pt ( x ; η ) ˇ s (2) pt ( x ; η ) . We first bound the moments of Σ (1) pt ( x ; η ), Σ (2) pt ( x ; η ) and Σ (3) pt ( x ; η ) when η is set to D t or η t . Lemma B.4. Under Assumption R and if p n = O (cid:0) n / (cid:1) , we have uniformly in p ∈ [1 , p n ] , x ∈ [0 , and t = 1 , . . . , n , max (cid:110)(cid:13)(cid:13)(cid:13) Σ (1) pt ( x ; D t ) (cid:13)(cid:13)(cid:13) a , (cid:13)(cid:13)(cid:13) Σ (1) pt ( x ; η t ) (cid:13)(cid:13)(cid:13) a (cid:111) ≤ Cn / , (B.18)max (cid:26)(cid:13)(cid:13)(cid:13) Σ (2) pt ( x ; D t ) (cid:13)(cid:13)(cid:13) a/ , (cid:13)(cid:13)(cid:13) Σ (2) pt ( x ; η t ) (cid:13)(cid:13)(cid:13) a/ (cid:27) ≤ Cp / n , (B.19)max (cid:110)(cid:13)(cid:13)(cid:13) Σ (3) pt ( x ; D t ) (cid:13)(cid:13)(cid:13) a , (cid:13)(cid:13)(cid:13) Σ (3) pt ( x ; η t ) (cid:13)(cid:13)(cid:13) a (cid:111) ≤ Cp / n / . (B.20) Proof of Lemma B.4. 
(B.14) gives (cid:12)(cid:12)(cid:12) Σ (1) pt ( x ; η ) (cid:12)(cid:12)(cid:12) ≤ C (cid:12)(cid:12)(cid:12) ˇ s (1) pt ( x ; η ) (cid:12)(cid:12)(cid:12) , (cid:12)(cid:12)(cid:12) Σ (2) pt ( x ; η ) (cid:12)(cid:12)(cid:12) ≤ C (cid:18)(cid:16) ˇ s (1) pt ( x ; η ) (cid:17) + (cid:12)(cid:12)(cid:12) ˇ s (2) pt ( x ; η ) (cid:12)(cid:12)(cid:12)(cid:19) , (cid:12)(cid:12)(cid:12) Σ (3) pt ( x ; η ) (cid:12)(cid:12)(cid:12) ≤ C (cid:12)(cid:12)(cid:12) ˇ s (1) pt ( x ; η ) (cid:12)(cid:12)(cid:12) (cid:18)(cid:16) ˇ s (1) pt ( x ; η ) (cid:17) + (cid:12)(cid:12)(cid:12) ˇ s (2) pt ( x ; η ) (cid:12)(cid:12)(cid:12)(cid:19) . (B.21)(B.21) shows that the lemma directly follows frommax (cid:110)(cid:13)(cid:13)(cid:13) ˇ s (1) pt ( x ; D t ) (cid:13)(cid:13)(cid:13) a , (cid:13)(cid:13)(cid:13) ˇ s (1) pt ( x ; η t ) (cid:13)(cid:13)(cid:13) a (cid:111) ≤ Cn / , (B.22)max (cid:26)(cid:13)(cid:13)(cid:13) ˇ s (2) pt ( x ; D t ) (cid:13)(cid:13)(cid:13) a/ , (cid:13)(cid:13)(cid:13) ˇ s (2) pt ( x ; η t ) (cid:13)(cid:13)(cid:13) a/ (cid:27) ≤ Cp / n . (B.23)(B.23) directly follow from the triangular inequality. For (B.22), we first bound (cid:13)(cid:13)(cid:13) ˇ s (1) pt ( x ; D t ) (cid:13)(cid:13)(cid:13) a .We have (cid:13)(cid:13)(cid:13) ˇ s (1) pt ( x ; D t ) (cid:13)(cid:13)(cid:13) a ≤ C (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) (cid:80) t − s =1 (cid:16)(cid:80) pj =1 K jp D js D jt (cid:17) np / (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) a (B.24)+ C (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) (cid:80) pj =1 K jp D jt np / (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) a + C (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) (cid:80) ns = t +1 (cid:16)(cid:80) pj =1 K jp D jt η js (cid:17) np / (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) a . 
(B.25) We have, for the first item (B.24)(B.24) ≤ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) (cid:80) pj =1 D jt (cid:80) t − p s =1 K jp D js np / (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) a + (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) (cid:80) t − s = t − p + D jt (cid:80) pj =1 K jp D js np / (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) a ≤ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) (cid:80) pj =1 D jt (cid:80) t − p s =1 K jp D js np / (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) a + 1 np / p (cid:88) j =1 (cid:107) K jp D jt (cid:107) a (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) t − (cid:88) s = t − p + D js (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) a ≤ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) (cid:80) t − p s =1 K jp (cid:80) pj =1 D jt D js np / (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) a + Cp / p / n , where p ≥ p and by the Burkholder inequality. Now let (cid:101) D jt = D t − p +1 jt be as in (B.5). Since (cid:80) pj =1 K jp D js (cid:101) D jt is a martingale difference given e t , . . . , e t − p +1 , (B.6), the Burkholder andtriangular inequalities, Lemma B.2 give (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) (cid:80) pj =1 (cid:80) t − p s =1 K jp D js D jt np / (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) a ≤ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) (cid:80) t − p s =1 (cid:80) pj =1 K jp D js (cid:101) D jt np / (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) a + 1 np / p (cid:88) j =1 | K jp | (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) t − p (cid:88) s =1 D js (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) a (cid:13)(cid:13)(cid:13) D jt − (cid:101) D jt (cid:13)(cid:13)(cid:13) a ≤ Cnp / t − p (cid:88) s =1 (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) p (cid:88) j =1 K jp D js (cid:101) D jt (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) a / + C Θ a ( p − p ) p / ≤ Cnp / ( | t − p | p ) / + C Θ a ( p − p ) p / ≤ C (cid:18) n / + Θ a ( p − p ) p / (cid:19) . 
Hence substituting gives
\[
\Biggl\|\frac{\sum_{s=1}^{t-1}\bigl(\sum_{j=1}^{p}K_{jp}D_{js}D_{jt}\bigr)}{np^{1/2}}\Biggr\|_{a}
\le C\Biggl(\frac{1}{n^{1/2}}+\frac{p^{1/2}\bar p^{1/2}}{n}+\frac{\Theta_{a}(\bar p-p)}{p^{1/2}}\Biggr). \tag{B.26}
\]
For the first item in (B.25), (B.23) gives a bound $C/n^{1/2}$. For the second item in (B.25), conditional Gaussianity of the $\{\sum_{j=1}^{p}K_{jp}D_{jt}\eta_{js}\}$ and Lemma B.3 give
\[
\Biggl\|\frac{\sum_{s=t+1}^{n}\bigl(\sum_{j=1}^{p}K_{jp}D_{jt}\eta_{js}\bigr)}{np^{1/2}}\Biggr\|_{a}
\le \frac{C}{np^{1/2}}\Biggl\|\Biggl\{\sum_{s=t+1}^{n}\Biggl(\sum_{j=1}^{p}K_{jp}D_{jt}\Biggr)^2\Biggr\}^{1/2}\Biggr\|_{a}
\le \frac{C}{np^{1/2}}\Biggl\|\sum_{s=t+1}^{n}\Biggl(\sum_{j=1}^{p}K_{jp}D_{jt}\Biggr)^2\Biggr\|^{1/2}_{a/2}
\]
\[
\le \frac{C}{np^{1/2}}\Biggl(\sum_{s=t+1}^{n}\sum_{j=1}^{p}K^2_{jp}\|D_{jt}\|^2_{a}\Biggr)^{1/2}
\le \frac{C}{np^{1/2}}\bigl((n-t)p\bigr)^{1/2}\le \frac{C}{n^{1/2}}.
\]
Substituting the last two bounds and (B.26) in (B.24) shows that
\[
\max\Bigl\{\bigl\|\check s^{(1)}_{pt}(x;D_t)\bigr\|_{a},\ \bigl\|\check s^{(1)}_{pt}(x;\eta_t)\bigr\|_{a}\Bigr\}
\le C\Biggl(\frac{1}{n^{1/2}}+\frac{p^{1/2}\bar p^{1/2}}{n}+\frac{\Theta_{a}(\bar p-p)}{p^{1/2}}\Biggr). \tag{B.27}
\]
Observe that $\Theta_{a}(\bar p-p)\le C(\bar p-p)^{-3/2}$ by Assumption R. Consider now
\[
\bar p=\max\Biggl(p,\Bigl(\frac np\Bigr)^{1/3}\Biggr)\ge p,
\]
which is such that, since $p\in[1,p_n]$ with $p_n=O(n^{1/2})$:

If $(n/p)^{1/3}\ge p$, then
\[
\frac{(\bar p-p)^{-3/2}}{p^{1/2}}\le C\Bigl(\frac pn\Bigr)^{1/2}\frac{1}{p^{1/2}}=\frac{C}{n^{1/2}},\qquad
\frac{p^{1/2}\bar p^{1/2}}{n}\le\frac{\bar p}{n}\le n^{-2/3}\le n^{-1/2}.
\]
If $(n/p)^{1/3}<p\iff n^{1/4}<p$, then
\[
\frac{\Theta_{a}(\bar p-p)}{p^{1/2}}\le Cp^{-2}\le Cn^{-1/2},\qquad
\frac{p^{1/2}\bar p^{1/2}}{n}\le\frac{p_n}{n}\le Cn^{-1/2}.
\]
Hence (B.27) gives (B.22).
$\Box$

Let $I(\cdot)$ be a three times differentiable real function and define, for $M_t(\eta)$ as in (B.17),
\[
\mathcal I_t(\eta)=\mathcal I_{tn}(\eta)=I(M_t(\eta)),\qquad
\mathcal I_t(x;\eta)=I(M_t(x;\eta)),\qquad
\mathcal I^{(j)}_t(x;\eta)=\frac{d^j\,\mathcal I_t(x;\eta)}{dx^j},\quad j=1,2,3.
\]
Observe that $I(M)=I(M_n(D_n))=\mathcal I_n(D_n)$, $\mathcal I_t(D_t)=\mathcal I_{t+1}(\eta_{t+1})$, and that $I(M(\eta))=\mathcal I_1(\eta_1)$ is a function of the Gaussian vectors $\eta_1,\dots,\eta_n$ only.

Lemma B.5. Let $M$ and $M(\eta)$ be as in (B.16) and (B.17). Consider a real function $I(\cdot)$, which may depend on $n$, three times continuously differentiable with $\max_{j=1,2,3}\sup_x|I^{(j)}(x)|\le C$. Then, under Assumptions P and R, and if $e=O(p_n^{1/(2a)})$,
\[
\bigl|E[I(M)-I(M(\eta))]\bigr|\le C\Biggl(\frac{p_n^{1+4/a}}{n^{1/2}}+\frac{1}{p_n^{1-4/a}}\Biggr).
\]
Proof of Lemma B.5. The proof works by changing $D_n$ into $\eta_n$, then $D_{n-1}$ into $\eta_{n-1}$, and so on: the so-called Lindeberg technique described in Pollard (2002, p. 179). This amounts to decomposing $I(M)-I(M_n(\eta_n))$ into the following sum of differences,
\[
I(M)-I(M_n(\eta_n))=\mathcal I_n(D_n)-\mathcal I_{n-1}(D_{n-1})+\mathcal I_{n-1}(D_{n-1})-\mathcal I_{n-2}(D_{n-2})+\cdots+\mathcal I_1(D_1)-\mathcal I_1(\eta_1)
\]
\[
=\mathcal I_n(D_n)-\mathcal I_n(\eta_n)+\mathcal I_{n-1}(D_{n-1})-\mathcal I_{n-1}(\eta_{n-1})+\cdots+\mathcal I_1(D_1)-\mathcal I_1(\eta_1).
\]
Since $\mathcal I_t(\eta)=\mathcal I_t(1;\eta)$ and $\mathcal I_t(0;\eta)=\mathcal I_t(0)$, a third-order Taylor expansion around $x=0$ with integral remainder gives
\[
E[\mathcal I_t(D_t)-\mathcal I_t(\eta_t)]=E\bigl[\mathcal I^{(1)}_t(0;D_t)-\mathcal I^{(1)}_t(0;\eta_t)\bigr]+\frac12E\bigl[\mathcal I^{(2)}_t(0;D_t)-\mathcal I^{(2)}_t(0;\eta_t)\bigr]
+\frac12\int_0^1(1-x)^2E\bigl[\mathcal I^{(3)}_t(x;D_t)-\mathcal I^{(3)}_t(x;\eta_t)\bigr]dx.
\]
Since $\{D_t\}$ is a martingale difference sequence, $E[\mathcal I^{(1)}_t(0;D_t)-\mathcal I^{(1)}_t(0;\eta_t)]=0$ due to the expression of $\mathcal I^{(1)}_t(0;\eta)$ given below.
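The swapping scheme just described can be illustrated numerically: replacing non-Gaussian draws by Gaussian ones with matching first two moments moves $E[I(\cdot)]$ only by a vanishing error. The sketch below is purely illustrative and uses i.i.d. Rademacher innovations and a normalized sum instead of the martingale differences $D_t$ and the statistic $M$ of the proof; all function names are ours.

```python
import math, random

random.seed(0)

def smooth_indicator(x, eps=0.5):
    # three times differentiable proxy for 1(x >= 0), as in the smooth I of Lemma B.5
    if x <= -eps:
        return 0.0
    if x >= 0.0:
        return 1.0
    u = (x + eps) / eps
    return u**4 * (35 - 84*u + 70*u**2 - 20*u**3)  # C^3 smoothstep

def mean_I(sample_innovation, n=200, reps=3000):
    # Monte Carlo estimate of E[I(S_n)] with S_n = sum of innovations / sqrt(n)
    acc = 0.0
    for _ in range(reps):
        s = sum(sample_innovation() for _ in range(n)) / math.sqrt(n)
        acc += smooth_indicator(s)
    return acc / reps

# centered, variance-one innovations: Rademacher versus standard normal
rademacher = lambda: random.choice((-1.0, 1.0))
gaussian = lambda: random.gauss(0.0, 1.0)

diff = abs(mean_I(rademacher) - mean_I(gaussian))
print(diff)  # the two expectations agree up to a small Lindeberg-type error
```

The design point is that only the first two moments of the innovations enter the leading terms of the Taylor expansion, so the discrepancy is driven by the third-order remainder.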
Hence
\[
\bigl|E[I(M)]-E[I(M(\eta))]\bigr|\le\Biggl|\sum_{t=1}^nE\bigl[\mathcal I^{(2)}_t(0;D_t)-\mathcal I^{(2)}_t(0;\eta_t)\bigr]\Biggr| \tag{B.28}
\]
\[
+\ \frac12\int_0^1(1-x)^2\Biggl\{\sum_{t=1}^n\Bigl|E\bigl[\mathcal I^{(3)}_t(x;D_t)-\mathcal I^{(3)}_t(x;\eta_t)\bigr]\Bigr|\Biggr\}dx. \tag{B.29}
\]
We now compute the differentials $\mathcal I^{(j)}_t(x;\eta)$, $j=1,2,3$. We have
\[
\mathcal I^{(1)}_t(x;\eta)=I'(M_t(x;\eta))M^{(1)}_t(x;\eta),\qquad
\mathcal I^{(2)}_t(x;\eta)=I''(M_t(x;\eta))\bigl(M^{(1)}_t(x;\eta)\bigr)^2+I'(M_t(x;\eta))M^{(2)}_t(x;\eta),
\]
\[
\mathcal I^{(3)}_t(x;\eta)=I'''(M_t(x;\eta))\bigl(M^{(1)}_t(x;\eta)\bigr)^3+3I''(M_t(x;\eta))M^{(1)}_t(x;\eta)M^{(2)}_t(x;\eta)+I'(M_t(x;\eta))M^{(3)}_t(x;\eta).
\]
We next compute the differentials of $M_t$. We have
\[
M^{(1)}_t(x;\eta)=\Biggl(\sum_{p=1}^{p_n}\Sigma^e_{pt}(x;\eta)\Biggr)^{1/e-1}\sum_{p=1}^{p_n}\Sigma^{e-1}_{pt}(x;\eta)\Sigma^{(1)}_{pt}(x;\eta)
=M^{1-e}_t(x;\eta)\sum_{p=1}^{p_n}\Sigma^{e-1}_{pt}(x;\eta)\Sigma^{(1)}_{pt}(x;\eta),
\]
\[
M^{(2)}_t(x;\eta)=M^{(2)}_{1t}(x;\eta)+M^{(2)}_{2t}(x;\eta)+M^{(2)}_{3t}(x;\eta),\qquad
M^{(3)}_t(x;\eta)=M^{(3)}_{1t}(x;\eta)+\cdots+M^{(3)}_{6t}(x;\eta),
\]
where, dropping the variables $x$, $\eta$ for notational convenience,
\[
M^{(2)}_{1t}=(1-e)M^{1-2e}_t\Biggl(\sum_{p=1}^{p_n}\Sigma^{e-1}_{pt}\Sigma^{(1)}_{pt}\Biggr)^2,\qquad
M^{(2)}_{2t}=M^{1-e}_t\sum_{p=1}^{p_n}\Sigma^{e-1}_{pt}\Sigma^{(2)}_{pt},\qquad
M^{(2)}_{3t}=(e-1)M^{1-e}_t\sum_{p=1}^{p_n}\Sigma^{e-2}_{pt}\bigl(\Sigma^{(1)}_{pt}\bigr)^2,
\]
\[
M^{(3)}_{1t}=(1-e)(1-2e)M^{1-3e}_t\Biggl(\sum_{p=1}^{p_n}\Sigma^{e-1}_{pt}\Sigma^{(1)}_{pt}\Biggr)^3,\qquad
M^{(3)}_{2t}=3(1-e)M^{1-2e}_t\sum_{p=1}^{p_n}\Sigma^{e-1}_{pt}\Sigma^{(1)}_{pt}\sum_{p=1}^{p_n}\Sigma^{e-1}_{pt}\Sigma^{(2)}_{pt},
\]
\[
M^{(3)}_{3t}=3(1-e)(e-1)M^{1-2e}_t\sum_{p=1}^{p_n}\Sigma^{e-1}_{pt}\Sigma^{(1)}_{pt}\sum_{p=1}^{p_n}\Sigma^{e-2}_{pt}\bigl(\Sigma^{(1)}_{pt}\bigr)^2,\qquad
M^{(3)}_{4t}=3(e-1)M^{1-e}_t\sum_{p=1}^{p_n}\Sigma^{e-2}_{pt}\Sigma^{(2)}_{pt}\Sigma^{(1)}_{pt},
\]
\[
M^{(3)}_{5t}=(e-1)(e-2)M^{1-e}_t\sum_{p=1}^{p_n}\Sigma^{e-3}_{pt}\bigl(\Sigma^{(1)}_{pt}\bigr)^3,\qquad
M^{(3)}_{6t}=M^{1-e}_t\sum_{p=1}^{p_n}\Sigma^{e-1}_{pt}\Sigma^{(3)}_{pt}.
\]

The third-order item (B.29). Since
\[
\frac12\int_0^1(1-x)^2\Biggl\{\sum_{t=1}^n\Bigl|E\bigl[\mathcal I^{(3)}_t(x;D_t)-\mathcal I^{(3)}_t(x;\eta_t)\bigr]\Bigr|\Biggr\}dx
\le\int_0^1(1-x)^2\Biggl\{\sum_{t=1}^n\Bigl(\bigl|E[\mathcal I^{(3)}_t(x;D_t)]\bigr|+\bigl|E[\mathcal I^{(3)}_t(x;\eta_t)]\bigr|\Bigr)\Biggr\}dx,
\]
it is sufficient to bound $\sum_{t=1}^n|E[\mathcal I^{(3)}_t(x)]|$ independently of $x$, where $\mathcal I^{(3)}_t(x)$ stands for $\mathcal I^{(3)}_t(x;\eta_t)$ or $\mathcal I^{(3)}_t(x;D_t)$. We have, dropping the dependence w.r.t. $x$ for ease of notation,
\[
\sum_{t=1}^n\bigl|E[\mathcal I^{(3)}_t]\bigr|\le C\sum_{t=1}^n\Bigl\{E\bigl[|M^{(1)}_t|^3\bigr]+E\bigl[|M^{(1)}_tM^{(2)}_{1t}|\bigr]+E\bigl[|M^{(1)}_tM^{(2)}_{2t}|\bigr]\Bigr\}
+C\sum_{t=1}^n\Biggl\{E\bigl[|M^{(1)}_tM^{(2)}_{3t}|\bigr]+\sum_{j=1}^6E\bigl[|M^{(3)}_{jt}|\bigr]\Biggr\}.
\]
We now study the ten items above.
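The formula $M^{(1)}_t=M^{1-e}_t\sum_p\Sigma^{e-1}_{pt}\Sigma^{(1)}_{pt}$ is the chain rule applied to the $\ell^e$-type aggregate $M_t=(\sum_p\Sigma^e_{pt})^{1/e}$, and it can be sanity-checked against a finite difference. The snippet below does so for a toy choice of the functions $\Sigma_p(x)$ (our own, not those of the proof).

```python
import math

e = 6                                 # smoothing exponent
coeffs = [0.3, 0.7, 1.1, 0.2]         # toy parameters for Sigma_p(x) = exp(c_p * x)

def Sigma(p, x):
    return math.exp(coeffs[p] * x)

def dSigma(p, x):
    return coeffs[p] * math.exp(coeffs[p] * x)

def M(x):
    # M(x) = (sum_p Sigma_p(x)^e)^(1/e)
    return sum(Sigma(p, x) ** e for p in range(len(coeffs))) ** (1.0 / e)

def M1(x):
    # closed form: M^{1-e} * sum_p Sigma_p^{e-1} * Sigma_p'
    return M(x) ** (1 - e) * sum(Sigma(p, x) ** (e - 1) * dSigma(p, x)
                                 for p in range(len(coeffs)))

x, h = 0.4, 1e-6
fd = (M(x + h) - M(x - h)) / (2 * h)  # central finite difference, O(h^2) accurate
print(abs(fd - M1(x)))
```

The same check, applied twice, validates the second- and third-order terms $M^{(2)}_t$ and $M^{(3)}_t$ listed above.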
(1) $\sum_{t=1}^nE[|M^{(1)}_t|^3]$. We have, for $a'\ge1$ defined by $1/a'=1-3/a$,
\[
E\bigl[|M^{(1)}_t|^3\bigr]=E\Biggl|M^{1-e}_t\sum_{p=1}^{p_n}\Sigma^{e-1}_{pt}\Sigma^{(1)}_{pt}\Biggr|^3
\le\sum_{p_1,p_2,p_3=1}^{p_n}E\Bigl[\bigl|M^{3(1-e)}_t\Sigma^{e-1}_{p_1t}\Sigma^{e-1}_{p_2t}\Sigma^{e-1}_{p_3t}\Sigma^{(1)}_{p_1t}\Sigma^{(1)}_{p_2t}\Sigma^{(1)}_{p_3t}\bigr|\Bigr]
\]
\[
\le\max_{p,t}\bigl\|\Sigma^{(1)}_{pt}\bigr\|^3_{a}\sum_{p_1,p_2,p_3=1}^{p_n}E^{1/a'}\Bigl[\bigl|M^{3(1-e)}_t\Sigma^{e-1}_{p_1t}\Sigma^{e-1}_{p_2t}\Sigma^{e-1}_{p_3t}\bigr|^{a'}\Bigr]
\le\frac{C}{n^{3/2}}\sum_{p_1,p_2,p_3=1}^{p_n}E^{1/a'}\Bigl[\bigl|M^{3(1-e)}_t\Sigma^{e-1}_{p_1t}\Sigma^{e-1}_{p_2t}\Sigma^{e-1}_{p_3t}\bigr|^{a'}\Bigr],
\]
by (B.18). Since, for all $x\in[0,1]$, $t\mapsto t^{1/a'}$ and $t\mapsto t^{1-1/e}$ are concave and $\sum_{p=1}^{p_n}t^{a'}_p\le\bigl(\sum_{p=1}^{p_n}t_p\bigr)^{a'}$, the definition of $M_t$ gives
\[
\sum_{p_1,p_2,p_3=1}^{p_n}E^{1/a'}\bigl[\cdots\bigr]
=p_n^3\times\frac{1}{p_n^3}\sum_{p_1,p_2,p_3=1}^{p_n}E^{1/a'}\bigl[\cdots\bigr]
\le p_n^3\Biggl(\frac{1}{p_n^3}E\Biggl[\sum_{p_1,p_2,p_3=1}^{p_n}M^{3a'(1-e)}_t\Sigma^{a'(e-1)}_{p_1t}\Sigma^{a'(e-1)}_{p_2t}\Sigma^{a'(e-1)}_{p_3t}\Biggr]\Biggr)^{1/a'}
\]
\[
=p_n^{3(1-1/a')}E^{1/a'}\Biggl[\Biggl(\sum_{p=1}^{p_n}\Sigma^e_{pt}\Biggr)^{-3a'(e-1)/e}\Biggl(\sum_{p=1}^{p_n}\Sigma^{a'(e-1)}_{pt}\Biggr)^3\Biggr]
\le Cp_n^{3/a},
\]
uniformly w.r.t. $t$, since $(\ln p_n)/e=o(1)$. Hence, for all $x\in[0,1]$,
\[
\sum_{t=1}^nE\bigl[|M^{(1)}_t|^3\bigr]\le C\frac{p_n^{3/a}}{n^{1/2}}. \tag{B.30}
\]
(2) $\sum_{t=1}^nE[|M^{(1)}_tM^{(2)}_{1t}|]$. We have, since $M_t\ge1$,
\[
E\bigl[|M^{(1)}_t||M^{(2)}_{1t}|\bigr]\le CE\Biggl[M^{3(1-e)}_t\Biggl|\sum_{p=1}^{p_n}\Sigma^{e-1}_{pt}\Sigma^{(1)}_{pt}\Biggr|^3\Biggr]\le CE\bigl[|M^{(1)}_t|^3\bigr]
\]
for all $t$, so that $\sum_{t=1}^nE[|M^{(1)}_tM^{(2)}_{1t}|]\le C\sum_{t=1}^nE[|M^{(1)}_t|^3]$. Hence a bound similar to (B.30) holds.

(3) $\sum_{t=1}^nE[|M^{(1)}_tM^{(2)}_{2t}|]$. Let $a''\ge1$ be defined by $1/a''=1-3/a$. Arguing as for (1) with (B.18) and (B.19),
\[
E\bigl[|M^{(1)}_tM^{(2)}_{2t}|\bigr]\le C\sum_{p_1,p_2=1}^{p_n}E\Bigl[M^{2(1-e)}_t\bigl|\Sigma^{e-1}_{p_1t}\Sigma^{e-1}_{p_2t}\Sigma^{(1)}_{p_1t}\Sigma^{(2)}_{p_2t}\bigr|\Bigr]
\le C\max_{p,t}\Bigl\{\bigl\|\Sigma^{(1)}_{pt}\bigr\|_{a}\bigl\|\Sigma^{(2)}_{pt}\bigr\|_{a/2}\Bigr\}\sum_{p_1,p_2=1}^{p_n}E^{1/a''}\Bigl[\bigl|M^{2(1-e)}_t\Sigma^{e-1}_{p_1t}\Sigma^{e-1}_{p_2t}\bigr|^{a''}\Bigr]
\]
\[
\le C\frac{p_n^{1/2}}{n^{3/2}}\times p_n\times p_n^{4/a-1/2}
=C\frac{p_n^{1+4/a}}{n^{3/2}}.
\]
Hence, uniformly w.r.t. $x\in[0,1]$,
\[
\sum_{t=1}^nE\bigl[|M^{(1)}_tM^{(2)}_{2t}|\bigr]\le C\frac{p_n^{1+4/a}}{n^{1/2}}.
\]
(B.31)

(4) $\sum_{t=1}^nE[|M^{(1)}_tM^{(2)}_{3t}|]$. Proceeding as in (1) and (3) gives, since $\inf_{p,t}\Sigma_{pt}\ge1$,
\[
E\bigl[|M^{(1)}_tM^{(2)}_{3t}|\bigr]\le Ce\sum_{p_1,p_2=1}^{p_n}E\Biggl[M^{2(1-e)}_t\Bigl|\Sigma^{e-1}_{p_1t}\Sigma^{e-2}_{p_2t}\Sigma^{(1)}_{p_1t}\bigl(\Sigma^{(1)}_{p_2t}\bigr)^2\Bigr|\Biggr]
\le C\frac{e\,p_n^{3/a}}{n^{3/2}}\le C\frac{p_n^{4/a}}{n^{3/2}},
\]
provided $e=O(p_n^{1/a})$. Hence $\sum_{t=1}^nE[|M^{(1)}_tM^{(2)}_{3t}|]$ can be bounded as in (B.30).

(5) $\sum_{t=1}^nE[|M^{(3)}_{1t}|]$ can be bounded as in (B.30) since $M_t\ge1$ and $E[|M^{(3)}_{1t}|]\le CE[M^{3(1-e)}_t|\sum_{p=1}^{p_n}\Sigma^{e-1}_{pt}\Sigma^{(1)}_{pt}|^3]$.

(6) $\sum_{t=1}^nE[|M^{(3)}_{2t}|]$. Arguing as in (3) gives that $\sum_{t=1}^nE[|M^{(3)}_{2t}|]$ can be bounded as in (B.31).

(7) $\sum_{t=1}^nE[|M^{(3)}_{3t}|]$. Arguing as in (4) shows that this item is negligible compared to (B.30).
(8) $\sum_{t=1}^nE[|M^{(3)}_{4t}|]$. Let $a'\ge1$ be defined by $1/a'=1-3/a$. We have, since $\inf_{p,t}\Sigma_{pt}\ge1$,
\[
E\bigl[|M^{(3)}_{4t}|\bigr]\le Ce\,E\Biggl[M^{1-e}_t\sum_{p=1}^{p_n}\bigl|\Sigma^{e-2}_{pt}\Sigma^{(2)}_{pt}\Sigma^{(1)}_{pt}\bigr|\Biggr]
\le Ce\sum_{p=1}^{p_n}E^{1/a'}\Bigl[\bigl(M^{1-e}_t\Sigma^{e-2}_{pt}\bigr)^{a'}\Bigr]\bigl\|\Sigma^{(2)}_{pt}\bigr\|_{a/2}\bigl\|\Sigma^{(1)}_{pt}\bigr\|_{a}
\le C\frac{e\,p_n^{1/2}p_n^{4/a}}{n^{3/2}}\le C\frac{p_n^{1+4/a}}{n^{3/2}},
\]
provided $e=O(p_n^{1/2})$. This gives a bound similar to (B.31) for $\sum_{t=1}^nE[|M^{(3)}_{4t}|]$.

(9) $\sum_{t=1}^nE[|M^{(3)}_{5t}|]$ can be bounded as in (B.30) provided $e=O(p_n^{1/(2a)})$.

(10) $\sum_{t=1}^nE[|M^{(3)}_{6t}|]$ can be bounded as in (B.31).

Hence, collecting the dominant bounds (B.30) and (B.31) in (1)-(10) gives
\[
\frac12\int_0^1(1-x)^2\Biggl\{\sum_{t=1}^n\Bigl|E\bigl[\mathcal I^{(3)}_t(x;D_t)-\mathcal I^{(3)}_t(x;\eta_t)\bigr]\Bigr|\Biggr\}dx
\le C\frac{p_n^{3/a}+p_n^{1+4/a}}{n^{1/2}}
\le C\frac{p_n^{1+4/a}}{n^{1/2}}. \tag{B.32}
\]

The second-order term (B.28). Note that $\mathcal I^{(2)}_t(0;\eta)=\eta'A_t\eta$ where $A_t$ depends upon $D_1,\dots,D_{t-1}$ and $\eta_{t+1},\dots,\eta_n$. In the standard Lindeberg method, $\{D_t,t\in[1,n]\}$ and $\{\eta_t,t\in[1,n]\}$ are both sequences of independent variables with identical means and variances, so that the second-order term, which writes as a sum of items $E[D_t'A_tD_t]-E[\eta_t'A_t\eta_t]$, is equal to 0 in this simpler case. However, this does not hold in our case. In this step, the second-order term is dealt with by removing from $\mathcal I^{(2)}_t(0;\eta)$ a block $\sum_{j=1}^pK_{jp}\sum_{s=t-\ell}^{t-1}D_{js}$ and by changing the $D_{jt}$ into $D^{t-\ell+1}_{jt}=E[D_{jt}\mid e_t,\dots,e_{t-\ell+1}]$.

Observe that $\mathcal I^{(2)}_t(0;\eta)=\mathcal I^{(2)1}_t(0;\eta)+\mathcal I^{(2)2}_t(0;\eta)+\mathcal I^{(2)3}_t(0;\eta)+\mathcal I^{(2)4}_t(0;\eta)$ with, dropping the dependence upon 0 and $\eta$,
\[
\mathcal I^{(2)1}_t=(1-e)\,\mathcal I^{(1)}_{tn}M^{1-2e}_t\Biggl(\sum_{p=1}^{p_n}\Sigma^{e-1}_{pt}\Sigma^{(1)}_{pt}\Biggr)^2,\qquad \mathcal I^{(1)}_{tn}=I'(M_t),
\]
\[
\mathcal I^{(2)2}_t=\mathcal I^{(1)}_{tn}M^{1-e}_t\sum_{p=1}^{p_n}\Sigma^{e-1}_{pt}\Sigma^{(2)}_{pt},\qquad
\mathcal I^{(2)3}_t=(e-1)\,\mathcal I^{(1)}_{tn}M^{1-e}_t\sum_{p=1}^{p_n}\Sigma^{e-2}_{pt}\bigl(\Sigma^{(1)}_{pt}\bigr)^2,\qquad
\mathcal I^{(2)4}_t=I''(M_t)\Biggl(M^{1-e}_t\sum_{p=1}^{p_n}\Sigma^{e-1}_{pt}\Sigma^{(1)}_{pt}\Biggr)^2.
\]
Observe that $M_t(0;D_t)=M_t(0;\eta_t)$ and $\Sigma_{pt}(0;D_t)=\Sigma_{pt}(0;\eta_t)$, and that these quantities do not depend upon $\eta_t$ or $D_t$. We shall first focus on $\mathcal I^{(2)1}_t$. Let $\ell\ge2p_n$ be an integer. Define, for $y\in[0,1]$,
\[
S_{pt}(y;\eta)=\frac{2\sum_{j=1}^pK_{jp}\bigl(\sum_{s=j+1}^{t-\ell-1}D_{js}+y\sum_{s=t-\ell}^{t-1}D_{js}+\sum_{s=t+1}^n\eta_{js}\bigr)\eta_{jt}}{n\sigma^4V_\Delta(p)},\qquad
S_{pt}(y)=S_{pt}\bigl(y;yD_t+(1-y)D^{t-\ell+1}_t\bigr),
\]
\[
T_{pt}(y;\eta)=\check s^{(2)}_{pt}(y;\eta)=\frac{2\sum_{j=1}^pK_{jp}\eta^2_{jt}}{n\sigma^4V_\Delta(p)},\qquad
T_{pt}(y)=T_{pt}\bigl(y;yD_t+(1-y)D^{t-\ell+1}_t\bigr),
\]
which are such that $S_{pt}(1;\eta)=\check s^{(1)}_{pt}(0;\eta)$, $S_{pt}(1)=\check s^{(1)}_{pt}(0;D_t)$ and $T_{pt}(1)=\check s^{(2)}_{pt}(0;D_t)$.
Define also
\[
M_{jt}(y)=\sum_{s=j+1}^{t-\ell-1}D_{js}+y\sum_{s=t-\ell}^{t-1}D_{js}+\sum_{s=t+1}^n\eta_{js},\qquad
R_{jt}(y)=\frac{M^2_{jt}(y)}{n},
\]
\[
s_{pt}(y)=\frac{\sum_{j=1}^pK_{jp}R_{jt}(y)-\sigma^4E_\Delta(p)}{\sigma^4V_\Delta(p)},\qquad
\Sigma_{pt}(y)=f(s_{pt}(y)),
\]
\[
\widetilde\Sigma^{(1)}_{pt}(y;\eta)=f^{(1)}(s_{pt}(y))S_{pt}(y;\eta),\qquad
\widetilde\Sigma^{(2)}_{pt}(y;\eta)=f^{(1)}(s_{pt}(y))T_{pt}(y;\eta)+f^{(2)}(s_{pt}(y))\bigl(S_{pt}(y;\eta)\bigr)^2,
\]
\[
\widetilde\Sigma^{(1)}_{pt}(y)=\widetilde\Sigma^{(1)}_{pt}\bigl(y;yD_t+(1-y)D^{t-\ell+1}_t\bigr),\qquad
\widetilde\Sigma^{(2)}_{pt}(y)=\widetilde\Sigma^{(2)}_{pt}\bigl(y;yD_t+(1-y)D^{t-\ell+1}_t\bigr),
\]
\[
M_t(y)=\Biggl(\sum_{p=1}^{p_n}\Sigma^e_{pt}(y)\Biggr)^{1/e},\qquad
\mathcal I^{(1)}_{tn}(y)=I'(M_t(y)),
\]
and the counterparts of $\mathcal I^{(2)1}_t(0;\eta_t)$ and $\mathcal I^{(2)1}_t(0;D_t)$ as
\[
\mathcal J_t(y;\eta)=(1-e)\,\mathcal I^{(1)}_{tn}(y)M^{1-2e}_t(y)\Biggl(\sum_{p=1}^{p_n}\Sigma^{e-1}_{pt}(y)\widetilde\Sigma^{(1)}_{pt}(y;\eta)\Biggr)^2,\qquad
\mathcal J_t(y)=\mathcal J_t\bigl(y;yD_t+(1-y)D^{t-\ell+1}_t\bigr).
\]
Observe that $\mathcal I^{(2)1}_t(0;\eta_t)=\mathcal J_t(1;\eta_t)$ and $\mathcal I^{(2)1}_t(0;D_t)=\mathcal J_t(1)$. Hence $E[\mathcal I^{(2)1}_t(0;D_t)-\mathcal I^{(2)1}_t(0;\eta_t)]=E[\mathcal J_t(1)-\mathcal J_t(1;\eta_t)]$ and
\[
E\bigl[\mathcal I^{(2)1}_t(0;D_t)-\mathcal I^{(2)1}_t(0;\eta_t)\bigr]=E[\mathcal J_t(0)-\mathcal J_t(0;\eta_t)] \tag{B.33}
\]
\[
+\int_0^1E\bigl[\mathcal J^{(1)}_t(y)-\mathcal J^{(1)}_t(y;\eta_t)\bigr]dy, \tag{B.34}
\]
where $\mathcal J^{(1)}_t(y)=d\mathcal J_t(y)/dy$ and $\mathcal J^{(1)}_t(y;\eta_t)=d\mathcal J_t(y;\eta_t)/dy$.
We first consider the integral item $\int_0^1|E[\mathcal J^{(1)}_t(y)]|dy$ from (B.34) and first compute $\mathcal J^{(1)}_t(y)$. Define
\[
S^{(1)}_{pt}(y)=\frac{dS_{pt}(y)}{dy}
=\frac{2\sum_{j=1}^pK_{jp}\bigl(\sum_{s=t-\ell}^{t-1}D_{js}\bigr)\bigl(yD_{jt}+(1-y)D^{t-\ell+1}_{jt}\bigr)}{n\sigma^4V_\Delta(p)}
+\frac{2\sum_{j=1}^pK_{jp}\bigl(\sum_{s=j+1}^{t-\ell-1}D_{js}+y\sum_{s=t-\ell}^{t-1}D_{js}+\sum_{s=t+1}^n\eta_{js}\bigr)\bigl(D_{jt}-D^{t-\ell+1}_{jt}\bigr)}{n\sigma^4V_\Delta(p)},
\]
\[
T^{(1)}_{pt}(y)=\frac{dT_{pt}(y)}{dy}=\frac{4\sum_{j=1}^pK_{jp}\bigl(yD_{jt}+(1-y)D^{t-\ell+1}_{jt}\bigr)\bigl(D_{jt}-D^{t-\ell+1}_{jt}\bigr)}{n\sigma^4V_\Delta(p)},
\]
\[
s^{(1)}_{pt}(y)=\frac{ds_{pt}(y)}{dy}=\frac{2\sum_{j=1}^pK_{jp}M_{jt}(y)\sum_{s=t-\ell}^{t-1}D_{js}}{n\sigma^4V_\Delta(p)},\qquad
\Sigma^{(1)}_{pt}(y)=\frac{d\Sigma_{pt}(y)}{dy}=f^{(1)}(s_{pt}(y))s^{(1)}_{pt}(y),
\]
\[
\widetilde\Sigma^{(1,1)}_{pt}(y)=\frac{d\widetilde\Sigma^{(1)}_{pt}(y)}{dy}=f^{(2)}(s_{pt}(y))s^{(1)}_{pt}(y)S_{pt}(y)+f^{(1)}(s_{pt}(y))S^{(1)}_{pt}(y),
\]
\[
\widetilde\Sigma^{(2,1)}_{pt}(y)=\frac{d\widetilde\Sigma^{(2)}_{pt}(y)}{dy}
=f^{(2)}(s_{pt}(y))s^{(1)}_{pt}(y)T_{pt}(y)+f^{(1)}(s_{pt}(y))T^{(1)}_{pt}(y)
+f^{(3)}(s_{pt}(y))s^{(1)}_{pt}(y)\bigl(S_{pt}(y)\bigr)^2+2f^{(2)}(s_{pt}(y))S_{pt}(y)S^{(1)}_{pt}(y),
\]
\[
\mathcal I^{(2)}_{tn}(y)=I''(M_t(y)),
\]
and $\mathcal J^{(1)}_t(y)=\mathcal J^{(1)}_{1t}(y)+\mathcal J^{(1)}_{2t}(y)+\mathcal J^{(1)}_{3t}(y)+\mathcal J^{(1)}_{4t}(y)$ with
\[
\mathcal J^{(1)}_{1t}(y)=(1-e)\,\mathcal I^{(2)}_{tn}(y)M^{2-3e}_t(y)\Biggl(\sum_{p=1}^{p_n}\Sigma^{e-1}_{pt}(y)\widetilde\Sigma^{(1)}_{pt}(y)\Biggr)^2\sum_{p=1}^{p_n}\Sigma^{e-1}_{pt}(y)\Sigma^{(1)}_{pt}(y),
\]
\[
\mathcal J^{(1)}_{2t}(y)=(1-e)(1-2e)\,\mathcal I^{(1)}_{tn}(y)M^{1-3e}_t(y)\Biggl(\sum_{p=1}^{p_n}\Sigma^{e-1}_{pt}(y)\widetilde\Sigma^{(1)}_{pt}(y)\Biggr)^2\sum_{p=1}^{p_n}\Sigma^{e-1}_{pt}(y)\Sigma^{(1)}_{pt}(y),
\]
\[
\mathcal J^{(1)}_{3t}(y)=2(1-e)(e-1)\,\mathcal I^{(1)}_{tn}(y)M^{1-2e}_t(y)\sum_{p=1}^{p_n}\Sigma^{e-1}_{pt}(y)\widetilde\Sigma^{(1)}_{pt}(y)\sum_{p=1}^{p_n}\Sigma^{e-2}_{pt}(y)\widetilde\Sigma^{(1)}_{pt}(y)\Sigma^{(1)}_{pt}(y),
\]
\[
\mathcal J^{(1)}_{4t}(y)=2(1-e)\,\mathcal I^{(1)}_{tn}(y)M^{1-2e}_t(y)\sum_{p=1}^{p_n}\Sigma^{e-1}_{pt}(y)\widetilde\Sigma^{(1)}_{pt}(y)\sum_{p=1}^{p_n}\Sigma^{e-1}_{pt}(y)\widetilde\Sigma^{(1,1)}_{pt}(y).
\]
To bound the moments of $\widetilde\Sigma^{(1)}_{pt}(y)$, $\widetilde\Sigma^{(1,1)}_{pt}(y)$ and $\Sigma^{(1)}_{pt}(y)$, consider first $\|S_{pt}(y)\|_{a}$, $\|S^{(1)}_{pt}(y)\|_{a}$ and $\|s^{(1)}_{pt}(y)\|_{a}$. For $\|S_{pt}(y)\|_{a}$ and $\|S^{(1)}_{pt}(y)\|_{a}$, (B.18), the Burkholder inequality, (B.6), $p_n=O(n^{1/2})$, $2p_n\le\ell\le3p_n$ and $\Theta_{a}(\ell-p_n)\le Cp_n^{-3/2}$ give
\[
\|S_{pt}(y)\|_{a}\le\Biggl\|\frac{2\sum_{j=1}^pK_{jp}\bigl(\sum_{s=j+1}^{t-\ell-1}D_{js}+y\sum_{s=t-\ell}^{t-1}D_{js}+\sum_{s=t+1}^n\eta_{js}\bigr)D_{jt}}{n\sigma^4V_\Delta(p)}\Biggr\|_{a}
\]
\[
+2|1-y|\sum_{j=1}^p\frac{|K_{jp}|}{n\sigma^4V_\Delta(p)}\Biggl\|\sum_{s=j+1}^{t-\ell-1}D_{js}+y\sum_{s=t-\ell}^{t-1}D_{js}+\sum_{s=t+1}^n\eta_{js}\Biggr\|_{a}\bigl\|D_{jt}-D^{t-\ell+1}_{jt}\bigr\|_{a}
\le C\Biggl(\frac{1}{n^{1/2}}+\frac{p_n}{n}+\Bigl(\frac{p_n}{n}\Bigr)^{1/2}\Theta_{a}(\ell-p_n)\Biggr)\le\frac{C}{n^{1/2}},
\]
\[
\|S^{(1)}_{pt}(y)\|_{a}\le\Biggl\|\frac{2\sum_{j=1}^pK_{jp}\bigl(\sum_{s=t-\ell}^{t-1}D_{js}\bigr)D_{jt}}{n\sigma^4V_\Delta(p)}\Biggr\|_{a}
+2|1-y|\sum_{j=1}^p\frac{|K_{jp}|}{n\sigma^4V_\Delta(p)}\Biggl\|\sum_{s=t-\ell}^{t-1}D_{js}\Biggr\|_{a}\bigl\|D_{jt}-D^{t-\ell+1}_{jt}\bigr\|_{a}
\]
\[
+2\sum_{j=1}^p\frac{|K_{jp}|}{n\sigma^4V_\Delta(p)}\Biggl\|\sum_{s=j+1}^{t-\ell-1}D_{js}+y\sum_{s=t-\ell}^{t-1}D_{js}+\sum_{s=t+1}^n\eta_{js}\Biggr\|_{a}\bigl\|D_{jt}-D^{t-\ell+1}_{jt}\bigr\|_{a}
\]
\[
\le C\Biggl(\frac{\ell^{1/2}}{n}+\frac{\ell^{1/2}p_n^{1/2}}{n}\Theta_{a}(\ell-p_n)+\Bigl(\frac{p_n}{n}\Bigr)^{1/2}\Theta_{a}(\ell-p_n)\Biggr)
\le C\Biggl(\frac{p_n^{1/2}}{n}+\frac{1}{(np_n)^{1/2}}\Biggr),
\]
\[
\|T_{pt}(y)\|_{a}\le C\frac{p^{1/2}}{n},\qquad
\|T^{(1)}_{pt}(y)\|_{a}\le\frac{C}{np_n}.
\]
For $\|s^{(1)}_{pt}(y)\|_{a}$, (B.18), $p_n=O(n^{1/2})$ and the Burkholder inequality give
\[
\|s^{(1)}_{pt}(y)\|_{a}\le\Biggl\|\frac{2\sum_{j=1}^pK_{jp}\bigl(\sum_{s'=j+1}^{t-\ell-1}D_{js'}\bigr)\sum_{s=t-\ell}^{t-1}D_{js}}{n\sigma^4V_\Delta(p)}\Biggr\|_{a}
+\Biggl\|\frac{2\sum_{j=1}^pK_{jp}\bigl(\sum_{s=t-\ell}^{t-1}D_{js}\bigr)^2}{n\sigma^4V_\Delta(p)}\Biggr\|_{a}
+\Biggl\|\frac{2\sum_{j=1}^pK_{jp}\bigl(\sum_{s=t-\ell}^{t-1}D_{js}\bigr)\bigl(\sum_{s=t+1}^n\eta_{js}\bigr)}{n\sigma^4V_\Delta(p)}\Biggr\|_{a}
\]
\[
\le C\Biggl(\ell^{1/2}\Bigl(\frac{1}{n^{1/2}}+\frac{p_n}{n}\Bigr)\frac{1}{n^{1/2}}+\frac{p_n^{1/2}\ell}{n}+\frac{\ell^{1/2}}{n^{1/2}}\Biggr)
\le C\Bigl(\frac{p_n}{n}\Bigr)^{1/2}.
\]
These bounds and (B.14) give, uniformly in $y$, $p$ and $t$,
\[
\bigl\|\widetilde\Sigma^{(1)}_{pt}(y)\bigr\|_{a}\le\frac{C}{n^{1/2}},\qquad
\bigl\|\Sigma^{(1)}_{pt}(y)\bigr\|_{a}\le C\Bigl(\frac{p_n}{n}\Bigr)^{1/2},
\]
\[
\bigl\|\widetilde\Sigma^{(1,1)}_{pt}(y)\bigr\|_{a/2}\le C\Biggl(\frac{p_n^{1/2}}{n}+\Bigl(\frac{p_n}{n}\Bigr)^{1/2}\frac{1}{n^{1/2}}+\frac{p_n^{1/2}}{n}+\frac{1}{np_n^{1/2}}\Biggr)\le C\frac{p_n^{1/2}}{n}.
\]
Now, arguing as for the study of (B.29), $e=O(p_n^{1/(2a)})$ gives, uniformly in $p$, $t$ and $y$,
\[
E\bigl[|\mathcal J^{(1)}_{1t}(y)|\bigr]+E\bigl[|\mathcal J^{(1)}_{2t}(y)|\bigr]+E\bigl[|\mathcal J^{(1)}_{4t}(y)|\bigr]\le C\frac{p_n^{1/2+4/a}}{n^{3/2}},\qquad
E\bigl[|\mathcal J^{(1)}_{3t}(y)|\bigr]\le C\frac{p_n^{4/a}}{n^{3/2}}.
\]
It then follows that $\sum_{t=1}^n\int_0^1|E[\mathcal J^{(1)}_t(y)]|dy\le Cp_n^{1/2+4/a}/n^{1/2}$. Since $\sum_{t=1}^n\int_0^1|E[\mathcal J^{(1)}_t(y;\eta_t)]|dy$ satisfies a similar bound, we have, for (B.34),
\[
\sum_{t=1}^n\Biggl|\int_0^1E\bigl[\mathcal J^{(1)}_t(y)-\mathcal J^{(1)}_t(y;\eta_t)\bigr]dy\Biggr|\le C\frac{p_n^{1/2+4/a}}{n^{1/2}}.
\]
Consider now (B.33).
Since $D^{t-\ell+1}_{jt}$ and $\eta_t$ are independent of $\mathcal I^{(1)}_{tn}(0)$, $M^{1-2e}_t(0)$ and the $\Sigma_{pt}(0)$, we have, using (B.12),
\[
E[\mathcal J_t(0)-\mathcal J_t(0;\eta_t)]
=\frac4nE\Biggl[(1-e)\,\mathcal I^{(1)}_{tn}(0)M^{1-2e}_t(0)\sum_{p_1,p_2=1}^{p_n}\Sigma^{e-1}_{p_1t}(0)\Sigma^{e-1}_{p_2t}(0)f^{(1)}\bigl(s_{p_1t}(0)\bigr)f^{(1)}\bigl(s_{p_2t}(0)\bigr)
\]
\[
\times\sum_{j_1=1}^{p_1}\sum_{j_2=1}^{p_2}\Bigl(E\bigl[D^{t-\ell+1}_{j_1t}D^{t-\ell+1}_{j_2t}\bigr]-E[\eta_{j_1t}\eta_{j_2t}]\Bigr)
\frac{K_{j_1p_1}\bigl(\sum_{s=j_1+1}^{t-\ell-1}D_{j_1s}+\sum_{s=t+1}^n\eta_{j_1s}\bigr)}{n^{1/2}\sigma^4V_\Delta(p_1)}
\frac{K_{j_2p_2}\bigl(\sum_{s=j_2+1}^{t-\ell-1}D_{j_2s}+\sum_{s=t+1}^n\eta_{j_2s}\bigr)}{n^{1/2}\sigma^4V_\Delta(p_2)}\Biggr]=0.
\]
Hence (B.33) and (B.34) give
\[
\Biggl|\sum_{t=1}^nE\bigl[\mathcal I^{(2)1}_t(0;D_t)-\mathcal I^{(2)1}_t(0;\eta_t)\bigr]\Biggr|\le C\frac{p_n^{1/2+4/a}}{n^{1/2}}.
\]
To study $|E[\mathcal I^{(2)2}_t(0;D_t)-\mathcal I^{(2)2}_t(0;\eta_t)]|$, observe that, uniformly with respect to $p$, $t$ and $y$,
\[
\max\Bigl(\bigl\|\widetilde\Sigma^{(2)}_{pt}(y)\bigr\|_{a/2},\bigl\|\widetilde\Sigma^{(2)}_{pt}(y;\eta_t)\bigr\|_{a/2}\Bigr)\le C\frac{p_n^{1/2}}{n},\qquad
\max\Bigl(\bigl\|\widetilde\Sigma^{(2,1)}_{pt}(y)\bigr\|_{a},\bigl\|\widetilde\Sigma^{(2,1)}_{pt}(y;\eta_t)\bigr\|_{a}\Bigr)\le C\Biggl(\frac{p_n}{n^{3/2}}+\frac{1}{np_n}\Biggr).
\]
Arguing as for $\sum_{t=1}^nE[\mathcal I^{(2)1}_t(0;D_t)-\mathcal I^{(2)1}_t(0;\eta_t)]$ gives
\[
\Biggl|\sum_{t=1}^nE\bigl[\mathcal I^{(2)2}_t(0;D_t)-\mathcal I^{(2)2}_t(0;\eta_t)\bigr]\Biggr|\le C\Biggl(\frac{p_n^{4/a}}{n^{1/2}}+\frac{p_n^{4/a}}{p_n}\Biggr),
\]
and, provided $e=O(p_n^{1/(2a)})$,
\[
\Biggl|\sum_{t=1}^nE\bigl[\mathcal I^{(2)3}_t(0;D_t)-\mathcal I^{(2)3}_t(0;\eta_t)\bigr]\Biggr|
+\Biggl|\sum_{t=1}^nE\bigl[\mathcal I^{(2)4}_t(0;D_t)-\mathcal I^{(2)4}_t(0;\eta_t)\bigr]\Biggr|\le C\frac{p_n^{1/2+4/a}}{n^{1/2}}.
\]
It then follows that
\[
\Biggl|\sum_{t=1}^nE\bigl[\mathcal I^{(2)}_t(0;D_t)-\mathcal I^{(2)}_t(0;\eta_t)\bigr]\Biggr|\le C\Biggl(\frac{p_n^{1/2+4/a}}{n^{1/2}}+\frac{1}{p_n^{1-4/a}}\Biggr). \tag{B.35}
\]
Substituting (B.32) and (B.35) in (B.29) and (B.28) shows that the lemma is proved. $\Box$

B.6.3. End of the proof of Proposition A.3. The rest of the proof is divided into three steps.

Step 1: Martingale approximation. Let $\widetilde S_p$ and $\check S_p$ be as in (A.1) and (B.16) respectively. Let $\bar a=4a/3$. The Cauchy-Schwarz inequality gives
\[
\bigl|\check S_p-\widetilde S_p\bigr|=\Biggl|\sum_{j=1}^p\frac{K_{jp}}{n}\Biggl(M_{jn}-\sum_{t=j+1}^nu_tu_{t-j}\Biggr)\Biggl(M_{jn}+\sum_{t=j+1}^nu_tu_{t-j}\Biggr)\Biggr|
\]
\[
\le C\Biggl(\sum_{j=1}^p\frac1n\Biggl(M_{jn}-\sum_{t=j+1}^nu_tu_{t-j}\Biggr)^2\Biggr)^{1/2}\Biggl(\sum_{j=1}^p\frac1n\Biggl(M_{jn}+\sum_{t=j+1}^nu_tu_{t-j}\Biggr)^2\Biggr)^{1/2}.
\]
Hence
\[
\bigl\|\check S_p-\widetilde S_p\bigr\|_{\bar a/2}\le C\,E^{1/\bar a}\Biggl[\Biggl(\sum_{j=1}^p\frac1n\Bigl(M_{jn}-\sum_{t=j+1}^nu_tu_{t-j}\Bigr)^2\Biggr)^{\bar a/2}\Biggr]
E^{1/\bar a}\Biggl[\Biggl(\sum_{j=1}^p\frac1n\Bigl(M_{jn}+\sum_{t=j+1}^nu_tu_{t-j}\Bigr)^2\Biggr)^{\bar a/2}\Biggr].
\]
Observe now that (B.4) gives
\[
E^{1/\bar a}\Biggl[\Biggl(\sum_{j=1}^p\frac1n\Bigl(M_{jn}-\sum_{t=j+1}^nu_tu_{t-j}\Bigr)^2\Biggr)^{\bar a/2}\Biggr]
\le\Biggl(\frac1n\sum_{j=1}^pE^{2/\bar a}\Biggl[\Biggl|M_{jn}-\sum_{t=j+1}^nu_tu_{t-j}\Biggr|^{\bar a}\Biggr]\Biggr)^{1/2}
\le C\Bigl(\frac pn\Bigr)^{1/2}.
\]
Since the Burkholder inequality and $\max_jE[|D_{jt}|^{\bar a}]<\infty$ give $\max_{j\in[1,p_n]}E^{1/\bar a}[|M_{jn}|^{\bar a}]\le Cn^{1/2}$, we also have
\[
E^{1/\bar a}\Biggl[\Biggl(\sum_{j=1}^p\frac1n\Bigl(M_{jn}+\sum_{t=j+1}^nu_tu_{t-j}\Bigr)^2\Biggr)^{\bar a/2}\Biggr]
\le\Biggl(\frac1n\sum_{j=1}^pE^{2/\bar a}\Biggl[\Biggl|M_{jn}+\sum_{t=j+1}^nu_tu_{t-j}\Biggr|^{\bar a}\Biggr]\Biggr)^{1/2}
\]
\[
\le\Biggl(\frac1n\sum_{j=1}^p\Biggl(E^{1/\bar a}\bigl[|M_{jn}|^{\bar a}\bigr]+E^{1/\bar a}\Biggl[\Biggl|\sum_{t=j+1}^nu_tu_{t-j}-M_{jn}\Biggr|^{\bar a}\Biggr]\Biggr)^2\Biggr)^{1/2}
\le\Biggl(\frac{p(Cn^{1/2}+C)^2}{n}\Biggr)^{1/2}\le Cp^{1/2}.
\]
It then follows that $\|\check S_p-\widetilde S_p\|_{\bar a/2}\le Cp/n^{1/2}$, and then
\[
\max_{p\in[1,p_n]}E\Bigl[\bigl|(\check S_p-\widetilde S_p)/p^{1/2}\bigr|^{\bar a/2}\Bigr]\le C\Bigl(\frac{p_n}{n}\Bigr)^{\bar a/4}.
\]
Hence the Markov inequality gives
\[
P\Biggl(\max_{p\in[1,p_n]}\Biggl|\frac{\check S_p-\widetilde S_p}{p^{1/2}}\Biggr|\ge t\Biggr)
\le\sum_{p=1}^{p_n}P\Biggl(\Biggl|\frac{\check S_p-\widetilde S_p}{p^{1/2}}\Biggr|\ge t\Biggr)
\le\frac{p_n}{t^{\bar a/2}}\max_{p\in[1,p_n]}E\Biggl|\frac{\check S_p-\widetilde S_p}{p^{1/2}}\Biggr|^{\bar a/2}
\le\frac{C}{t^{\bar a/2}}\Biggl(\frac{p_n^{1+4/\bar a}}{n}\Biggr)^{\bar a/4},
\]
and $p_n=o\bigl(n^{1/(2(1+4/\bar a))}\bigr)$ gives
\[
\max_{p\in[1,p_n]}\Biggl|\frac{\check S_p-\widetilde S_p}{p^{1/2}}\Biggr|=o_P(1). \tag{B.36}
\]
Step 2: Some Gaussian approximations.
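The chain used here, a union bound followed by Markov's inequality, is a generic device: $P(\max_p|X_p|\ge t)\le\sum_pP(|X_p|\ge t)\le p_nt^{-q}\max_pE|X_p|^q$, and a shrinking moment bound then absorbs the $p_n$ factor. A minimal numerical illustration with toy variables of our own (sample means standing in for $(\check S_p-\widetilde S_p)/p^{1/2}$, second moments standing in for the $\bar a/2$-th moments):

```python
import random
random.seed(1)

p_n, n, reps, t = 20, 50, 2000, 0.6

def draw():
    # toy vector (X_1, ..., X_{p_n}): independent sample means with E[X_p^2] = 1/n
    return [sum(random.gauss(0.0, 1.0) for _ in range(n)) / n for _ in range(p_n)]

samples = [draw() for _ in range(reps)]
lhs = sum(max(abs(x) for x in s) >= t for s in samples) / reps
second_moments = [sum(s[p] ** 2 for s in samples) / reps for p in range(p_n)]
bound = p_n * max(second_moments) / t ** 2  # union bound + Markov (Chebyshev form)
print(lhs, bound)  # the bound dominates the observed exceedance frequency
```

The bound is crude, but as in (B.36) it suffices: once $\max_pE|X_p|^q$ decays fast enough relative to $p_n$, the maximum is $o_P(1)$.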
Let γ (cid:48) n = γ n (1 + (cid:15)/ / (1 + (cid:15) ). (3.1) gives γ n ≥ γ (cid:48) n ≥ (cid:101) γ n = (2 ln ln p n ) / (1 + (cid:15)/ ι ( x ) with max j =1 , , sup x (cid:12)(cid:12) ι (3) ( x ) (cid:12)(cid:12) < ∞ and I ( x ≥ ≤ ι ( x ) ≤ I ( x ≥ − (cid:15) ). Let I ( x ) = ι ( x − γ (cid:48) n ). Let ˇ s p be as in (B.16). Then Lemma B.5 with e = p / (2 a ) n , (B.14) and (B.16), and Assumption R give P (cid:18) max p ∈ [2 ,p n ] { ˇ s p } ≥ γ (cid:48) n (cid:19) ≤ P ( M ≥ γ (cid:48) n ) ≤ E [ I ( M )] ≤ E [ I ( M ( η ))] + o (1) ≤ P ( M ( η ) ≥ γ (cid:48) n − (cid:15) ) + o (1) . We now look for a more explicit expression for the RHS. Recall that M ( η ) = (cid:16)(cid:80) p n p =1 f e (ˇ s p (1; η )) (cid:17) /e .Consider Ω ( p ) = [ ω , . . . , ω p ] (cid:48) where the ω p ’s are i.i.d. standard normal variables, K ( p ) = Diag ((1 − j/n ) K jp , j = 1 , . . . , p ) , C η ( p ) = [Cov ( η j t , η j t ) , j , j = 1 , . . . , p ] , V η ( p ) = C / η ( p ) K ( p ) C / η ( p ) , and D η ( p ) = Diag ((1 − j/n ) K jp Var ( η jt ) , j = 1 , . . . , p ) the p × p diagonal matrix obtainedfrom the diagonal entries of V η ( p ). Then the ˇ s p (1; η ), p = 1 , . . . , p n , have the same jointdistribution than ˜ s p = Ω ( p ) (cid:48) V η ( p ) Ω ( p ) − σ E ∆ ( p ) σ V ∆ ( p ) , p = 1 , . . . , p n , so that M ( η ) and (cid:102) M = (cid:16)(cid:80) p n p =1 f e (˜ s p ) (cid:17) /e have the same distribution, and then P (cid:18) max p ∈ [2 ,p n ] { ˇ s p } ≥ γ (cid:48) n (cid:19) ≤ P (cid:16) (cid:102) M ≥ γ (cid:48) n − (cid:15) (cid:17) + o (1) . Define now¯ s p = Ω ( p ) (cid:48) D η ( p ) Ω ( p ) − σ E ∆ ( p ) σ V ∆ ( p ) = (cid:80) pj =1 (cid:0) − jn (cid:1) K jp Var ( η jt ) ω j − σ E ∆ ( p ) σ V ∆ ( p ) . Then for all p = 1 , . . . 
$p=1,\ldots,p_n$,
$$
|\widetilde s_p-\bar s_p|=\left|\frac{\Omega(p)'(V_\eta(p)-D_\eta(p))\Omega(p)}{\sigma^4V_\Delta(p)}\right|
\le C\sum_{1\le j_1\ne j_2\le p}\left|\mathrm{Cov}\Big(\big(1-\tfrac{j_1}{n}\big)^{1/2}K_{j_1p}^{1/2}\eta_{j_1t},\big(1-\tfrac{j_2}{n}\big)^{1/2}K_{j_2p}^{1/2}\eta_{j_2t}\Big)\right||\omega_{j_1}||\omega_{j_2}|
$$
$$
\le C\sum_{1\le j_1\ne j_2\le p_n}|\mathrm{Cov}(\eta_{j_1t},\eta_{j_2t})|\,|\omega_{j_1}||\omega_{j_2}|=O_P(1),
$$
by Lemma B.3. Hence, since $f(x)\le 1\vee x$ by (B.14), and using (B.15),
$$
\widetilde M\le\Big(1+O\Big(\frac{\ln n}{p_n^{1/(2a)}}\Big)\Big)\max_{p\in[2,p_n]}\{1\vee\widetilde s_p\}
\le\Big(1+O\Big(\frac{\ln n}{p_n^{1/(2a)}}\Big)\Big)\Big(1\vee\max_{p\in[2,p_n]}\widetilde s_p\Big)
\le\Big(1+O\Big(\frac{\ln n}{p_n^{1/(2a)}}\Big)\Big)\max_{p\in[2,p_n]}\bar s_p+O_P(1).
$$
Define now
$$
\bar V_\Delta(p)=\Big(2\sum_{j=1}^{p}K_{jp}^2\Big)^{1/2},\qquad
s_p=\frac{\sum_{j=1}^{p}K_{jp}(\omega_j^2-1)}{\bar V_\Delta(p)},
$$
which is such that $|\bar s_p-s_p|\le|e_{1p}|+|e_{2p}|$, where
$$
e_{1p}=\Big(\frac{\bar V_\Delta(p)}{V_\Delta(p)}-1\Big)s_p,\qquad
e_{2p}=\frac{\sum_{j=1}^{p}\big\{\big(1-\frac jn\big)\mathrm{Var}(\eta_{jt})-\sigma^4\big\}K_{jp}\omega_j^2-\sigma^4\sum_{j=1}^{p}\frac jnK_{jp}}{\sigma^4V_\Delta(p)}.
$$
Since $K'(\cdot)$ is continuous on $[0,1]$, the law of the iterated logarithm of Li and Tomkins (1996) gives
$$
\lim_{p\to\infty}\frac{|\bar V_\Delta(p)s_p|}{p^{1/2}(2\ln\ln p)^{1/2}}\le\Big(2\int_0^1K^2(t)\,dt\Big)^{1/2},\quad\text{almost surely}.
$$
Since, under Assumption K, $\bar V_\Delta(p)/p^{1/2}\to\big(2\int_0^1K^2(t)\,dt\big)^{1/2}$ by convergence of Riemann sums, this gives
$$
\sup_{p\in[2,p_n]}|s_p|\le(2\ln\ln p_n)^{1/2}(1+o_P(1)).\tag{B.37}
$$
Observe also that Lemma A.2-(ii), $p_n=o(n^{1/2})$, and Assumption K give, uniformly in $p\in[1,p_n]$,
$$
\left|\frac{V_\Delta(p)}{\bar V_\Delta(p)}-1\right|\le C\Big(\frac1p\sum_{j=1}^{p}\frac jnK_{jp}^2\Big)^{1/2}=o\big(n^{-1/4}\big).
$$
Hence $\max_{p\in[2,p_n]}|e_{1p}|=o_P\big(n^{-1/4}(\ln\ln p_n)^{1/2}\big)=o_P(1)$.
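The Riemann-sum convergence invoked above for $\bar V_\Delta(p)/p^{1/2}$ is easy to check numerically. A minimal sketch, assuming for illustration the Bartlett kernel $K(t)=(1-t)_+$ and a fourth-power integrand (both illustrative choices, not the paper's exact objects):

```python
import numpy as np

# Riemann-sum convergence: (1/p) * sum_{j=1}^p g(j/p) -> int_0^1 g(t) dt,
# here with g = K^4 for the (illustrative) Bartlett kernel K(t) = (1 - t)_+.
p = 10_000
K = lambda t: np.maximum(1.0 - t, 0.0)
j = np.arange(1, p + 1)
riemann = np.mean(K(j / p) ** 4)   # (1/p) * sum_{j=1}^p K^4(j/p)
exact = 1.0 / 5.0                  # int_0^1 (1 - t)^4 dt
assert abs(riemann - exact) < 1e-3
```

The discretisation error is $O(1/p)$, which is why normalisations such as $\bar V_\Delta(p)/p^{1/2}$ stabilise quickly as the bandwidth $p$ grows.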
Now, for $\max_{p\in[2,p_n]}|e_{2p}|$, we have by Lemmas A.2-(ii) and B.3, $p_n=o(n^{1/2})$, and Assumption K,
$$
\max_{p\in[2,p_n]}|e_{2p}|\le C\left\{\frac{1}{p^{1/2}}\sum_{j=1}^{p_n}\big|\mathrm{Var}(\eta_{jt})-\sigma^4\big|\,\omega_j^2+\frac{1}{np^{1/2}}\sum_{j=1}^{p_n}j\,\omega_j^2+\frac{p_n^{1/2}}{n}\right\}
=O_P(1)+O_P\Big(\frac{p_n^{3/2}}{n}\Big)=O_P(1).
$$
Hence $\max_{p\in[2,p_n]}|\bar s_p-s_p|=O_P(1)$, and substituting in the bounds for $P(\max_{p\in[2,p_n]}\check s_p\ge\gamma_n')$ and $\widetilde M$ above gives, by (3.1), $\gamma_n'=\gamma_n(1+\epsilon/2)/(1+\epsilon)$, $\gamma_n'\ge(2\ln\ln p_n)^{1/2}(1+\epsilon/3)$ and (B.37),
$$
P\Big(\max_{p\in[2,p_n]}\check s_p\ge\gamma_n'\Big)
=P\Big(\Big(1+O\Big(\frac{\ln n}{p_n^{1/(2a)}}\Big)\Big)\max_{p\in[2,p_n]}s_p+O_P(1)\ge\gamma_n'-\epsilon\Big)+o(1)
$$
$$
\le P\Big(\max_{p\in[2,p_n]}s_p\ge(2\ln\ln p_n)^{1/2}(1+\epsilon/4)\Big)+o(1)=o(1).\tag{B.38}
$$
Step 3: Conclusion. Propositions A.2 and A.1, Lemma A.2, $p_n=O(n^{1/2})$, the expression of $\check S_p$ and $\check s_p$ in (B.16), and (B.36) give
$$
\max_{p\in[2,p_n]}\frac{(\widehat S_p-\widehat S_1)/\widehat R^2-E_\Delta(p)}{V_\Delta(p)}
=\max_{p\in[2,p_n]}\frac{(\widehat S_p-\widehat S_1)-\widehat R^2E_\Delta(p)}{\widehat R^2V_\Delta(p)}
$$
$$
=(1+o_P(1))\max_{p\in[2,p_n]}\frac{(\widetilde S_p-\widetilde S_1)-R^2E_\Delta(p)}{R^2V_\Delta(p)}+O_P\big(p_n^{1/2}(\widehat R^2-R^2)\big)
=(1+o_P(1))\max_{p\in[2,p_n]}\check s_p+O_P(1).
$$
Hence (B.38) gives, since $\gamma_n-\gamma_n'\to+\infty$,
$$
P\left(\max_{p\in[2,p_n]}\frac{(\widehat S_p-\widehat S_1)/\widehat R^2-E_\Delta(p)}{V_\Delta(p)}\ge\gamma_n\right)
\le P\Big(\max_{p\in[2,p_n]}\check s_p\ge\gamma_n'\Big)+o(1)=o(1).
$$
This ends the proof of the Proposition. $\Box$

B.7. Proof of Propositions A.4 and A.5.
When studying the mean and variance of $\widetilde S_p$, we make use of Theorem 2.3.2 in Brillinger (2001), which implies in particular that, for any real zero-mean random variables $Z_1,\ldots,Z_4$,
$$
\mathrm{Cov}(Z_1Z_2,Z_3Z_4)=\mathrm{Cov}(Z_1,Z_3)\mathrm{Cov}(Z_2,Z_4)+\mathrm{Cov}(Z_1,Z_4)\mathrm{Cov}(Z_2,Z_3)+\mathrm{Cum}(Z_1,Z_2,Z_3,Z_4).\tag{B.39}
$$
Note that Assumption R and Theorem B.1 imply that
$$
\sup_{n,\,q\in[2,8]}\ \sum_{t_2,\ldots,t_q=-\infty}^{\infty}|\Gamma_n(0,t_2,\ldots,t_q)|<\infty.\tag{B.40}
$$
B.7.1. Proof of Proposition A.4. (B.39) yields
$$
E\big[\widetilde R_j^2\big]=\frac{1}{n^2}\sum_{t_1,t_2=1}^{n-j}E[u_{t_1}u_{t_1+j}u_{t_2}u_{t_2+j}]
=\frac{1}{n^2}\sum_{t_1,t_2=1}^{n-j}\big(R_j^2+R_{t_1-t_2}^2+R_{t_1-t_2+j}R_{t_1-t_2-j}+\Gamma(0,j,t_2-t_1,t_2-t_1+j)\big),
$$
where
$$
\sum_{t_1,t_2=1}^{n-j}R_{t_1-t_2}^2=(n-j)R_0^2+2\sum_{\ell=1}^{n-j-1}(n-j-\ell)R_\ell^2,
$$
$$
\sum_{t_1,t_2=1}^{n-j}R_{t_1-t_2+j}R_{t_1-t_2-j}=(n-j)R_j^2+2\sum_{\ell=1}^{n-j-1}(n-j-\ell)R_{\ell+j}R_{\ell-j},
$$
$$
\sum_{t_1,t_2=1}^{n-j}\Gamma(0,j,t_2-t_1,t_2-t_1+j)=\sum_{\ell=-n+j+1}^{n-j-1}(n-j-|\ell|)\Gamma(0,j,\ell,\ell+j).
$$
Set $k_j=K^2(j/p)$ to prove the first equality and $k_j=K^2(j/p)/\tau_j$ for the second. Note that Assumptions K and R give, in both cases, $\max_{j\in[1,n-1]}k_j\le C$ and $k_j\ge C\,I(j\le p/2)$. Hence
$$
E\left[\sum_{j=1}^{n-1}k_jn\widetilde R_j^2\right]-R_0^2\sum_{j=1}^{n-1}\Big(1-\frac jn\Big)k_j
=n\sum_{j=1}^{n-1}\left(\Big(1-\frac jn\Big)^2+\frac1n\Big(1-\frac jn\Big)\right)k_jR_j^2\tag{B.41}
$$
$$
+\,2\sum_{j=1}^{n-1}k_j\sum_{\ell=1}^{n-j-1}\Big(1-\frac{j+\ell}{n}\Big)\big(R_\ell^2+R_{\ell+j}R_{\ell-j}\big)
+\sum_{j=1}^{n-1}k_j\sum_{\ell=-n+j+1}^{n-j-1}\Big(1-\frac{j+|\ell|}{n}\Big)\Gamma(0,j,\ell,\ell+j).
$$
We start with the term $R_0^2\sum_{j=1}^{n-1}(1-j/n)k_j$, which is equal to $R_0^2E_\Delta(p)$ when $k_j=K^2(j/p)$, that is, when proving the first equality.
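Identity (B.39) can be sanity-checked by simulation: for jointly Gaussian zero-mean variables the fourth-order cumulant vanishes (Isserlis' theorem), so the covariance of products reduces to the two pair-covariance terms. A Monte Carlo sketch, in which the covariance matrix, sample size and tolerance are all arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
A = 0.5 * rng.standard_normal((4, 4))
Sigma = A @ A.T + np.eye(4)        # an arbitrary positive definite covariance
Z = rng.multivariate_normal(np.zeros(4), Sigma, size=1_000_000)
Z1, Z2, Z3, Z4 = Z.T

# Cov(Z1*Z2, Z3*Z4), estimated by Monte Carlo ...
lhs = np.mean(Z1 * Z2 * Z3 * Z4) - np.mean(Z1 * Z2) * np.mean(Z3 * Z4)
# ... versus the two pair-covariance terms of (B.39); the fourth-order
# cumulant vanishes for jointly Gaussian variables.
rhs = Sigma[0, 2] * Sigma[1, 3] + Sigma[0, 3] * Sigma[1, 2]
assert abs(lhs - rhs) < 0.2
```

Under dependence that is white noise but not Gaussian, the cumulant term is exactly what survives, which is why (B.40)-type cumulant summability drives the variance bounds below.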
When $k_j=K^2(j/p)/\tau_j$, (A.4) gives, under Assumptions K and R,
$$
\left|R_0^2\sum_{j=1}^{n-1}\Big(1-\frac jn\Big)k_j-E_\Delta(p)\right|\le C\sum_{j=1}^{p}\big|\tau_j-R_0^2\big|\le C\sum_{j=1}^{\infty}j^{-2}<\infty,
$$
so that $R_0^2\sum_{j=1}^{n-1}(1-j/n)k_j\ge E_\Delta(p)-C'$.

Let us now turn to the other terms. The lower bound $k_j\ge CI(j\le p/2)$ gives that (B.41) is larger than $Cn\sum_{j=1}^{p/2}R_j^2$. To bound the remaining terms in (B.41), we note that, by Assumptions K and R and (B.40),
$$
\left|\sum_{j=1}^{n-1}k_j\sum_{\ell=1}^{n-j-1}\Big(1-\frac{j+\ell}{n}\Big)R_\ell^2\right|\le C\sum_{j=1}^{n-1}I(j\le p)\times\sum_{j=1}^{\infty}R_j^2\le Cp\sum_{j=1}^{\infty}R_j^2=o(n)\sum_{j=1}^{\infty}R_j^2,
$$
$$
\left|\sum_{j=1}^{n-1}k_j\sum_{\ell=1}^{n-j-1}\Big(1-\frac{j+\ell}{n}\Big)R_{\ell+j}R_{\ell-j}\right|\le C\sum_{j=1}^{+\infty}\sum_{\ell=1}^{+\infty}|R_{\ell+j}R_{\ell-j}|\le C\Big(\sum_{j=0}^{\infty}|R_j|\Big)^2\le C,
$$
$$
\left|\sum_{j=1}^{n-1}k_j\sum_{\ell=-n+j+1}^{n-j-1}\Big(1-\frac{j+|\ell|}{n}\Big)\Gamma(0,j,\ell,\ell+j)\right|\le C\sum_{t_1,t_2,t_3=-\infty}^{\infty}|\Gamma(0,t_1,t_2,t_3)|\le C,
$$
uniformly with respect to $p\in[1,p_n]$. Substituting these bounds in the equality above establishes the proposition. $\Box$

B.7.2. Proof of Proposition A.5. Let $f$ be the spectral density of the alternative. Using (B.40), we obtain
$$
\sup_{\lambda\in[-\pi,\pi]}|f(\lambda)|\le C\quad\text{and}\quad\sum_{j=1}^{\infty}R_j^2\le C,\tag{B.42}
$$
because $\sup_{\lambda\in[-\pi,\pi]}|f(\lambda)|\le\big(|R_0|+2\sum_{j=1}^{\infty}|R_j|\big)/(2\pi)$ and $\sum_{j=1}^{\infty}R_j^2\le\big(\sum_{j=1}^{\infty}|R_j|\big)^2$.
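The first bound in (B.42) follows from $f(\lambda)=\big(R_0+2\sum_{j\ge1}R_j\cos(j\lambda)\big)/(2\pi)$. A worked numerical check for an MA(1) process, where only $R_0$ and $R_1$ are nonzero and the bound holds with equality at $\lambda=0$ ($\theta$ is an illustrative parameter):

```python
import numpy as np

theta = 0.6                       # illustrative MA(1): u_t = e_t + theta*e_{t-1}
R0, R1 = 1.0 + theta**2, theta    # autocovariances; R_j = 0 for j >= 2
lam = np.linspace(-np.pi, np.pi, 10_001)
f = (R0 + 2.0 * R1 * np.cos(lam)) / (2.0 * np.pi)   # spectral density

# sup |f| <= (|R_0| + 2 * sum_j |R_j|) / (2*pi)
bound = (abs(R0) + 2.0 * abs(R1)) / (2.0 * np.pi)
assert f.max() <= bound + 1e-12
assert np.all(f >= -1e-12)        # a spectral density is nonnegative
```

The second inequality of (B.42) is the elementary bound $\sum_j R_j^2\le(\sum_j|R_j|)^2$, trivially tight here since the MA(1) has a single nonzero $R_j$, $j\ge 1$.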
We recall that $\widetilde R_j=\sum_{t=1}^{n-j}u_tu_{t+j}/n$ and define $\bar R_j=E[\widetilde R_j]=(1-j/n)R_j$. Set $k_j=K^2(j/p)$ to prove the first equality and $k_j=K^2(j/p)/\tau_j$ for the second. Note that Assumptions K and R give, in both cases, $k_j\le C\,I(j\le p)$. To avoid notational burden, redefine $\widetilde S_p$ as $\sum_{j=1}^{n-1}k_jn\widetilde R_j^2$. Define $D_j=\widetilde R_j-\bar R_j$. We have $E[D_j]=0$ and
$$
\widetilde S_p=n\sum_{j=1}^{n-1}k_j\bar R_j^2+2n\sum_{j=1}^{n-1}k_j\bar R_jD_j+n\sum_{j=1}^{n-1}k_jD_j^2.
$$
The inequality $(a+b)^2\le 2a^2+2b^2$ implies that
$$
\mathrm{Var}\big(\widetilde S_p\big)\le 8\,\mathrm{Var}\left(n\sum_{j=1}^{n-1}k_j\bar R_j\widetilde R_j\right)+2\,\mathrm{Var}\left(n\sum_{j=1}^{n-1}k_jD_j^2\right).\tag{B.43}
$$
By identity (B.39),
$$
\mathrm{Var}\left(n\sum_{j=1}^{n-1}k_j\bar R_j\widetilde R_j\right)=\sum_{j_1,j_2=1}^{n-1}k_{j_1}k_{j_2}\bar R_{j_1}\bar R_{j_2}\sum_{t_1=1}^{n-j_1}\sum_{t_2=1}^{n-j_2}\mathrm{Cov}(u_{t_1}u_{t_1+j_1},u_{t_2}u_{t_2+j_2})\le V_1+K_1,
$$
with
$$
V_1=\left|\sum_{j_1,j_2=1}^{n-1}k_{j_1}k_{j_2}\bar R_{j_1}\bar R_{j_2}\sum_{t_1=1}^{n-j_1}\sum_{t_2=1}^{n-j_2}\big(R_{t_1-t_2}R_{t_1-t_2+j_1-j_2}+R_{t_1-t_2-j_2}R_{t_1-t_2+j_1}\big)\right|,
$$
$$
K_1=\left|\sum_{j_1,j_2=1}^{n-1}k_{j_1}k_{j_2}\bar R_{j_1}\bar R_{j_2}\sum_{t_1=1}^{n-j_1}\sum_{t_2=1}^{n-j_2}\Gamma(t_1,t_1+j_1,t_2,t_2+j_2)\right|.
$$
The second term on the right of (B.43) is, up to a multiplicative constant, equal to
$$
\mathrm{Var}\left(n\sum_{j=1}^{n-1}k_jD_j^2\right)=n^2\sum_{j_1,j_2=1}^{n-1}k_{j_1}k_{j_2}\mathrm{Cov}\big(D_{j_1}^2,D_{j_2}^2\big).
$$
Applying (B.39) twice, we obtain
$$
\mathrm{Cov}\big(D_{j_1}^2,D_{j_2}^2\big)=\frac{1}{n^4}\sum_{t_1,t_2=1}^{n-j_1}\sum_{t_3,t_4=1}^{n-j_2}\mathrm{Cov}\left[\prod_{q=1}^{2}\big(u_{t_q}u_{t_q+j_1}-E[u_{t_q}u_{t_q+j_1}]\big),\ \prod_{q=3}^{4}\big(u_{t_q}u_{t_q+j_2}-E[u_{t_q}u_{t_q+j_2}]\big)\right]
$$
$$
=\frac{1}{n^4}\sum_{t_1,t_2=1}^{n-j_1}\sum_{t_3,t_4=1}^{n-j_2}\big[\mathrm{Cov}(u_{t_1}u_{t_1+j_1},u_{t_3}u_{t_3+j_2})\mathrm{Cov}(u_{t_2}u_{t_2+j_1},u_{t_4}u_{t_4+j_2})
+\mathrm{Cov}(u_{t_1}u_{t_1+j_1},u_{t_4}u_{t_4+j_2})\mathrm{Cov}(u_{t_2}u_{t_2+j_1},u_{t_3}u_{t_3+j_2})\big]
$$
$$
+\frac{1}{n^4}\sum_{t_1,t_2=1}^{n-j_1}\sum_{t_3,t_4=1}^{n-j_2}\mathrm{Cum}(u_{t_1}u_{t_1+j_1},u_{t_2}u_{t_2+j_1},u_{t_3}u_{t_3+j_2},u_{t_4}u_{t_4+j_2})
$$
$$
=\frac{2}{n^4}\left(\sum_{t_1=1}^{n-j_1}\sum_{t_3=1}^{n-j_2}\big(R_{t_1-t_3}R_{t_1-t_3+j_1-j_2}+R_{t_1-t_3-j_2}R_{t_1-t_3+j_1}+\Gamma(t_1,t_1+j_1,t_3,t_3+j_2)\big)\right)^2
$$
$$
+\frac{1}{n^4}\sum_{t_1,t_2=1}^{n-j_1}\sum_{t_3,t_4=1}^{n-j_2}\mathrm{Cum}(u_{t_1}u_{t_1+j_1},u_{t_2}u_{t_2+j_1},u_{t_3}u_{t_3+j_2},u_{t_4}u_{t_4+j_2}).
$$
Since $(a+b+c)^2\le 3(a^2+b^2+c^2)$, we can write $\mathrm{Var}\big(n\sum_{j=1}^{n-1}k_jD_j^2\big)\le 6V_2+K_2+6K_2'$, with
$$
V_2=\frac{1}{n^2}\sum_{j_1,j_2=1}^{n-1}k_{j_1}k_{j_2}\left[\left(\sum_{t_1=1}^{n-j_1}\sum_{t_3=1}^{n-j_2}R_{t_1-t_3}R_{t_1-t_3+j_1-j_2}\right)^2+\left(\sum_{t_1=1}^{n-j_1}\sum_{t_3=1}^{n-j_2}R_{t_1-t_3-j_2}R_{t_1-t_3+j_1}\right)^2\right],
$$
$$
K_2=\left|\frac{1}{n^2}\sum_{j_1,j_2=1}^{n-1}k_{j_1}k_{j_2}\sum_{t_1,t_2=1}^{n-j_1}\sum_{t_3,t_4=1}^{n-j_2}\mathrm{Cum}(u_{t_1}u_{t_1+j_1},u_{t_2}u_{t_2+j_1},u_{t_3}u_{t_3+j_2},u_{t_4}u_{t_4+j_2})\right|,
$$
$$
K_2'=\frac{1}{n^2}\sum_{j_1,j_2=1}^{n-1}k_{j_1}k_{j_2}\left(\sum_{t_1=1}^{n-j_1}\sum_{t_3=1}^{n-j_2}\Gamma(t_1,t_1+j_1,t_3,t_3+j_2)\right)^2.
$$
Substituting in (B.43) shows that the proposition holds if the following inequalities hold:
$$
V_1\le Cn\sum_{j=1}^{p}R_j^2,\quad V_2\le Cp,\quad K_1\le Cn\sum_{j=1}^{p}R_j^2,\quad K_2'\le C,\quad K_2\le C\Big(1+\frac{p^2}{n}\Big).
$$
We establish these inequalities in five steps.

Step 1: bound for $V_1$. We note that $|\bar R_j|\le|R_j|$ and that, under Assumption K, $0\le k_j\le C$ for all $j$.
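The fourth-order cumulant term in the expansion of $\mathrm{Cov}(D_{j_1}^2,D_{j_2}^2)$ is what separates weak white noise from the Gaussian case. For a single zero-mean variable, $\mathrm{Cum}(X,X,X,X)=E[X^4]-3E[X^2]^2$; a quick numerical check that it vanishes for Gaussian draws but equals $-2$ for Rademacher draws (the helper name `cum4` is hypothetical):

```python
import numpy as np

def cum4(x):
    """Fourth cumulant Cum(X, X, X, X) of a centred sample (illustrative helper)."""
    x = x - x.mean()
    return np.mean(x**4) - 3.0 * np.mean(x**2) ** 2

rng = np.random.default_rng(2)
g = rng.standard_normal(1_000_000)            # Gaussian: fourth cumulant is 0
r = rng.choice([-1.0, 1.0], size=1_000_000)   # Rademacher: E[X^4] - 3*E[X^2]^2 = -2

assert abs(cum4(g)) < 0.05
assert abs(cum4(r) + 2.0) < 0.05
```

Assumption (B.40) requires such cumulants, jointly in all lags, to be absolutely summable, which is what makes the $K_2$ term negligible below.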
Using the spectral representation of the covariances, $R_j=\int_{-\pi}^{\pi}e^{\pm ij\lambda}f(\lambda)\,d\lambda$, the Cauchy-Schwarz inequality and (B.42), we obtain, by Assumption K,
$$
\left|\sum_{j_1,j_2=1}^{n-1}k_{j_1}k_{j_2}\bar R_{j_1}\bar R_{j_2}\sum_{t_1=1}^{n-j_1}\sum_{t_2=1}^{n-j_2}R_{t_1-t_2}R_{t_1-t_2+j_1-j_2}\right|
=\int_{-\pi}^{\pi}\int_{-\pi}^{\pi}\left|\sum_{j=1}^{n-1}k_j\bar R_j\sum_{t=1}^{n-j}e^{it\lambda_1}e^{i(t+j)\lambda_2}\right|^2f(\lambda_1)f(\lambda_2)\,d\lambda_1d\lambda_2
$$
$$
\le\Big(\sup_{\lambda\in[-\pi,\pi]}|f(\lambda)|\Big)^2\int_{-\pi}^{\pi}\int_{-\pi}^{\pi}\sum_{j_1,j_2=1}^{n-1}k_{j_1}\bar R_{j_1}k_{j_2}\bar R_{j_2}\sum_{t_1=1}^{n-j_1}\sum_{t_2=1}^{n-j_2}e^{it_1\lambda_1}e^{i(t_1+j_1)\lambda_2}e^{-it_2\lambda_1}e^{-i(t_2+j_2)\lambda_2}\,d\lambda_1d\lambda_2
$$
$$
\le C\sum_{j=1}^{n-1}(n-j)k_j^2\bar R_j^2\le Cn\sum_{j=1}^{p}R_j^2,
$$
$$
\left|\sum_{j_1,j_2=1}^{n-1}k_{j_1}k_{j_2}\bar R_{j_1}\bar R_{j_2}\sum_{t_1=1}^{n-j_1}\sum_{t_2=1}^{n-j_2}R_{t_1-t_2-j_2}R_{t_1-t_2+j_1}\right|
=\left|\int_{-\pi}^{\pi}\int_{-\pi}^{\pi}\sum_{j_2=1}^{n-1}k_{j_2}\bar R_{j_2}\sum_{t_2=1}^{n-j_2}e^{-i(t_2+j_2)\lambda_1}e^{-it_2\lambda_2}\times\sum_{j_1=1}^{n-1}k_{j_1}\bar R_{j_1}\sum_{t_1=1}^{n-j_1}e^{it_1\lambda_1}e^{i(t_1+j_1)\lambda_2}\,f(\lambda_1)f(\lambda_2)\,d\lambda_1d\lambda_2\right|
$$
$$
\le\int_{-\pi}^{\pi}\int_{-\pi}^{\pi}\left|\sum_{j=1}^{n-1}k_j\bar R_j\sum_{t=1}^{n-j}e^{it\lambda_1}e^{i(t+j)\lambda_2}\right|^2f(\lambda_1)f(\lambda_2)\,d\lambda_1d\lambda_2\le Cn\sum_{j=1}^{p}R_j^2.
$$
This establishes the bound for $V_1$.

Step 2: bound for $V_2$. We define $t_3=t_1+t'$ and $j_2=j_1+j'$.
By Assumption K and (B.40),
$$
\frac{1}{n^2}\sum_{j_1,j_2=1}^{n-1}k_{j_1}k_{j_2}\left(\sum_{t_1=1}^{n-j_1}\sum_{t_3=1}^{n-j_2}R_{t_1-t_3}R_{t_1-t_3+j_1-j_2}\right)^2
\le\frac{C}{n^2}\sum_{j_1=1}^{n-1}k_{j_1}\sum_{j'=-\infty}^{\infty}\left(n\sum_{t'=-\infty}^{+\infty}|R_{t'}R_{t'+j'}|\right)^2
$$
$$
\le Cp\sum_{j',t_1,t_2=-\infty}^{\infty}|R_{t_1}R_{t_1+j'}R_{t_2}R_{t_2+j'}|
\le Cp\left(\sum_{t=-\infty}^{\infty}|R_t|\right)^4\le Cp,
$$
$$
\frac{1}{n^2}\sum_{j_1,j_2=1}^{n-1}k_{j_1}k_{j_2}\left(\sum_{t_1=1}^{n-j_1}\sum_{t_3=1}^{n-j_2}R_{t_1-t_3-j_2}R_{t_1-t_3+j_1}\right)^2
\le\frac{C}{n^2}\sum_{j_1=1}^{n-1}k_{j_1}\sum_{j'=-\infty}^{\infty}\left(n\sum_{t'=-\infty}^{+\infty}|R_{t'-j_1}R_{t'+j_1+j'}|\right)^2
$$
$$
\le Cp\sum_{j,t_1,t_2=-\infty}^{\infty}|R_{t_1}R_{t_1+j}R_{t_2}R_{t_2+j}|
\le Cp\left(\sum_{t=-\infty}^{\infty}|R_t|\right)^4\le Cp,
$$
therefore $V_2\le Cp$.

Step 3: bound for $K_1$. Define $t=t_2-t_1$. Using $|\bar R_{j_1}\bar R_{j_2}|\le(R_{j_1}^2+R_{j_2}^2)/2$, symmetry in $j_1$ and $j_2$, stationarity, Assumption K and (B.40) yield
$$
K_1\le Cn\sum_{j_1=1}^{p}R_{j_1}^2\ \sup_{j_1\ge1}\sum_{j_2=1}^{\infty}\sum_{t=-\infty}^{\infty}|\Gamma(0,j_1,t,t+j_2)|
\le Cn\sum_{j=1}^{p}R_j^2\sum_{t_1,t_2,t_3=-\infty}^{\infty}|\Gamma(0,t_1,t_2,t_3)|\le Cn\sum_{j=1}^{p}R_j^2.
$$
Step 4: bound for $K_2'$. (B.40) gives
$$
K_2'\le\frac{1}{n^2}\sum_{j_1,j_2=1}^{n-1}k_{j_1}k_{j_2}\left(\sum_{t_1=1}^{n-j_1}\sum_{t_3=1}^{n-j_2}|\Gamma(0,j_1,t_3-t_1,t_3-t_1+j_2)|\right)^2
\le C\sum_{j_1,j_2=1}^{+\infty}\left(\sum_{t=-\infty}^{\infty}|\Gamma(0,j_1,t,t+j_2)|\right)^2
$$
$$
=C\sum_{j_1,j_2=1}^{+\infty}\sum_{t_1,t_2=-\infty}^{\infty}|\Gamma(0,j_1,t_1,t_1+j_2)\Gamma(0,j_1,t_2,t_2+j_2)|
\le C\left(\sum_{t_1,t_2,t_3=-\infty}^{\infty}|\Gamma(0,t_1,t_2,t_3)|\right)^2\le C.
$$
Step 5: bound for $K_2$. Bounding $K_2$ requires additional notation. First set $t_5=t_1+j_1$, $t_6=t_2+j_1$, $t_7=t_3+j_2$ and $t_8=t_4+j_2$, and note that $t_5,\ldots,t_8$ depend upon $t_1,\ldots,t_4$ and $j_1$, $j_2$ only. For a partition $B=\{B_\ell,\ \ell=1,\ldots,d_B\}$ of $\{1,\ldots,8\}$, define $d_B=\mathrm{Card}\,B$ and
$\Gamma_B(t_1,\ldots,t_8)=\prod_{\ell=1}^{d_B}\mathrm{Cum}\big(u_{t_q},\ q\in B_\ell\big)$, and recall that $\mathrm{Cum}(u_t)=E[u_t]=0$. Then the largest $d_B$ yielding a non-vanishing $\Gamma_B$ is $d_B=4$. When $d_B=4$, $B$ is a pairwise partition of $\{1,\ldots,8\}$, so that $\Gamma_B$ is a product of covariances. Let $\mathcal B$ be the set of indecomposable partitions of the two-way table

1 5
2 6
3 7
4 8,

see Brillinger (2001, p. 20) for a definition. Then, according to Brillinger (2001, Theorem 2.3.2),
$$
\mathrm{Cum}(u_{t_1}u_{t_1+j_1},u_{t_2}u_{t_2+j_1},u_{t_3}u_{t_3+j_2},u_{t_4}u_{t_4+j_2})
=\sum_{B\in\mathcal B}\Gamma_B(t_1,\ldots,t_8)
=\sum_{B\in\mathcal B,\,d_B\le 3}\Gamma_B(t_1,\ldots,t_8)+\sum_{B\in\mathcal B,\,d_B=4}\Gamma_B(t_1,\ldots,t_8).
$$
Some properties of partitions in $\mathcal B$ are as follows. Call $\{1,5\}$, $\{2,6\}$, $\{3,7\}$ and $\{4,8\}$ fundamental pairs, and say that a $B_\ell$ in a partition $B$ breaks the pair $\{1,5\}$ if $\{1,5\}$ is not a subset of $B_\ell$. Then partitions $B\in\mathcal B$ are such that each $B_\ell\in B$ must break a fundamental pair. Note that fundamental pairs play a symmetric role. Since $t_{q+4}-t_q$ is $j_1$ or $j_2$, with vanishing $k_{j_1}$ or $k_{j_2}$ if $j_1$ or $j_2$ is larger than $p$, the indexes $t_q$ and $t_{q+4}$ of a fundamental pair also play a symmetric role in the computations below. We now discuss the contribution to $K_2$ of partitions of $\{1,\ldots,8\}$ according to the possible values $d_B=1,\ldots,4$. Due to symmetry, we only consider representative partitions for each case.

Under Assumption K and (B.40), the case $d_B=1$ gives a contribution to $K_2$ bounded by
$$
\left|\frac{1}{n^2}\sum_{j_1,j_2=1}^{n-1}k_{j_1}k_{j_2}\sum_{t_1,t_2=1}^{n-j_1}\sum_{t_3,t_4=1}^{n-j_2}\Gamma(t_1,\ldots,t_8)\right|
\le\frac{Cp^2}{n^2}\cdot n\sum_{t_2',\ldots,t_8'=-\infty}^{\infty}|\Gamma(0,t_2',\ldots,t_8')|\le\frac{Cp^2}{n}.
$$
The case $d_B=2$ corresponds to $\{\mathrm{Card}\,B_1,\mathrm{Card}\,B_2\}$ being $\{2,6\}$, $\{3,5\}$ or $\{4,4\}$. These cases are very similar and we limit ourselves to $\{2,6\}$ and $B_1=\{1,2\}$.
The corresponding contribution to $K_2$ is bounded by
$$
\left|\frac{1}{n^2}\sum_{j_1,j_2=1}^{n-1}k_{j_1}k_{j_2}\sum_{t_1,t_2=1}^{n-j_1}\sum_{t_3,t_4=1}^{n-j_2}\Gamma_B(t_1,\ldots,t_8)\right|
\le\frac{Cp^2}{n^2}\cdot n\sum_{t=-\infty}^{\infty}|R_t|\sum_{t_2,\ldots,t_6=-\infty}^{\infty}|\Gamma(0,t_2,\ldots,t_6)|\le\frac{Cp^2}{n},
$$
by Assumption K and (B.40).

The case $d_B=3$ corresponds to $\{\mathrm{Card}\,B_1,\mathrm{Card}\,B_2,\mathrm{Card}\,B_3\}$ being $\{2,2,4\}$ or $\{2,3,3\}$. We start with $\mathrm{Card}\,B_1=2$, $\mathrm{Card}\,B_2=2$ and $\mathrm{Card}\,B_3=4$. The discussion concerns the number of fundamental pairs broken by $B_3$. Note that the situation where $B_3$ does not break any fundamental pair corresponds to partitions that are not indecomposable, so that the only possible cases are those where $B_3$ breaks 4 or 2 fundamental pairs.

$\bullet$ $B_3$ breaks 4 fundamental pairs. Consider $B_3=\{1,2,3,4\}$, $B_1=\{5,6\}$ and $B_2=\{7,8\}$. The corresponding contribution to $K_2$ is bounded by
$$
\left|\frac{1}{n^2}\sum_{j_1,j_2=1}^{n-1}k_{j_1}k_{j_2}\sum_{t_1,t_2=1}^{n-j_1}\sum_{t_3,t_4=1}^{n-j_2}\Gamma_B(t_1,\ldots,t_8)\right|
=\left|\frac{1}{n^2}\sum_{j_1,j_2=1}^{n-1}k_{j_1}k_{j_2}\sum_{t_1,t_2=1}^{n-j_1}\sum_{t_3,t_4=1}^{n-j_2}\Gamma(0,t_2-t_1,t_3-t_1,t_4-t_1)R_{t_1-t_2}R_{t_3-t_4}\right|
$$
$$
\le\frac{Cp^2}{n}\sup_j|R_j|^2\sum_{t_1,t_2,t_3=-\infty}^{\infty}|\Gamma(0,t_1,t_2,t_3)|\le\frac{Cp^2}{n},
$$
by Assumption K and (B.40).

$\bullet$ $B_3$ breaks 2 fundamental pairs.
Take $B_3=\{1,2,3,5\}$, $B_1=\{4,6\}$ and $B_2=\{7,8\}$. The change of variables $t_2=t_1+t_2'$, $t_3=t_1+t_3'$ and $t_4=t_2+t_4'$ shows that the contribution to $K_2$ is bounded by
$$
\left|\frac{1}{n^2}\sum_{j_1,j_2=1}^{n-1}k_{j_1}k_{j_2}\sum_{t_1,t_2=1}^{n-j_1}\sum_{t_3,t_4=1}^{n-j_2}\Gamma_B(t_1,\ldots,t_8)\right|
=\left|\frac{1}{n^2}\sum_{j_1,j_2=1}^{n-1}k_{j_1}k_{j_2}\sum_{t_1,t_2=1}^{n-j_1}\sum_{t_3,t_4=1}^{n-j_2}\Gamma(0,t_2-t_1,t_3-t_1,j_1)R_{t_4-t_2-j_1}R_{t_3-t_4}\right|
$$
$$
\le\frac{C}{n}\sum_{j_2=1}^{n-1}k_{j_2}\sum_{t_2',t_3',j_1=-\infty}^{\infty}|\Gamma(0,t_2',t_3',j_1)|\sum_{t'=-\infty}^{+\infty}|R_{t'}|\times\sup_j|R_j|\le\frac{Cp}{n},
$$
under Assumption K and (B.40).

We now turn to the case $\mathrm{Card}\,B_1=\mathrm{Card}\,B_2=3$ and $\mathrm{Card}\,B_3=2$. Observe that $B_1$ or $B_2$ must break 3 or 1 fundamental pairs. The discussion now concerns the fundamental pairs which are simultaneously broken by $B_1$ and $B_2$. Note that $B_1$ and $B_2$ cannot break the same 3 fundamental pairs: if they did, $B_3$ would be given by the remaining fundamental pair, in which case $B_3$ could not communicate with $B_1$ or $B_2$, a fact that would contradict the requirement that the partition $\{B_1,B_2,B_3\}$ is indecomposable.

$\bullet$ $B_1$ and $B_2$ break 3 fundamental pairs, 2 of which are the same. Take $B_1=\{1,2,3\}$, $B_2=\{5,6,7\}$ and $B_3=\{4,8\}$. Using the change of variables $t_2=t_1+t_2'$ and $t_3=t_1+t_3'$, we can see that, under Assumption K and (B.40), the contribution to $K_2$ of this case is bounded by
$$
\left|\frac{1}{n^2}\sum_{j_1,j_2=1}^{n-1}k_{j_1}k_{j_2}\sum_{t_1,t_2=1}^{n-j_1}\sum_{t_3,t_4=1}^{n-j_2}\Gamma_B(t_1,\ldots,t_8)\right|
=\left|\frac{1}{n^2}\sum_{j_1,j_2=1}^{n-1}k_{j_1}k_{j_2}\sum_{t_1,t_2=1}^{n-j_1}\sum_{t_3,t_4=1}^{n-j_2}\Gamma(0,t_2-t_1,t_3-t_1)\,\Gamma(0,t_2-t_1,t_3-t_1+j_2-j_1)\,R_{t_4-t_8}\right|
$$
$$
\le\frac{C}{n}\cdot\frac1n\sum_{j_1,j_2=1}^{n-1}k_{j_1}k_{j_2}\sup_{t_1,t_2}|\Gamma(0,t_1,t_2)|\sum_{t_2',t_3'=-\infty}^{\infty}|\Gamma(0,t_2',t_3')|\sum_{t'=-\infty}^{+\infty}|R_{t'}|\le\frac{Cp^2}{n}.
$$
Note that the case where $B_1$ and $B_2$ break 3 fundamental pairs with fewer than two in common is impossible.

The next case assumes that $B_2$ breaks only 1 fundamental pair, which is then necessarily also broken by $B_1$, since $B_3$ must contain the remaining unbroken pair.

$\bullet$ $B_1$ breaks 3 fundamental pairs and $B_2$ breaks only 1 pair. Take $B_1=\{1,2,3\}$, $B_2=\{4,5,8\}$ and $B_3=\{6,7\}$, and consider the change of variables $t_2=t_1+t_2'$, $t_3=t_1+t_3'$ and $t_4=t_1+j_1-t_4'$. Under Assumption K and (B.40), the contribution of this term to $K_2$ is bounded by
$$
\left|\frac{1}{n^2}\sum_{j_1,j_2=1}^{n-1}k_{j_1}k_{j_2}\sum_{t_1,t_2=1}^{n-j_1}\sum_{t_3,t_4=1}^{n-j_2}\Gamma_B(t_1,\ldots,t_8)\right|
=\left|\frac{1}{n^2}\sum_{j_1,j_2=1}^{n-1}k_{j_1}k_{j_2}\sum_{t_1,t_2=1}^{n-j_1}\sum_{t_3,t_4=1}^{n-j_2}\Gamma(0,t_2-t_1,t_3-t_1)\,\Gamma(t_1-t_4+j_1,0,j_2)\,R_{t_2-t_3+j_1-j_2}\right|
$$
$$
\le C\sup_j|R_j|\cdot\frac1n\sum_{j_1=1}^{n-1}k_{j_1}\sum_{t_2',t_3'=-\infty}^{\infty}|\Gamma(0,t_2',t_3')|\sum_{t',j=-\infty}^{\infty}|\Gamma(t',0,j)|\le\frac{Cp}{n}.
$$
$\bullet$ $B_1$ and $B_2$ break only 1 pair. Note that $B_1$ and $B_2$ cannot break the same pair, because then $B_3$ would be the remaining pair and could not communicate, so that the partition would not be indecomposable. Hence all the partitions in this case are similar to $B_1=\{1,2,5\}$, $B_2=\{3,4,7\}$, $B_3=\{6,8\}$.
The change of variables $t_2=t_1+t_2'$ and $t_4=t_3+t_4'$ yields a contribution to $K_2$ bounded by
$$
\left|\frac{1}{n^2}\sum_{j_1,j_2=1}^{n-1}k_{j_1}k_{j_2}\sum_{t_1,t_2=1}^{n-j_1}\sum_{t_3,t_4=1}^{n-j_2}\Gamma_B(t_1,\ldots,t_8)\right|
=\left|\frac{1}{n^2}\sum_{j_1,j_2=1}^{n-1}k_{j_1}k_{j_2}\sum_{t_1,t_2=1}^{n-j_1}\sum_{t_3,t_4=1}^{n-j_2}\Gamma(0,t_2-t_1,j_1)\,\Gamma(0,t_4-t_3,j_2)\,R_{t_2-t_4+j_1-j_2}\right|
$$
$$
\le C\sum_{j_1,t'=-\infty}^{\infty}|\Gamma(0,t',j_1)|\sum_{j_2,t'=-\infty}^{\infty}|\Gamma(0,t',j_2)|\sum_{t'=-\infty}^{+\infty}|R_{t'}|\le C.\qquad\Box
$$

Supplementary material additional references

Brillinger, D.R. (2001). Time Series Analysis: Data Analysis and Theory. Holt, Rinehart & Winston, New York.

Chow, Y.S. and H. Teicher (1988). Probability Theory: Independence, Interchangeability, Martingales. Second Edition, Springer.

Li, D. and R.J. Tomkins (1996). Laws of the Iterated Logarithm for Weighted Independent Random Variables. Statistics and Probability Letters, 247–254.

Priestley, M.B. (1981).