Quantile Spectral Analysis for Locally Stationary Time Series

Abstract

Classical spectral methods are subject to two fundamental limitations: they can account only for covariance-related serial dependencies, and they require second-order stationarity. Much attention has been devoted lately to quantile-based spectral methods that go beyond covariance-based serial dependence features. At the same time, covariance-based methods relaxing stationarity into much weaker local stationarity conditions have been developed for a variety of time-series models. Here, we combine those two approaches by proposing quantile-based spectral methods for locally stationary processes. We therefore introduce a time-varying version of the copula spectra that have been recently proposed in the literature, along with a suitable local lag-window estimator. We propose a new definition of local strict stationarity that allows us to handle completely general non-linear processes without any moment assumptions, thus accommodating our quantile-based concepts and methods. We establish a central limit theorem for the new estimators, and illustrate the power of the proposed methodology by means of a simulation study. Moreover, in two empirical studies (namely of the Standard & Poor's 500 series and a temperature dataset recorded in Hohenpeissenberg) we demonstrate that the new approach detects important variations in serial dependence structures both across time and across quantiles. Such variations remain completely undetected, and are actually undetectable, via classical covariance-based spectral methods.



Stefan Birr (a, ∗), Stanislav Volgushev (b, ∗), Tobias Kley (c, ∗), Holger Dette (a, ∗), and Marc Hallin (d, †)

(a) Ruhr-Universität Bochum; (b) Cornell University, Ithaca; (c) London School of Economics and Political Science; (d) ECARES, Université Libre de Bruxelles


AMS 1980 subject classification: 62M15, 62G35.

Key words and phrases: Copulas, Nonstationarity, Ranks, Periodogram, Laplace spectrum.

∗ Supported by the Sonderforschungsbereich "Statistical modelling of nonlinear dynamic processes" (SFB 823, Teilprojekt A1, C1) of the Deutsche Forschungsgemeinschaft.

† Académie Royale de Belgique, CentER (Tilburg University), and ECORE. Supported by the IAP research network grant P7/06 of the Belgian government (Belgian Science Policy) and a Crédit aux Chercheurs of the Fonds de la Recherche Scientifique-FNRS.

Introduction

For more than a century, spectral methods have been among the favorite tools of time-series analysis. The concept of periodogram was proposed and discussed as early as 1898 by Schuster, who coined the term in a study (Schuster (1898)) of meteorological series. The modern mathematical foundations of the approach were laid between 1930 and 1950 by such big names as Wiener, Cramér, Kolmogorov, Bartlett, and Tukey. The main reason for the unwavering success of spectral methods is that they are entirely model-free, hence fully nonparametric; as such, they can be considered a precursor to the subsequent development of nonparametric techniques in the area and, despite their age, they still are part of the leading group of methods in the field.

The classical spectral approach to time series analysis, however, remains deeply marked by two major restrictions:

(i) as a second-order theory, it is essentially limited to modeling first- and second-order dynamics: being entirely covariance-based, it cannot accommodate heavy tails and infinite variances, and cannot account for any dynamics in conditional skewness, kurtosis, or tail behavior;

(ii) the assumption of second-order stationarity is pervasive: except for processes that, possibly after some adequate transformation such as differencing or cointegration, are second-order stationary, observations exhibiting time-varying distributional features are ruled out.

The first of these two limitations recently has attracted much attention, and new quantile-related spectral analysis tools have been proposed, which do not require second-order moments, and are able to capture serial features that cannot be accounted for by the classical second-order approach. Pioneering contributions in that direction are Hong (1999) and Li (2008), who coined the names of

Laplace spectrum and Laplace periodogram. The Laplace spectrum concept was further studied by Hagemann (2013), and extended into cross-spectrum kernel concepts by Dette et al. (2015), who also introduced copula-based versions of the same. Those copula spectral quantities are indexed by couples $(\tau_1, \tau_2)$ of quantile levels, and their collections (for $(\tau_1, \tau_2) \in [0,1]^2$) account for any features of the joint distributions of pairs $(X_t, X_{t-k})$ in a strictly stationary process $\{X_t\}$, without requiring any distributional assumptions such as the existence of finite moments.

That thread of literature also includes Li (2012, 2014), Kley et al. (2016), and Lee and Subba Rao (2012). Somewhat different approaches were taken by Hong (2000), Davis et al. (2013), and several others; in the time domain, Linton and Whang (2007), Davis and Mikosch (2009), and Han et al. (2014) introduced the related concepts of quantilograms and extremograms. Strict or second-order stationarity, however, is essential in all those contributions.

The pictures in Figure 1 show that the copula-based spectral methods developed in Dette et al. (2015) (to which we refer for details) indeed successfully account for serial features that remain out of reach in the traditional approach. The series considered in Figure 1 is the classical S&P500 index series, with T = 12092 observations from 1962 through 2013; more precisely, that series contains the differences of logarithms of daily opening and closing prices for about 51 years. That series is generally accepted to be white noise, yielding perfectly flat periodograms. When rank-based copula periodograms are substituted for the classical ones, however, the picture looks quite different. Three rank-based copula periodograms are shown in Figure 1, for the quantile levels 0.1, 0.5 and 0.9, respectively. The central one, corresponding to the central part of the marginal distribution, is compatible with the assumption of white noise. But the more extreme ones (associated with the quantile levels 0.1 and 0.9) yield a peak at the origin, pointing at a strong dependence in the tails which is definitely not present in the median part of the (marginal) distribution.

Figure 1: S&P500, 1962-2013: the smoothed rank-based copula periodograms for $\tau_1 = \tau_2 = 0.1$, 0.5 and 0.9, respectively. All curves are plotted against $\omega/2\pi$.

Now, all periodograms in Figure 1 were computed from the complete series (51 years, $1 \le t \le$ [...] $k$, of the couples $(X_t, X_{t-k})$). Is that assumption likely to hold true? The wavelet-based test proposed by Nason (2013) reveals significant changes in the behavior of that time series, with most significant changes taking place around 1975, 1997 and during the period 2007-2013. Moreover, two rank-based copula periodograms for $\tau_1 = \tau_2 = 0.1$ [...]

Figure 2: S&P500: the smoothed rank-based copula periodograms for $\tau_1 = \tau_2 = 0.1$ [...] $\omega/2\pi$. Panels: before 2007; after 2007.

(a) models with time-dependent parameters: inherently parametric, those models mimic the traditional ones, but with parameters varying over time; see Subba Rao (1970) for a prototypical contribution and Azrak and Mélard (2006) for an in-depth study of the time-varying ARMA case;

(b) the evolutionary spectral methods, initiated by Priestley (1965), where the process under study admits a spectral representation with time-varying transfer function (a second-order characterization, thus);

(c) piecewise stationary processes, in relation with change-point analysis: see, e.g., Davis et al. (2005);

(d) the locally stationary process approach initiated by Dahlhaus (1997), based on the assumption that, over a short period of time (that is, locally in time), the process under study behaves approximately as a stationary one; related concepts have been developed recently by Dahlhaus and Subba Rao (2006), Zhou and Wu (2009a, b), Roueff and Von Sachs (2011) and Vogt (2012); wavelet-based versions also have been considered, as in Nason et al. (2000). We refer to Dahlhaus (2012) for a survey of this approach.

Those four approaches, as already mentioned, are not without overlaps: the original concept by Dahlhaus (1997) is based on time-varying (second-order) spectral representations, turned into time-domain linear MA($\infty$) ones by Dahlhaus and Polonik (2006); Dahlhaus and Subba Rao (2006) and Fryzlewicz et al. (2008) deal with locally stationary ARCH models; although much more general, Zhou and Wu (2009a, b) also assume a form of time-varying nonlinear MA($\infty$) representation, and hence also resort to (a). Most references require moment assumptions, either by nature (being based on a spectral representation), or by the nature of the stationary approximation they are considering.

In this paper, we address the two limitations (i) and (ii) of traditional spectral analysis simultaneously by developing a locally stationary version of the quantile-related spectral analysis proposed in Dette et al. (2015). At the same time we provide a thorough theoretical underpinning for the proposed approach. While adopting the locally stationary ideas of (d), however, we turn them into a fully non-parametric and moment-free approach, adapted to the nature of quantile- and copula-based spectral concepts (see Harvey (2010) for a related, time-domain, attempt). The definitions of local stationarity existing in the literature indeed are not general enough to accommodate quantile spectra, and we therefore formulate a new concept of local strict stationarity.
Contrary to Dahlhaus and Polonik (2006) and Zhou and Wu (2009a, b), who deal with time-varying (linear or nonlinear) moving averages, to Dahlhaus (1997), which is based on time-varying second-order spectra, to Vogt (2012), where the approximation is in terms of random variables and requires finite moments of order $\rho >$ [...] time-varying copula spectrum and its estimators are introduced in Section 2 and Section 3, respectively. In Section 4, we illustrate the application of the new methodology by means of a small simulation study and two real-life examples, while the theoretical properties of time-varying copula spectra and a corresponding lag-window estimator are investigated in Section 5. In particular, a central limit theorem for our local lag-window estimator is established. The proofs and additional information concerning the simulation studies and the datasets analyzed in Sections 4.3 and 4.4 are deferred to an online supplement.

Consider a series $(X_1, \ldots, X_T)$ of length $T$ as being part of a triangular array $(X_{t,T}, 1 \le t \le T)$, $T \in \mathbb{N}$, of finite-length realizations of nonstationary processes $\{X_{t,T}, t \in \mathbb{Z}\}$, $T \in \mathbb{N}$. The intuitive idea behind all definitions of local stationarity consists in the assumption that those processes have an approximately stationary behavior over a short period of time. More formally, one usually assumes the existence of a collection, indexed by $\vartheta \in (0,1)$, of stationary processes $\{X_t^\vartheta, t \in \mathbb{Z}\}$ such that the nonstationary process $\{X_{t,T}, t \in \mathbb{Z}\}$ can be approximated (in a suitable way), in the vicinity of time $t$, by the stationary process $\{X_t^\vartheta, t \in \mathbb{Z}\}$ associated with $\vartheta = t/T$.

The exact nature of this approximation has to be adapted to the specific problem under study. If the objective is a locally stationary extension of classical spectral analysis, only the autocovariances $\mathrm{Cov}(X_{t,T}, X_{s,T})$ have to be approximated.
In the quantile-related context considered here, the joint distributions of $X_{t,T}$ and $X_{s,T}$ are the feature of interest, and traditional autocovariances are to be replaced with autocovariances of indicators, of the form $\mathrm{Cov}(I\{X_{t,T} \le q_{t,T}(\tau_1)\}, I\{X_{s,T} \le q_{s,T}(\tau_2)\})$, where $q_{t,T}(\tau_1)$ and $q_{s,T}(\tau_2)$ stand for the $\tau_1$-quantile of $X_{t,T}$ and the $\tau_2$-quantile of $X_{s,T}$, respectively, with $\tau_1, \tau_2 \in (0,1)$ [...] $X_{t,T}$ and $X_{s,T}$.

In a strictly stationary context, this leads to the so-called Laplace spectrum, first considered by Li (2008) for a strictly stationary process $\{Y_t, t \in \mathbb{Z}\}$ with marginal median zero. That spectrum is defined as
\[ C_{0.5,0.5}(\omega) := \frac{1}{2\pi} \sum_{k \in \mathbb{Z}} e^{-i\omega k}\, \mathrm{Cov}\big(I\{Y_0 \le 0\}, I\{Y_{-k} \le 0\}\big), \qquad \omega \in (-\pi, \pi]. \]
Li's concept was extended by Hagemann (2013), Li (2012) and Dette et al. (2015) to general quantile levels. The most general version, which also takes into account cross-covariances of indicators, was introduced by Dette et al. (2015). Denoting by $q$ the marginal quantile function of $Y_t$, they define the copula spectral density kernel as
\[ C_{\tau_1,\tau_2}(\omega) := \frac{1}{2\pi} \sum_{k \in \mathbb{Z}} e^{-i\omega k}\, \mathrm{Cov}\big(I\{Y_0 \le q(\tau_1)\}, I\{Y_{-k} \le q(\tau_2)\}\big), \qquad \tau_1, \tau_2 \in (0,1),\ \omega \in (-\pi, \pi]. \]
Those definitions all heavily rely on the strict stationarity of the underlying time series; without this assumption, actually, they do not make much sense. It seems natural, thus, to look for some adequate notion of local stationarity that can be employed to characterize the notion of a local copula-based spectrum. However, the definitions of local stationarity previously considered in the literature place unnecessarily strong restrictions on the classes of processes that can be considered. In particular, Dahlhaus and Polonik (2006), Dahlhaus and Subba Rao (2006) and Vogt (2012) rely on moment assumptions that are neither desirable nor natural in a quantile context, and are not required for the definition of copula spectra.
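As a small illustration of why indicator covariances need no moment assumptions, the following sketch computes a sample analogue of $\mathrm{Cov}(I\{Y_0 \le q(\tau_1)\}, I\{Y_{-k} \le q(\tau_2)\})$ for an i.i.d. Cauchy sequence, whose ordinary autocovariances do not exist. The helper names `empirical_quantile` and `indicator_cov`, the use of empirical quantiles in place of $q$, and the sample size are assumptions of this sketch, not part of the paper's methodology.

```python
import math
import random

def empirical_quantile(sample, tau):
    """Smallest order statistic whose empirical CDF value is at least tau."""
    ordered = sorted(sample)
    return ordered[max(0, math.ceil(tau * len(ordered)) - 1)]

def indicator_cov(y, k, tau1, tau2):
    """Sample analogue of Cov(I{Y_0 <= q(tau1)}, I{Y_{-k} <= q(tau2)}), lag k >= 0.

    The unknown quantiles q(tau) are replaced by empirical quantiles, and the
    indicators are centred at tau1 and tau2 (assumptions of this sketch).
    """
    q1 = empirical_quantile(y, tau1)
    q2 = empirical_quantile(y, tau2)
    terms = [((y[t] <= q1) - tau1) * ((y[t - k] <= q2) - tau2)
             for t in range(k, len(y))]
    return sum(terms) / len(terms)

# i.i.d. standard Cauchy draws via the inverse CDF: no finite mean or variance
rng = random.Random(1)
cauchy = [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(20000)]

lag0 = indicator_cov(cauchy, 0, 0.5, 0.5)  # tau(1 - tau) = 0.25 at the median
lag1 = indicator_cov(cauchy, 1, 0.5, 0.5)  # close to 0 for independent data
```

For independent observations the lag-zero value is $\tau(1-\tau)$ and all other lags vanish, which corresponds to a flat copula spectral density kernel.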
We therefore introduce a new concept of local strict stationarity which completely avoids moment assumptions. That concept is not totally unrelated to the existing ones, though, and we also show that, under adequate conditions, processes that fit into the framework of Dahlhaus and Subba Rao (2006) or Dahlhaus and Polonik (2006) are locally strictly stationary in the new sense; see Section 5.1 for details. Similar results certainly also could be obtained for the Zhou and Wu (2009a, b) concept, but they are less obvious and, in order not to overload the paper, we do not pursue that direction.

The copula spectral density kernels of a stationary process $\{Y_t\}$ are defined in terms of its bivariate marginal distribution functions. Therefore, it is natural to use bivariate marginal distribution functions when evaluating, in the definition of local stationarity, the distance between the non-stationary process $\{X_{t,T}\}$ and its stationary approximation $\{X_t^\vartheta\}$.

Definition 2.1. A triangular array $\{(X_{t,T})_{t \in \mathbb{Z}}\}_{T \in \mathbb{N}}$ of processes is called locally strictly stationary (of order two) if there exists a constant $L > 0$ and, for every $\vartheta \in (0,1)$, a strictly stationary process $\{X_t^\vartheta, t \in \mathbb{Z}\}$ such that, for every $1 \le r, s \le T$,
\[ \big\| F_{r,s;T}(\cdot,\cdot) - G^\vartheta_{r-s}(\cdot,\cdot) \big\|_\infty \le L \Big( \max\big(|r/T - \vartheta|, |s/T - \vartheta|\big) + 1/T \Big), \tag{2.1} \]
where $\|\cdot\|_\infty$ stands for the supremum norm, while $F_{r,s;T}(\cdot,\cdot)$ and $G^\vartheta_k(\cdot,\cdot)$ denote the joint distribution functions of $(X_{r,T}, X_{s,T})$ and $(X_0^\vartheta, X_{-k}^\vartheta)$, respectively.

Here, 'of order two' refers to the fact that (2.1) is based on bivariate distributions only. Letting $y$ tend to infinity in $F_{r,s;T}(x,y)$ and $G^\vartheta_k(x,y)$, we get an analogous condition for the marginal distributions $F_{t;T}$ and $G^\vartheta$ of $X_{t,T}$ and $X_0^\vartheta$, namely
\[ \big\| F_{t;T}(\cdot) - G^\vartheta(\cdot) \big\|_\infty \le L \big| t/T - \vartheta \big| + L/T. \tag{2.2} \]
Intuitively, (2.1) and (2.2) imply that the univariate and bivariate distribution functions $F_{t;T}$ and $F_{r,s;T}$ of the process $\{X_{t,T}\}$ are allowed to change smoothly over time. A crucial advantage of this definition is its nonparametric nature, as it does not depend on any specific data-generating mechanism.

Whenever the data-generating process can be described in terms of a parametric model, local strict stationarity in the sense of Definition 2.1 holds if the underlying parameters change smoothly over time. Familiar examples include MA($\infty$), ARCH($\infty$) and GARCH($p,q$) models with time-varying coefficients. Sufficient conditions for local strict stationarity of those models are discussed in Section 5.1, where we also provide explicit forms of the strictly stationary approximating processes.
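To make the marginal condition (2.2) concrete, here is a minimal numerical check on a toy model that is not taken from the paper: independent Gaussian observations $X_{t,T} \sim N(0, \sigma(t/T)^2)$ with the hypothetical scale function $\sigma(u) = 1 + u$, approximated at $\vartheta$ by i.i.d. $N(0, \sigma(\vartheta)^2)$ variables. Since $|(x/s)\varphi(x/s)| \le \varphi(1) \approx 0.242$ for the standard normal density $\varphi$, and $\sigma \ge 1$ with $|\sigma'| = 1$, the constant $L = 0.25$ used below is sufficient for this toy model; the function names and grid settings are assumptions of the sketch.

```python
import math

def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def sigma(u):
    """Hypothetical smooth scale function on [0, 1] (toy model only)."""
    return 1.0 + u

def sup_distance(t, T, theta, half_width=12.0, points=4801):
    """Grid approximation of sup_x |F_{t;T}(x) - G^theta(x)| in the toy model."""
    s1, s2 = sigma(t / T), sigma(theta)
    step = 2.0 * half_width / (points - 1)
    best = 0.0
    for i in range(points):
        x = -half_width + i * step
        best = max(best, abs(norm_cdf(x / s1) - norm_cdf(x / s2)))
    return best

L = 0.25   # Lipschitz-type constant valid for this toy model
T = 500
checks = []
for t in (1, 125, 250, 500):
    for theta in (0.1, 0.5, 0.9):
        bound = L * abs(t / T - theta) + L / T   # right-hand side of (2.2)
        checks.append(sup_distance(t, T, theta) <= bound)
```

Every entry of `checks` should be `True`: the distance between the distribution at time $t$ and the approximating distribution at $\vartheta$ shrinks linearly as $t/T$ approaches $\vartheta$.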
Turning to the definition of a localized version of copula spectral density kernels, first consider the copula cross-covariance kernels associated with the strictly stationary process $\{X_t^\vartheta, t \in \mathbb{Z}\}$, $\vartheta \in (0,1)$. The lag-$h$-copula cross-covariance kernel of $\{X_t^\vartheta\}$, as defined in Dette et al. (2015), is
\[ \gamma_h^\vartheta(\tau_1, \tau_2) := \mathrm{Cov}\big(I\{X_t^\vartheta \le q^\vartheta(\tau_1)\}, I\{X_{t-h}^\vartheta \le q^\vartheta(\tau_2)\}\big), \qquad \tau_1, \tau_2 \in (0,1), \]
where $q^\vartheta(\tau)$ denotes $X_t^\vartheta$'s marginal quantile of order $\tau$.

The cross-covariances involved in the above definition always exist, and their collection (for $\tau_1, \tau_2 \in (0,1)$ and given lag $h$) provides a canonical characterization of the joint copula of $(X_t^\vartheta, X_{t-h}^\vartheta)$, hence an approximate (in the sense of (2.1)) description of the joint copula of all couples of the form $(X_{t,T}, X_{t-h,T})$. Therefore we also call $\gamma_h^\vartheta(\tau_1, \tau_2)$ the time-varying lag-$h$-copula cross-covariance kernel of $\{X_{t,T}\}$. If we assume that, for all $\tau_1, \tau_2 \in (0,1)$, the lag-$h$-covariance kernels $\gamma_h^\vartheta(\tau_1, \tau_2)$ are absolutely summable, we moreover can define the local or time-varying copula spectral density kernel of $\{X_{t,T}\}$ as
\[ f^\vartheta(\omega, \tau_1, \tau_2) := \frac{1}{2\pi} \sum_{h=-\infty}^{\infty} \gamma_h^\vartheta(\tau_1, \tau_2)\, e^{-ih\omega}, \qquad \tau_1, \tau_2 \in (0,1),\ \omega \in (-\pi, \pi]. \tag{2.3} \]
The time-varying cross-covariance kernel then admits the representation
\[ \gamma_h^\vartheta(\tau_1, \tau_2) = \int_{-\pi}^{\pi} e^{ih\omega} f^\vartheta(\omega, \tau_1, \tau_2)\, d\omega, \qquad \tau_1, \tau_2 \in (0,1). \]
Comparing those representations with the local spectral densities of Dahlhaus (1997), we see that the autocovariances of the approximating processes there are replaced by copulas. This indicates that the local spectral density kernels (2.3) can be viewed as a completely non-parametric generalization of classical $L^2$-based tools. In particular, those kernels can capture pairwise serial dependencies of arbitrary forms. For more detailed comparisons, we refer to Dette et al. (2015) and Kley et al. (2016). The usefulness of the concepts discussed here for data analysis is demonstrated, via simulation and the analysis of two real datasets (the classical S&P 500 and a meteorological one), in Section 4.

Given observations $X_{1,T}, \ldots, X_{T,T}$, the classical approach to the estimation of the time-varying spectral density of a locally stationary time series consists in considering a subset of $n$ data points centered around a time point $t_0$. To formalize ideas, let $m_T$ be a sequence of positive integers diverging to infinity as $T \to \infty$, and define the discrete neighborhood $N_{t_0,T} := \{1 \le t \le T : |t - t_0| < m_T\}$, with cardinality $n = n(m_T, T)$.
Denoting by $\omega_{j,n} = 2\pi j/n$, $1 \le j \le \lfloor \frac{n+1}{2} \rfloor$, the positive Fourier frequencies, let $\varphi_n : \omega \mapsto \varphi_n(\omega) := \omega_{j,n}$ be the piecewise constant function mapping $\omega \in (0, \pi)$ to the closest Fourier frequency, i.e. to the frequency $\omega_{j,n}$ such that $\omega \in (\omega_{j,n} - \frac{\pi}{n}, \omega_{j,n} + \frac{\pi}{n}]$. Defining
\[ T(k) := \{t \in N_{t_0,T} : t + k \in N_{t_0,T}\}, \qquad \tilde F_{t_0;T}(x) := \frac{1}{2T^{[\ldots]}} \sum_{|t - t_0| \le T^{[\ldots]}} I\{X_{t,T} \le x\}, \]
and $\hat q_{t_0,T}(\tau) := \tilde F^{-1}_{t_0;T}(\tau)$, consider the local lag-window estimator (at the Fourier frequencies $\omega_{j,n} = 2\pi j/n$)
\[ \hat f_{t_0,T}(\omega_{j,n}, \tau_1, \tau_2) := \frac{1}{2\pi} \sum_{|k| \le n-1} K(k/B_n)\, e^{-i\omega_{j,n} k} \times n^{-1} \sum_{t \in T(k)} \big(I\{X_{t,T} \le \hat q_{t_0,T}(\tau_1)\} - \tau_1\big)\big(I\{X_{t+k;T} \le \hat q_{t_0,T}(\tau_2)\} - \tau_2\big), \tag{3.1} \]
where $B_n \to \infty$ as $n \to \infty$ and $K : \mathbb{R} \to \mathbb{R}$ is continuous at $x = 0$ and satisfies $K(0) = 1$ and $\lim_{|x| \to \infty} K(x) = 0$. In order to extend this estimator $\hat f_{t_0,T}(\cdot, \tau_1, \tau_2)$ to the interval $(0, \pi)$, let $\hat f_{t_0,T}(\omega, \tau_1, \tau_2) := \hat f_{t_0,T}(\varphi_n(\omega), \tau_1, \tau_2)$. In Section 5.2 below, we prove that, under mild conditions on the bandwidth parameters and the underlying time series, the local lag-window estimator is consistent for the copula spectral density $f^\vartheta(\omega, \tau_1, \tau_2)$ and asymptotically normally distributed. This is a novel result even in the stationary case, as Kley et al. (2016) consider an estimator based on smoothed periodograms instead.

Before we address the asymptotic theory for the new estimators, we illustrate their properties and advantages by means of a brief simulation study and a detailed analysis of two real-life datasets.

One important practical aspect of the estimation of a quantile spectral density is the choice of a local window length $n$ and a smoothing parameter $B_n$.
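The role of the two tuning quantities can be seen in a minimal pure-Python sketch of the local lag-window estimator (3.1). Several choices here are assumptions rather than the authors' implementation: the function name `local_lag_window`, zero-based time indices, the Bartlett kernel as a stand-in for $K$, and, in particular, estimating the quantiles from the local window itself instead of from $\tilde F_{t_0;T}$.

```python
import cmath
import math
import random

def bartlett(u):
    """Stand-in lag window with K(0) = 1 and K(u) = 0 for |u| >= 1."""
    return max(0.0, 1.0 - abs(u))

def local_lag_window(x, t0, m, tau1, tau2, bandwidth, kernel=bartlett):
    """Sketch of (3.1) at the Fourier frequencies 2*pi*j/n, j = 1..floor((n+1)/2).

    x is a list with 0-based time index; the window is {t : |t - t0| < m}.
    Assumption: quantiles are estimated from the local window.
    """
    window = [t for t in range(len(x)) if abs(t - t0) < m]
    n = len(window)
    lo, hi = window[0], window[-1]
    local = sorted(x[t] for t in window)
    q1 = local[max(0, math.ceil(tau1 * n) - 1)]
    q2 = local[max(0, math.ceil(tau2 * n) - 1)]
    a = {t: (x[t] <= q1) - tau1 for t in window}   # centred indicators, level tau1
    b = {t: (x[t] <= q2) - tau2 for t in window}   # centred indicators, level tau2
    weighted = {}
    for k in range(-(n - 1), n):
        w = kernel(k / bandwidth)
        if w == 0.0:
            continue
        # sum over T(k) = {t in window : t + k in window}
        s = sum(a[t] * b[t + k] for t in window if lo <= t + k <= hi)
        weighted[k] = w * s / n
    freqs = [2.0 * math.pi * j / n for j in range(1, (n + 1) // 2 + 1)]
    est = [sum(g * cmath.exp(-1j * om * k) for k, g in weighted.items())
           / (2.0 * math.pi) for om in freqs]
    return freqs, est

# usage on i.i.d. uniform data: the estimate should hover around 0.25 / (2*pi)
rng = random.Random(0)
x = [rng.random() for _ in range(600)]
freqs, est = local_lag_window(x, t0=300, m=128, tau1=0.5, tau2=0.5, bandwidth=10)
```

A larger `m` lengthens the local window (more data, coarser time resolution), while a larger `bandwidth` admits more lags (finer frequency resolution, higher variance).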
In Section 5.2 and Theorem 5.1, we derive the asymptotic distribution of the estimator, which allows us to derive an expression for the smoothing parameters that minimizes the asymptotic mean squared error (see Remark 5.3 for additional details). Those expressions, of course, cannot be readily used in practice, since they depend on the actual time-varying copula spectral densities and their derivatives, which are unknown. Estimating such derivatives is even more difficult than estimating the original spectral density, and a plug-in approach to bandwidth selection therefore seems difficult to implement. For the estimation of local $L^2$-spectra, an interesting alternative has been proposed by Cranstoun et al. (2002). Unfortunately, that approach relies on wavelets instead of local windows for localization in time; whether it can be implemented here is not clear. For the implementation of our methodology, we propose to study different local window lengths and bandwidth parameters, and in the simulation study we illustrate the performance of the estimators for different window lengths.

\[
\begin{array}{ccc}
\hat f_{t_0,T}(\omega, \alpha, \alpha) & \Im \hat f_{t_0,T}(\omega, \beta, \alpha) & \Im \hat f_{t_0,T}(\omega, \gamma, \alpha) \\
\Re \hat f_{t_0,T}(\omega, \beta, \alpha) & \hat f_{t_0,T}(\omega, \beta, \beta) & \Im \hat f_{t_0,T}(\omega, \gamma, \beta) \\
\Re \hat f_{t_0,T}(\omega, \gamma, \alpha) & \Re \hat f_{t_0,T}(\omega, \gamma, \beta) & \hat f_{t_0,T}(\omega, \gamma, \gamma)
\end{array}
\]

Table 1: Patterns for the 3 × 3 [...] $\alpha = 0.1$, $\beta = 0.5$, $\gamma = 0.9$, with $t_0 \in \mathcal{T} \subset \{1, \ldots, T\}$ and $\omega \in (0, \pi)$. For example, the top-right corner, in all those figures, displays a time-frequency plot of the imaginary parts of the collection $\big(\hat f_{t_0,T}(\omega, 0.9, 0.1)\big)_{t_0 \in \mathcal{T}, \omega \in \Omega}$.

Plots of time-varying spectral densities and their estimators are provided in the form of time-frequency heatmaps. The vertical axis in all those plots represents frequencies ($\omega/2\pi$, ranging from 0 to 0.5), the horizontal axis the span of time $1, \ldots, T$ over which the time-varying spectral quantities are estimated.
All 3 × 3 [...] (for $\tau_1 = \tau_2 = \tau$), or their real and imaginary parts (for $\tau_1 \neq \tau_2$) are represented via a continuous $(\tau_1, \tau_2)$-dependent color code, ranging from cyan and light blue (for small values) to dark blue, yellow, orange, and red (for large values). As explained below, this color code also has an interpretation in terms of significance of certain $p$-values. This latter interpretation requires a preliminary calibration step, though. Indeed, being 'small', for a $(\tau_1 = \tau_2 = \tau)$-periodogram value (which by nature is nonnegative real) does not have the same meaning as being 'small' for the imaginary or the real part of some $(\tau_1', \tau_2')$-cross-periodogram (for which negative values are possible): in order to make inter-frequency comparisons possible, a meaningful color code therefore has to be $(\tau_1, \tau_2)$-specific. For this purpose we introduce a distribution-free simulation-based calibration that fully exploits the properties of copula-based quantities.

To explain the idea behind this calibration step, consider plotting, for some subset $\mathcal{T} \times \Omega$ (with $\mathcal{T} \subset \{1, \ldots, T\}$ and $\Omega \subset (0, \pi)$), a collection $\big(\Re \hat f_{t_0,T}(\omega, \tau_1, \tau_2)\big)_{t_0 \in \mathcal{T}, \omega \in \Omega}$ of the real parts (the imaginary parts are dealt with in exactly the same way) of estimators computed from the realization $X_1, \ldots, X_T$ of some time series of interest. Assume that a bandwidth $B_n$ and a window length $n$ are used for the estimation. A color is then assigned to each value of $\Re \hat f_{t_0,T}(\omega, \tau_1, \tau_2)$ along the following steps:

(i) simulate $M = 10^{[\ldots]}$ independent realizations $(U_{1,m}, \ldots, U_{n,m})$, $m = 1, \ldots, M$, of an i.i.d. sequence of random variables of length $n$ (uniform over $[0,1]$);

(ii) for each of those $M$ realizations, compute the estimator $\hat f^{U,m}_{t_0,T}(\omega, \tau_1, \tau_2)$ of the local spectral density based on the same bandwidth $B_n$; note that the number $n$ of observations in each replication equals the window length used for our original collection;

(iii) define, for each $m = 1, \ldots, M$, the quantities $Q^m_{\max}(\tau_1, \tau_2) := \max_\omega \Re \hat f^{U,m}_{t_0,T}(\omega, \tau_1, \tau_2)$ and $Q^m_{\min}(\tau_1, \tau_2) := \min_\omega \Re \hat f^{U,m}_{t_0,T}(\omega, \tau_1, \tau_2)$; obtain the empirical 99.5% quantile $q_{\max}(\tau_1, \tau_2)$ of $(Q^m_{\max}(\tau_1, \tau_2))_{m=1,\ldots,M}$, and the empirical 0.5% quantile $q_{\min}(\tau_1, \tau_2)$ of $(Q^m_{\min}(\tau_1, \tau_2))_{m=1,\ldots,M}$, respectively.

The color palette then is set as follows: all points $(t_0, \omega) \in \mathcal{T} \times \Omega$ with $\Re \hat f_{t_0,T}(\omega, \tau_1, \tau_2)$ value in $[q_{\min}(\tau_1, \tau_2), q_{\max}(\tau_1, \tau_2)]$ receive dark blue color. Next, letting
\[ v_{\min}(\tau_1, \tau_2) := \min\Big(\min_{t_0, \omega} \Re \hat f_{t_0,T}(\omega, \tau_1, \tau_2),\ q_{\min}(\tau_1, \tau_2) - \big(q_{\max}(\tau_1, \tau_2) - q_{\min}(\tau_1, \tau_2)\big)\Big), \]
\[ v_{\max}(\tau_1, \tau_2) := \max\Big(\max_{t_0, \omega} \Re \hat f_{t_0,T}(\omega, \tau_1, \tau_2),\ q_{\max}(\tau_1, \tau_2) + \big(q_{\max}(\tau_1, \tau_2) - q_{\min}(\tau_1, \tau_2)\big)\Big), \]
all points $(t_0, \omega)$ for which $\Re \hat f_{t_0,T}(\omega, \tau_1, \tau_2)$ lies in the interval $[v_{\min}(\tau_1, \tau_2), q_{\min}(\tau_1, \tau_2)]$ receive a color ranging, according to a linear scale, from cyan to light and dark blue, while the colors for the interval $[q_{\max}(\tau_1, \tau_2), v_{\max}(\tau_1, \tau_2)]$ similarly range from dark blue to yellow and red.

All our time-frequency heat diagrams thus have the following interpretation. For each given choice of $(\tau_1, \tau_2)$ and a timepoint $t_0$, the probability, under the hypothesis of (strong) white noise, that the real (resp., the imaginary) part at time $t = t_0$ of the smoothed $(\tau_1, \tau_2)$-time-varying periodogram lies entirely in the dark blue area is approximately 0.99. Hence, the presence of light blue, cyan or orange-red zones in a diagram indicates a significant (at probability level 1%) deviation from white noise behavior. The location of those zones moreover tells us where in the spectrum, and when in the period of observation, those significant deviations take place, along with an evaluation of their magnitude. The correspondence between the actual size of the estimate and the colors used is provided by the color scale on the right-hand side of each diagram. Note that here and in the sequel, we use the terminology 'white noise' to denote i.i.d. (and not just uncorrelated) variables.

This calibration method yields a universal distribution-free and model-free color scaling which also provides (as far as dark blue regions are concerned) a hypothesis testing interpretation of the results. The same color code is used for the empirical analyses in Sections 4.3 and 4.4, as well as for the simulations in Section 4.2. Currently, an R-package containing the codes used here is in preparation (a preliminary version is available upon request).
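Steps (i)-(iii) and the limits $v_{\min}$, $v_{\max}$ can be sketched as follows. This is only an illustration under stated assumptions: the names `real_part_estimates`, `calibrate` and `colour_limits` are hypothetical, the Bartlett weights stand in for the kernel $K$, the quantiles are estimated from each simulated sequence, and the number of replications `n_rep` is kept small purely for speed.

```python
import math
import random

def real_part_estimates(u, tau1, tau2, bandwidth):
    """Real parts of a lag-window estimate computed from the whole sequence u,
    at frequencies 2*pi*j/n with 0 < j < n/2 (Bartlett weights as a stand-in)."""
    n = len(u)
    ordered = sorted(u)
    q1 = ordered[max(0, math.ceil(tau1 * n) - 1)]
    q2 = ordered[max(0, math.ceil(tau2 * n) - 1)]
    a = [(v <= q1) - tau1 for v in u]
    b = [(v <= q2) - tau2 for v in u]
    kmax = min(n - 1, math.ceil(bandwidth) - 1)   # lags with non-zero weight
    weighted = {}
    for k in range(-kmax, kmax + 1):
        s = sum(a[t] * b[t + k] for t in range(max(0, -k), min(n, n - k)))
        weighted[k] = (1.0 - abs(k) / bandwidth) * s / n
    out = []
    for j in range(1, (n - 1) // 2 + 1):
        om = 2.0 * math.pi * j / n
        # the weighted lag sums are real, so the real part only involves cosines
        out.append(sum(g * math.cos(om * k) for k, g in weighted.items())
                   / (2.0 * math.pi))
    return out

def calibrate(n, tau1, tau2, bandwidth, n_rep, seed=0):
    """Steps (i)-(iii): empirical 0.5% and 99.5% quantiles of min / max over omega."""
    rng = random.Random(seed)
    maxima, minima = [], []
    for _ in range(n_rep):
        u = [rng.random() for _ in range(n)]            # (i) i.i.d. uniform sequence
        vals = real_part_estimates(u, tau1, tau2, bandwidth)  # (ii) same bandwidth
        maxima.append(max(vals))                        # (iii) Q_max^m
        minima.append(min(vals))                        # (iii) Q_min^m
    maxima.sort()
    minima.sort()
    q_max = maxima[math.ceil(0.995 * n_rep) - 1]
    q_min = minima[max(0, math.ceil(0.005 * n_rep) - 1)]
    return q_min, q_max

def colour_limits(estimates, q_min, q_max):
    """Outer limits v_min and v_max of the colour scale."""
    spread = q_max - q_min
    return (min(min(estimates), q_min - spread),
            max(max(estimates), q_max + spread))

q_min, q_max = calibrate(n=64, tau1=0.5, tau2=0.5, bandwidth=10, n_rep=200)
```

Values inside `[q_min, q_max]` would be drawn in dark blue; `colour_limits` then gives the end points of the two linear colour ramps on either side.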

This section provides a numerical illustration of the performance of the new estimators of the time-varying copula spectral densities in two time-varying models that have been considered elsewhere in the literature. For both models, six time-frequency heat plots, labeled (a)-(f), of time-varying copula spectral densities are provided, for each combination of the quantile levels 0.1, 0.5, and 0.9, using the color code described in Section 4.1:

(a) the actual time-varying copula spectral densities and

(b)-(f) the local lag-window estimators of the copula spectral densities for different window lengths $n$.

Currently, we do not have simple closed-form expressions for the actual spectra, and we doubt such expressions are possible (but for the theoretical definition (2.3)). This is in contrast with classical $L^2$ spectral analysis where, at least for linear processes, explicit representations for the spectra are readily available. Such lack of simple analytic expressions is not surprising since, even for linear processes, the impact of the linear representation coefficients on joint distributions (as opposed to covariances) is bound to be quite complicated, and crucially depends on innovation densities. From a practical point of view, this is not a major drawback, though, as for any given linear representation very good approximations of the copula spectra can be obtained within a few minutes via simulations. The actual copula spectral densities in (a) were obtained by simulating, for each $t_0$ in $\mathcal{T}$, $R = 1000$ independent replications, all of length $2^{[\ldots]}$, of the strictly stationary approximation $(X_t^{t_0/T})_{t = 1, \ldots, 2^{[\ldots]}}$, computing the corresponding lag-window estimators $\hat f^r_{t_0,T}(\omega, \tau_1, \tau_2)$, say, for $r = 1, \ldots, R$, and averaging them (over $r = 1, \ldots, R$) for each fixed $(t_0, \omega) \in \mathcal{T} \times \Omega$.

The estimators in (b)-(f) are computed from one realization, of length $T = 2^{[\ldots]}$, of the (nonstationary) process under consideration with a bandwidth $B_n = 10$ and local window lengths $n = 128, 256, 512, 1024, 2048$ [...] $T = 2^{[\ldots]}$, are available in Section A.2.3 of the online appendix. Our findings indicate that, for shorter time-series lengths, estimating 'fast-changing' dependence structures may become difficult. If the changes are very smooth, as in the QAR example of Section 4.2.2, the results for short time series are still reasonable. For $K$, we used the Parzen window
\[ K(u) = \big(1 - 6u^2 + 6|u|^3\big)\, I\{|u| \le 0.5\} + 2\big(1 - |u|\big)^3\, I\{0.5 < |u| \le 1\}. \]
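The Parzen window in its standard form translates directly into code. The short sketch below, with a function name and example bandwidth of our choosing, evaluates it and tabulates the lag weights $K(k/B_n)$ that enter a lag-window estimator.

```python
def parzen(u):
    """Standard Parzen lag window: cubic pieces on |u| <= 1/2 and 1/2 < |u| <= 1."""
    a = abs(u)
    if a <= 0.5:
        return 1.0 - 6.0 * a * a + 6.0 * a ** 3
    if a <= 1.0:
        return 2.0 * (1.0 - a) ** 3
    return 0.0

# lag weights K(k / B_n) for a bandwidth of 10 (value chosen for illustration)
bandwidth = 10
weights = {k: parzen(k / bandwidth) for k in range(-bandwidth, bandwidth + 1)}
```

The two pieces agree at $|u| = 1/2$ (both give $1/4$), so the window is continuous, equals one at the origin and vanishes for $|u| \ge 1$, as required of $K$.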

In each case, the sets $\mathcal{T}$ and $\Omega$ were chosen as $\mathcal{T} := \{k \mid k = \lceil n/2 \rceil, \ldots, \lfloor T - n/2 \rfloor\}$ and $\Omega := \{2\pi j/n \mid j = 1, \ldots, (n-2)/2\}$.

In Figure 3, we display heatmaps for a time-varying AR(2) process with equation

$$X_{t,T} = 1.8 \cos(1.5 - \cos(2\pi t/T))\, X_{t-1,T} - 0.81\, X_{t-2,T} + Z_t \qquad (4.1)$$

and i.i.d. noise $Z_t$ with Cauchy distribution. Its strictly stationary approximation at $t = \vartheta T$, for $0 \le \vartheta \le 1$, is

$$X^{\vartheta}_t = 1.8 \cos(1.5 - \cos(2\pi\vartheta))\, X^{\vartheta}_{t-1} - 0.81\, X^{\vartheta}_{t-2} + \zeta_t \qquad (4.2)$$

where the $\zeta_t$'s are i.i.d. noise with the same Cauchy density as the $Z_t$'s.

The form of the equation is taken from Dahlhaus (2012), where we replaced the Gaussian innovations with Cauchy ones, thus violating the moment assumptions of classical spectral analysis. The resulting process exhibits a time-varying periodicity which is clearly visible in the heat diagrams associated with the real parts of its time-varying copula spectral densities, displayed in the lower triangular parts of Figures 3(a)-(f). The imaginary parts of the spectra are shown in the upper triangular parts of the same figures; note that, due to time-irreversibility (see Hallin et al. 1988), those imaginary parts exhibit significant yellow regions in the actual spectral density (a). The peaks are, however, very narrow, thus quite difficult to estimate, and essentially disappear in the estimated versions (b)-(f). The proposed lag-window estimator nevertheless is able to recover the structure of the spectral densities over a broad range of window lengths.

Also note the significant peak around zero appearing in the diagrams associated with extreme quantiles ($\tau_1, \tau_2 = 0.[\ldots]$ […] n = 128 in (b), makes it very difficult to reconstruct the copula spectral densities for the extreme quantiles ($\tau = (0.1, 0.1)$ or $\tau = (0.9, 0.9)$ […] n = 2048 in (f)) one leads to a loss of details. The estimators remain stable, though, over a broad range of window lengths (n = 256 - […]).

Figure 4 shows the same heat diagrams for the QAR(1) (Quantile Autoregression) model of order one

$$X_{t,T} = \bigl[(1.9\,U_t - 0.95)(t/T) + (-1.9\,U_t + 0.95)(1 - (t/T))\bigr] X_{t-1,T} + (U_t - 1/2),$$

where the $U_t$'s are i.i.d. uniform over [0, 1] (see Koenker and Xiao (2006)). The corresponding strictly stationary approximation at $t = \vartheta T$, $0 \le \vartheta \le 1$, is

$$X^{\vartheta}_t = \bigl[(1.9\,V_t - 0.95)\vartheta + (-1.9\,V_t + 0.95)(1 - \vartheta)\bigr] X^{\vartheta}_{t-1} + (V_t - 1/2) \qquad (4.3)$$

where the $V_t$'s are i.i.d. uniform over [0, 1]. […] $1.9\,U_t - 0.95$ to $-1.9\,U_t + 0.95$, so that the spectral densities associated with the lower quantiles for small values of $t_0/T$ are the same as those associated with the upper quantiles for $1 - t_0/T$, and vice versa.

This behavior, which cannot be detected via classical spectral methods, is quite visible here. Comparing the plots for $\tau = (0.1, 0.1)$ and $\tau = (0.9, 0.9)$ […] $\vartheta = 0.[\ldots]$ […] n = 2048, and the estimator displays most of the details found in the actual spectral density.

Figure 3: Heatmaps of the Cauchy time-varying AR(2) process described in Section 4.2.1 and the corresponding estimators, for various window lengths. Panels: (a) actual copula spectral densities (simulated); (b)-(f) estimated copula spectral densities for n = 128, 256, 512, 1024, and 2048, respectively.

Figure 4: Heatmaps of the time-varying QAR(1) process described in Section 4.2.2 and the corresponding estimators, for various window lengths. Panels: (a) actual copula spectral densities (simulated); (b) […].
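To make the construction behind panels (b)-(f) more concrete, here is a minimal sketch of a local lag-window estimate computed from a single data window. It is an illustration under stated assumptions rather than the estimator (3.1) itself: the function names are hypothetical, within-window ranks stand in for the unknown local marginal distribution, lag k is weighted by K(k / B_n), and the lag-k cross-covariance pairs the observation at time t (level tau1) with the one at time t + k (level tau2).

```python
import cmath
import math
import random


def parzen(u):
    """Parzen lag window with support [-1, 1]."""
    a = abs(u)
    if a <= 0.5:
        return 1.0 - 6.0 * a * a + 6.0 * a ** 3
    return 2.0 * (1.0 - a) ** 3 if a <= 1.0 else 0.0


def local_lag_window_estimate(window, omega, tau1, tau2, bandwidth):
    """Sketch of a copula spectral density estimate at frequency omega."""
    n = len(window)
    # Ranks within the window (1 = smallest observation).
    order = sorted(range(n), key=lambda i: window[i])
    rank = [0] * n
    for r, i in enumerate(order, start=1):
        rank[i] = r
    # Centered indicators of "observation lies below the tau-quantile".
    a = [(1.0 if rank[i] <= n * tau1 else 0.0) - tau1 for i in range(n)]
    b = [(1.0 if rank[i] <= n * tau2 else 0.0) - tau2 for i in range(n)]
    max_lag = min(n - 1, int(math.floor(bandwidth)))
    total = 0j
    for k in range(-max_lag, max_lag + 1):
        weight = parzen(k / bandwidth)
        if weight == 0.0:
            continue
        lo, hi = max(0, -k), min(n, n - k)
        cov = sum(a[t] * b[t + k] for t in range(lo, hi)) / n
        total += weight * cov * cmath.exp(-1j * omega * k)
    return total / (2.0 * math.pi)
```

Sliding such a window along the series and evaluating the function on a grid of frequencies would give one column of a heatmap per window position. For tau1 = tau2 the estimate is real up to rounding error, which matches the absence of imaginary parts on the diagonal.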

We now turn back to the S&P500 index series already considered in the introduction, with T = 12992 daily observations from 1962 through 2013 (differences of the logarithms of daily opening and closing prices for about 52 years). We applied the same estimation method as above, with the same window function as described in Section 4.2, with bandwidth $B_n = 25$, window length n = 512, and the sets $\mathcal{T} = \{256 + 256j \mid [\ldots] \le j \le [\ldots]\}$ and $\Omega = \{2\pi j/[\ldots] \mid j = 0, \ldots, [\ldots]\}$. Calibration was performed as explained in Section 4.1. The resulting heatmaps are shown in Figure 5.

Figure 5: Time-frequency heatmaps of the quantile lag-window estimator for the log-returns from the S&P500 between 1962 and 2013 for quantile levels 0.1, 0.5, and 0.9. The vertical axis represents frequencies ($0 < \omega/2\pi < 0.5$) and the horizontal axis is time ($1 \le t \le 12992$).

Whereas the central heatmaps ($\tau_1 = \tau_2 = 0.5$) are pretty flat (uniform dark blue) with the exception of some deviations from white noise behavior limited to the early seventies, the more extreme ones ($\tau_1 = \tau_2 = 0.1, 0.9$) suggest an alternation of high low-frequency spectral densities (yellow and red) and perfectly 'flat' (dark blue) periods. A closer analysis reveals that those periods of strong dependence in the tails typically correspond to well-identified crises and booms (see below for details). Another interesting observation is the marked asymmetry between the time-varying spectra associated with the left ($\tau = 0.1$) and right ($\tau = 0.9$) tails, which can be interpreted in terms of prospect theory (see Kahneman and Tversky (1979)). That asymmetry is confirmed by comparing the estimated spectra for $\tau = 0.[\ldots]$ […] $\tau = 0.[\ldots]$ […] $\tau_1 = \tau_2 = 0.[\ldots]$ […] $\tau_1 = \tau_2 = 0.[\ldots]$ […] The $\tau_1 = \tau_2 = 0.[\ldots]$ […] them to estimations using only observations taken after it. None of the pre-crisis curves indicates any significant deviation from white noise, whereas both post-crisis ones do. The interpretation is that crises, locally but quite suddenly, produce changes in the dependencies between low returns. Those changes happen after the crisis onset, and thus do not help in predicting them; as shown by Figure 7, they fade away more slowly than they appeared. As for the atypical spectra in the late sixties, they are probably due to the fact that the market, at that time, was much smaller, and less efficient, than nowadays. In addition, some of the peaks of low-returns spectral densities at low frequencies are not systematically associated with any well-identified crisis. This indicates that, apart from crises, other events can also influence the dependence structure of low returns. Peaks in quantile spectral densities at low frequencies, indeed, also can be caused by time-varying variances. This fact was first observed by Li (2014), who suggested that this phenomenon could explain some features of the S&P500 spectra. A closer look, however, reveals that it cannot account for all dependence in that dataset: see Section A.3.1 of the online appendix.

Figure 6: Time-frequency heatmaps of the quantile lag-window estimators for $\tau_1 = \tau_2 \in \{0.1, 0.9\}$ (no imaginary parts, thus). The vertical axis represents the frequencies ($0 < \omega/2\pi < 0.5$) and the horizontal axis is time ($1 \le t \le 12992$). For each $t_0$, a periodogram is plotted against frequencies via the color code provided along the right-hand side of each figure.

Figure 8: Single local lag-window estimators calculated before (dashed) and after (solid) the bursting of the dot-com bubble in 2001 (left; split date 2001-03-15) and the beginning of the financial crisis in 2007 (right; split date 2007-08-09); the dotted horizontal lines represent the values of $q_{\min}$ and $q_{\max}$ from Section 4.1 (iii); smoothing and bandwidth choices as in Figure 5.

As mentioned in Section 4.1, the new methodology admits a hypothesis testing interpretation, the null hypothesis being that of strong white noise. In our second dataset, we analyze the residuals of an ARMA(p, q) fit to a seasonally adjusted time series of air temperatures recorded at the meteorological station in Hohenpeissenberg (Germany). More precisely, T = 11315 observations of daily mean temperatures were recorded between 1985 and 2015; they are displayed in the upper part of Figure 9. To remove the clearly visible seasonality, we first fit a trigonometric regression model of the form

$$y = c + \alpha x + \sum_{k=1}^{[\ldots]} \bigl( \beta_k \sin(2\pi k/[\ldots]) + \gamma_k \cos(2\pi k/[\ldots]) \bigr),$$

where the linear part is used to remove any possible trend. An ARMA(p, q) model with p = 3 and q = 1 (determined via AIC and an inspection of residual autocorrelations, see Campbell and Diebold (2011) for a similar approach) then was estimated from the residuals.

Figure 9: First row displays the T = 11315 observations of daily mean temperature between 1985 and 2015 at the Hohenpeissenberg Meteorological Observatory and their autocorrelation function; the second row shows the residuals of the ARMA(3, 1) fit and their autocorrelation function. Panel titles: "Mean daily temperature", "ACF of the time series", "ARMA-residuals", "ACF of the residuals".
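The deseasonalizing step can be sketched as an ordinary least-squares fit of a constant, a linear trend, and a few sine/cosine pairs. The number of harmonics (2), the period (365.25 days), the rescaling of time to [0, 1) for the trend, and all function names below are assumptions chosen for illustration; the values actually used for the temperature data are not recoverable from the text.

```python
import math


def design_row(t, n, n_harmonics, period):
    """Regressors at time t: constant, rescaled trend, sine/cosine pairs."""
    row = [1.0, t / n]
    for k in range(1, n_harmonics + 1):
        angle = 2.0 * math.pi * k * t / period
        row += [math.sin(angle), math.cos(angle)]
    return row


def solve_linear(a, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    m = len(b)
    aug = [a[i][:] + [b[i]] for i in range(m)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * m
    for i in reversed(range(m)):
        s = aug[i][m] - sum(aug[i][j] * x[j] for j in range(i + 1, m))
        x[i] = s / aug[i][i]
    return x


def fit_trend_and_seasonality(y, n_harmonics=2, period=365.25):
    """Least-squares fit via the normal equations; returns (coefficients, residuals)."""
    n = len(y)
    rows = [design_row(t, n, n_harmonics, period) for t in range(n)]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve_linear(xtx, xty)
    fitted = [sum(bj * rj for bj, rj in zip(beta, r)) for r in rows]
    return beta, [yi - fi for yi, fi in zip(y, fitted)]
```

The returned residuals would then be passed to an ARMA fit; that second step is not shown here.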

The residuals resulting from that second fit and their autocorrelations are shown in the lower part of Figure 9. From an $L^2$ perspective, this successfully captures the bivariate behavior of the dataset. It is therefore not surprising that the (global) classical spectral density of those residuals does not show any significant structure (see Figure 10).

Figure 10: Classical spectral density of the ARMA-residuals with pointwise calculated 0.95 confidence interval (bandwidth = 0.00517).

The quantile spectral analysis of the same dataset leads to a much different conclusion. Estimating the quantile spectral densities of the same residuals with $B_n = 10$, n = 2048, $\mathcal{T} = \{[\ldots]\, j \mid j = 0, \ldots, [\ldots]\}$, and $\Omega = \{2\pi j/[\ldots] \mid j = 0, \ldots, [\ldots]\}$, we obtain the heatmaps shown in Figure 11. The central heat map ($\tau_1 = \tau_2 = 0.5$) […] does not vanish. It is interesting to note that this peak is most prominent during the 2003 heat wave in Europe, which could indicate a connection with long-term climatic fluctuations. Other significant effects, although not as dramatic, are also visible in the heat maps involving more extreme quantiles $\tau_1, \tau_2 \in \{0.1, 0.9\}$. One interesting observation is the strong asymmetry between the spectra associated with low and high temperatures (quantiles).

These results suggest that an ARMA model is far from fully capturing the distributional features of the data, though it does capture its $L^2$ dynamics. The analysis performed here reveals clear deviations from white noise, which again cannot be detected by classical spectral analysis. It also clearly shows an evolution through time of the dependence structure of daily temperatures. Such findings are not entirely new, and ARMA-GARCH models have been proposed for similar datasets: see Campbell and Diebold (2011). It should be emphasized, however, that the residual spectra we observe in this dataset do not correspond to typical GARCH spectra, which suggests that it might be worthwhile to investigate the validity of such parametric models in greater detail.

Figure 11: Time-frequency heatmaps of the quantile lag-window estimator for the ARMA-residuals of the daily mean temperature between 1985 and 2015 at the Hohenpeissenberg Meteorological Observatory for quantile levels 0.1, 0.5, and 0.9. The vertical axis represents frequencies ($\omega/2\pi$, ranging from 0 to 0.5), and the horizontal axis is time ($1 \le t \le 11315$).

In contrast with the many definitions of local stationarity considered in the literature, which are based on evolving covariance structures, and classical spectra, or time-varying parametric models, local strict stationarity is a purely distributional concept. In this section, we show how, under additional constraints, those other concepts eventually fall under the umbrella of our definition.

5.1.1 tvMA(∞) processes. Dahlhaus and Polonik (2006) define a tvMA(∞) process as admitting a representation of the form

$$X_{t,T} = \mu(t/T) + \sum_{j=-\infty}^{\infty} a_{t,T}(j)\, \xi_{t-j}, \qquad (5.1)$$

where $\xi_t$ is i.i.d. white noise. This definition covers a wide range of popular linear time-varying models, such as the tvARMA(p, q) ones. Consider the following assumptions.

(MA1) There exist functions $a(\cdot, j)$ and $\mu(\cdot) : (0, 1) \to \mathbb{R}$ with

$$\sup_{t,T} \bigl| a_{t,T}(j) - a(t/T, j) \bigr| \le \frac{K}{T\, l(j)}, \qquad \sup_{u \in (0,1)} \Bigl| \frac{\partial a(u, j)}{\partial u} \Bigr| \le \frac{K}{l(j)}, \qquad \text{and} \qquad \sup_{u \in (0,1)} \Bigl| \frac{\partial \mu(u)}{\partial u} \Bigr| \le K,$$

where K is a finite constant (not depending on j) and $\sum_j 1/l(j) < \infty$. Furthermore, $\sup_{u \in (0,1)} \sum_{j=-\infty}^{\infty} |a(u, j)| < \infty$ and $\inf_{u \in (0,1)} |a(u, 0)| > \rho > 0$.

(MA2) The random variables $\xi_t$ have bounded density function $f_\xi$ and finite expectation, and, for some constant $C_f > 0$, $f_\xi$ is such that $\sup_{x \in \mathbb{R}} |x f_\xi(x)| \le C_f$.

We then have the following result.

Lemma 5.1. If Assumptions (MA1) and (MA2) hold, the tvMA(∞) process defined in (5.1) is locally strictly stationary in the sense of Definition 2.1, with stationary approximation

$$X^{\vartheta}_t = \mu(\vartheta) + \sum_{j=-\infty}^{\infty} a(\vartheta, j)\, \zeta_{t-j},$$

where the $\zeta_t$'s are i.i.d. copies of the $\xi_t$'s.

5.1.2 tvARCH(∞) processes. Dahlhaus and Subba Rao (2006) define a tvARCH(∞) process by

$$X_{t,T} = \sigma_{t,T} Z_t, \qquad \text{where} \quad \sigma^2_{t,T} = a_0(t/T) + \sum_{j=1}^{\infty} a_j(t/T)\, X^2_{t-j,T}, \qquad (5.2)$$

where the $Z_t$'s are i.i.d. random variables with $E(Z_t) = 0$, $\mathrm{Var}(Z_t) = 1$, and density f. They show that $X^2_{t,T}$, if not $X_{t,T}$ itself, has an almost surely well-defined and unique expression in the set of all causal solutions of (5.2) if the following assumption holds.

(ARCH1) The coefficients $a_j$ in (5.2) are non-negative and $\inf_{u \in (0,1)} a_0(u) > \rho$ for some constant $\rho > 0$. There exist constants $Q < \infty$, $M < \infty$, and $0 < \nu < 1$, and a positive sequence $l(j)$, $j \in \mathbb{N}$, such that $\sum_{j=1}^{\infty} j/l(j) < \infty$, and

$$\sup_{u \in (0,1)} a_j(u) < \frac{Q}{l(j)}, \qquad Q \sum_{j=1}^{\infty} \frac{1}{l(j)} \le (1 - \nu), \qquad \text{and} \qquad |a_j(u) - a_j(v)| < \frac{M |u - v|}{l(j)}.$$

In general, equation (5.2) has no unique solution, as the sign of a solution $X_{t,T}$ associated with the almost surely well-defined (under assumption (ARCH1)) $X^2_{t,T}$ can be randomly either positive or negative. To avoid this non-uniqueness problem, we require $\sigma_{t,T}$ in (5.2) to be positive. More precisely, we impose the following condition.

(ARCH2) For some constant $C_f < \infty$, $\sup_{x \in \mathbb{R}} |x f(x)| < C_f$, and $\sigma_{t,T} = \sqrt{\sigma^2_{t,T}}$.

We then have the following result.

Lemma 5.2. If Assumptions (ARCH1) and (ARCH2) hold, a process $X_{t,T}$ satisfying equation (5.2) is locally strictly stationary in the sense of Definition 2.1, with stationary approximation

$$X^{\vartheta}_t = \sigma^{\vartheta}_t \zeta_t \qquad \text{with} \quad (\sigma^{\vartheta}_t)^2 = a_0(\vartheta) + \sum_{j=1}^{\infty} a_j(\vartheta)\, (X^{\vartheta}_{t-j})^2,$$

where the $\zeta_t$'s are i.i.d. copies of the $Z_t$'s.

5.1.3 tvGARCH(p, q) processes. Subba Rao (2006) similarly defines a tvGARCH(p, q) process as

$$X_{t,T} = \sigma_{t,T} Z_t, \qquad \text{with} \quad \sigma^2_{t,T} = a_0(t/T) + \sum_{j=1}^{p} a_j(t/T)\, X^2_{t-j,T} + \sum_{i=1}^{q} b_i(t/T)\, \sigma^2_{t-i,T}, \qquad (5.3)$$

where the $Z_t$'s are i.i.d. random variables with $E(Z_t) = 0$, $\mathrm{Var}(Z_t) = 1$, and density f. Parallel to (ARCH1) and (ARCH2), consider the following assumptions.

(GARCH1) The coefficient functions $a_j(u)$, $j = 0, \ldots, p$, and $b_j(u)$, $j = 1, \ldots, q$, are positive, Lipschitz-continuous, and satisfy, for some $0 < \mu < 1$ and $\rho > 0$,

$$\sup_{u \in (0,1)} \Bigl[ \sum_{j=1}^{p} a_j(u) + \sum_{i=1}^{q} b_i(u) \Bigr] < 1 - \mu, \qquad \text{and} \qquad \inf_{u \in (0,1)} a_0(u) > \rho. \qquad (5.4)$$

(GARCH2) For some constant $C_f < \infty$, $\sup_{x \in \mathbb{R}} |x f(x)| \le C_f$, and $\sigma_{t,T} = \sqrt{\sigma^2_{t,T}}$.

The following then holds true.

Lemma 5.3. If Assumptions (GARCH1) and (GARCH2) hold, a process $X_{t,T}$ satisfying equation (5.3) is locally strictly stationary in the sense of Definition 2.1, with stationary approximation $X^{\vartheta}_t = \sigma^{\vartheta}_t \zeta_t$, where

$$(\sigma^{\vartheta}_t)^2 = a_0(\vartheta) + \sum_{j=1}^{p} a_j(\vartheta)\, (X^{\vartheta}_{t-j})^2 + \sum_{i=1}^{q} b_i(\vartheta)\, (\sigma^{\vartheta}_{t-i})^2$$

and the $\zeta_t$'s are i.i.d. copies of the $Z_t$'s.

The proofs of Lemmas 5.1-5.3 can be found in Section A.4.1 of the online appendix.
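For readers who want to experiment with such models, the following sketch simulates a tvGARCH(1, 1) path, taking the positive square root of the conditional variance as in (GARCH2). The Gaussian innovations, the starting values, and the function and argument names are illustrative assumptions; any coefficient functions passed in should themselves respect a condition like (GARCH1).

```python
import math
import random


def simulate_tvgarch11(T, a0, a1, b1, seed=0):
    """Simulate X_{t,T} = sigma_{t,T} Z_t for t = 1, ..., T.

    a0, a1, b1 are functions of rescaled time u = t / T (hypothetical inputs).
    Returns the simulated path and the conditional variances.
    """
    rng = random.Random(seed)
    x_prev = 0.0
    sig2_prev = a0(1.0 / T)  # crude starting value, an assumption of this sketch
    xs, sig2s = [], []
    for t in range(1, T + 1):
        u = t / T
        sig2 = a0(u) + a1(u) * x_prev ** 2 + b1(u) * sig2_prev
        x = math.sqrt(sig2) * rng.gauss(0.0, 1.0)  # positive root, as in (GARCH2)
        xs.append(x)
        sig2s.append(sig2)
        x_prev, sig2_prev = x, sig2
    return xs, sig2s
```

Freezing u at a fixed value inside the loop would instead generate a path of the corresponding stationary approximation at that rescaled time.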

Let (Ω , A , P ) denote a probability space, and let B , and C be subﬁelds of A . Deﬁne β ( B , C ) := E sup C ∈ C | P ( C ) − P ( C | B ) | and, for an array { Z t,T : 1 ≤ t ∈ Z } , let 29 ( k ) := sup T sup t ∈ Z β ( σ ( Z s,T , s ≤ t ) , σ ( Z s,T , t + k ≤ s )) , where σ ( Z ) is the σ − ﬁeld generated by Z. Recall that a process is called β -mixing or abso-lutely regular if β ( k ) → k → ∞ . Before proceeding with the asymptotic properties of ˆ f t ,T ( ω, τ , τ ), we are collecting heresome technical assumptions needed in the sequel.(K) The lag-window function K in (3 .

1) satisﬁes (cid:107) K (cid:107) ∞ ≤ K (0) = 1 and has sup-port [ − , R is d times continuously diﬀerentiable with d ≥ K has ‘characteristic exponent’ r > r is the largest integer such that C K ( r ) := lim u → (cid:0) − K ( u ) (cid:1) / | u | r exists, is ﬁnite and non-zero.(A1) The triangular array { X t,T } is β -mixing with mixing coeﬃcients β [ X ] ( k ) = O ( k − δ ) forsome δ >

1. The same holds for { X ϑt } .(A2) For any ε > ρ n ( ε ) := (cid:0) ε + n / (1+ δ ) ε n log( n ) (cid:1) / ∨ ( n − δ/ (1+ δ ) log( n )) , with δ as inAssumption (A1). Assume that ρ n ( T − / ) = o (( nB n ) − / ), T − / = o ( B / n /n / ) and B n n + n T B n + n / B n T = o (cid:16) B / n n / (cid:17) . (A3) (i) For some γ > T − γ − / γ = o ( B / n n − / ) , sup t sup x,y (cid:12)(cid:12)(cid:12) F t,t + k ; T ( x, y ) − F t ; T ( x ) F t + k ; T ( y ) (cid:12)(cid:12)(cid:12) = O ( | k | − γ ) . (ii) (cid:80) k ∈ Z | k | r sup u,η ,η | γ uk ( η , η ) | < ∞ , where the supremum is over a neighborhoodof ( ϑ, q ϑ ( τ ) , q ϑ ( τ )).(iii) For 2 ≤ p ≤

8, deﬁne κ p ( s , ..., s p − ) := sup T sup t ∈N t ,T sup x ,...,x p | cum( I { X t,T ≤ x } , I { X t + s ,T ≤ x } , ..., I { X t + sp − ,T ≤ x p } ) | κ ϑp ( s , ..., s p − ) := sup x ,...,x p | cum( I { X ϑ ≤ x } , I { X ϑs ≤ x } , ..., I { X ϑsp − ≤ x p } ) | ;30ssume moreover that the quantities κ p ( s , ..., s p − ) and κ ϑp ( s , ..., s p − ) are abso-lutely summable over s , ..., s p − ∈ Z .(A4) (i) The joint distribution functions F t ,...,t j ; T of ( X t ; T , ..., X t j ; T ) , j = 2 , ..., t , ..., t j , T and the arguments. The distribution func-tion F t ; T is twice continuously diﬀerentiable and its derivatives are bounded uni-formly in t and T .(ii) Let d ( r ) ω f u ( ω, x, y ) := π (cid:80) k ∈ Z | k | r γ uk ( x, y ) e − i kω , where r is taken from Assump-tion (K). The function ( u, x, y ) (cid:55)→ d ( r ) ω f u ( ω, x, y ) is continuous in a neighborhoodof ( u, x, y ) = ( ϑ, q ϑ ( τ ) , q ϑ ( τ )).(iii) The function u (cid:55)→ f u ( ω, G u ( q ϑ ( τ )) , G u ( q ϑ ( τ ))) is twice continuously diﬀerentiablein a neighborhood of u = ϑ .(iv) The functions G uk and G u in the deﬁnition of local strict stationarity are, forsome d ≥ , d times continuously diﬀerentiable with respect to u. The function G ϑ has a density, which is uniformly bounded away from zero on an open set thatcontains q ϑ ( τ ) and q ϑ ( τ ). Remark 5.1.

Assumptions (K) and (A2) place mild restrictions on the lag-window generator K and the bandwidth parameter, respectively. One can show that they are satisfied by the bandwidth parameters leading to optimal rates for the asymptotic mean squared error; see the discussion in Remark 5.3 for more details. Assumption (A3) is verified if $\delta$ in (A1) is large enough: in fact, it is sufficient to replace the $\beta$-mixing coefficients in (A1) by $\alpha$-mixing coefficients (see Lemma A.5.1 in the online appendix for additional details on bounding cumulants through $\alpha$-mixing coefficients). Assumption (A4) places conditions on the smoothness properties of the underlying processes which rule out processes with jump-like non-stationarity.

Remark 5.2. Observe that

$$f^{u}\bigl(\omega, G^{u}(q^{\vartheta}(\tau_1)), G^{u}(q^{\vartheta}(\tau_2))\bigr) = \frac{1}{2\pi} \sum_{k \in \mathbb{Z}} e^{-ik\omega} \Bigl( G^{u}_{k}\bigl(q^{\vartheta}(\tau_1), q^{\vartheta}(\tau_2)\bigr) - G^{u}\bigl(q^{\vartheta}(\tau_1)\bigr)\, G^{u}\bigl(q^{\vartheta}(\tau_2)\bigr) \Bigr).$$

This means that the differentiability of $u \mapsto f^{u}(\omega, G^{u}(q^{\vartheta}(\tau_1)), G^{u}(q^{\vartheta}(\tau_2)))$ depends on the local smoothness with respect to time of joint distributions.

Our main result states that, after proper centering and scaling, $\hat{f}_{t_0,T}(\omega, \tau_1, \tau_2)$ is asymptotically (complex) normal.

Theorem 5.1.

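Let Assumptions (K) and (A1)-(A4) hold. Then, for any sequence $\omega_n$ of Fourier frequencies such that $|\omega_n - \omega| = O(1/n)$ for some $\omega \in (0, \pi)$, and for $t_0 = \lfloor T\vartheta \rfloor$,

$$\sqrt{n/B_n} \begin{pmatrix} \Re \hat{f}_{t_0,T}(\omega_n, \tau_1, \tau_2) - \Re f^{\vartheta}(\omega, \tau_1, \tau_2) - \Re b(\omega, \tau_1, \tau_2) \\ \Im \hat{f}_{t_0,T}(\omega_n, \tau_1, \tau_2) - \Im f^{\vartheta}(\omega, \tau_1, \tau_2) - \Im b(\omega, \tau_1, \tau_2) \end{pmatrix} \xrightarrow{\;D\;} N\bigl(0, \Sigma(\omega, \tau_1, \tau_2)\bigr) \qquad (5.5)$$

where

$$\Sigma(\omega, \tau_1, \tau_2) := \pi\, f^{\vartheta}(\omega, \tau_1, \tau_1)\, f^{\vartheta}(\omega, \tau_2, \tau_2) \int K^2(u)\, du \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$

and

$$b(\omega, \tau_1, \tau_2) := -C_K(r)\, B_n^{-r}\, d^{(r)}_{\omega} f^{\vartheta}(\omega, \tau_1, \tau_2) + \frac{n^2}{2T^2}\, \frac{\partial^2}{\partial u^2} f^{u}\bigl(\omega, G^{u}(q^{\vartheta}(\tau_1)), G^{u}(q^{\vartheta}(\tau_2))\bigr) \Big|_{u = \vartheta} + o\bigl(B_n^{-r} + n^2/T^2\bigr).$$

Remark 5.3.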

Theorem 5.1 implies consistency of the estimator, which, however, holds under weaker assumptions. The same theorem also can be used to obtain the local window length n and the bandwidth parameter $B_n$ that minimize the asymptotic mean squared error of $\hat{f}_{t_0,T}(\omega_n, \tau_1, \tau_2)$. To illustrate the idea, assume that we want to optimize the asymptotic mean squared error (MSE) of $\Re \hat{f}_{t_0,T}(\omega_n, \tau_1, \tau_2)$. Considering r = d, let

$$\sigma^2 := \Sigma_{11}(\omega, \tau_1, \tau_2), \qquad b_u := \frac{1}{2}\, \frac{\partial^2}{\partial u^2} f^{u}\bigl(\omega, G^{u}(q^{\vartheta}(\tau_1)), G^{u}(q^{\vartheta}(\tau_2))\bigr) \Big|_{u = \vartheta} \qquad \text{and} \qquad b_\omega := -C_K(r)\, d^{(r)}_{\omega} f^{\vartheta}(\omega, \tau_1, \tau_2).$$

In this notation the asymptotic MSE of $\Re \hat{f}_{t_0,T}(\omega_n, \tau_1, \tau_2)$ is

$$\frac{B_n}{n}\, \sigma^2 + \Bigl( b_u \frac{n^2}{T^2} + b_\omega B_n^{-r} \Bigr)^2.$$

Assuming that $b_u \ne 0$ and $b_\omega \ne 0$, straightforward calculations entail that this MSE is minimized for $n \asymp T^{(4r+2)/(5r+2)}$ and $B_n \asymp T^{2/(5r+2)}$, with multiplicative constants that depend on $\sigma^2$, $b_u$, $b_\omega$, and r. As one would expect, larger values of r, corresponding to smoother local spectral densities (as functions of frequency), lead to more smoothing and faster convergence rates. For r = 2, the asymptotic MSE of $\hat{f}_{t_0,T}(\omega_n, \tau_1, \tau_2)$ turns out to be of the order $T^{-2/3}$. One can show that, if the constant $\delta$ in Assumption (A1) is large enough, the above choices of the bandwidth parameters satisfy condition (A2) if $r \ge 2$. The above formulas provide rough guidelines about the choice of smoothing parameters. However, the expression for the bias contains unknown parameters, such as derivatives of the local copula spectral density kernel, which are difficult to estimate in practice.
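The rates in this remark can be checked with a short balancing argument. The sketch below assumes that the asymptotic MSE has the usual variance-plus-squared-bias form, with variance of order $B_n/n$ and bias terms of order $n^2/T^2$ and $B_n^{-r}$; the exponents a and b are notation introduced only for this illustration.

```latex
\begin{align*}
&\text{Write } n = T^{a},\; B_n = T^{b}, \text{ and balance the variance against both squared bias terms:}\\
&\qquad \frac{B_n}{n} \asymp \frac{n^{4}}{T^{4}} \asymp B_n^{-2r}
  \;\Longrightarrow\; b - a \;=\; 4a - 4 \;=\; -2rb .\\
&\text{From } b - a = -2rb: \qquad a = (2r+1)\, b .\\
&\text{From } 4a - 4 = -2rb: \qquad 4(2r+1)\, b + 2rb = 4
  \;\Longrightarrow\; b = \frac{2}{5r+2}, \quad a = \frac{4r+2}{5r+2} .\\
&\text{Hence } \mathrm{MSE} \asymp B_n^{-2r} = T^{-4r/(5r+2)},
  \text{ which for } r = 2 \text{ gives } T^{-8/12} = T^{-2/3} .
\end{align*}
```

As r grows, the exponent $-4r/(5r+2)$ decreases toward $-4/5$, which is one way to see why smoother spectra allow faster convergence.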

In this paper, we have defined local copula spectra using a new notion of local strict stationarity; we have constructed a lag-window type estimator and proved its asymptotic normality. In a stationary context, it has been shown that copula-based spectra provide a description of serial dependence structures which is substantially more informative and flexible than classical covariance-based spectra. The benefits of this new spectral methodology are thus extended to slowly evolving dependence structures. Those benefits are highlighted in a simulation study and by analyzing two datasets, the daily log-returns of the classical S&P500 series and a meteorological series recorded in Hohenpeissenberg. That analysis indeed reveals a number of interesting features that cannot be detected by a more traditional $L^2$-based approach.

Several important questions are calling for further research, though. Our method requires the choice of a smoothing parameter, an issue which is common to all methods based on local stationarity concepts. It seems important to have some data-driven procedure providing general guidelines on what a 'good' choice of smoothing parameters is. An interesting approach to this problem has been suggested recently by Cranstoun et al. (2002), and an important direction for future research is the extension of this method to the present setting. It also is important to develop methods for uniformly (in frequency and local time) valid statistical inference on quantile spectra. This is challenging, and to the best of our knowledge such methodology for the simpler case of classical $L^2$ spectra has only recently been developed by Liu and Wu (2010) in the stationary case. Finally, it is natural to assume that the dependence structure of a time series contains both smooth changes and sudden jumps. In the present paper, we have dealt with smooth changes only, and an extension accommodating possible jumps would be most welcome. For example, in an $L^2$ setting, such smoothness assumptions could be avoided using the piecewise locally stationary concept of Zhou (2013) or by considering the evolutionary wavelet spectra as described in Van Bellegem and Von Sachs (2008). Extending the distributional approach described in this paper to piecewise locally stationary processes or wavelet-based spectra (even in a strictly stationary case) is an interesting and challenging direction for future research.

Acknowledgements. We gratefully acknowledge the many suggestions and constructive comments by two Referees, an Associate Editor and an Editor, which helped improve the original version of this paper.

References

Azrak, R. and Mélard, G. (2006). Asymptotic properties of quasi-maximum likelihood estimators for ARMA models with time-dependent coefficients. Statistical Inference for Stochastic Processes, 9:279–330.

Bougerol, P. and Picard, N. (1992). Stationarity of GARCH processes and some nonnegative time series. Journal of Econometrics, 52:115–127.

Briggs, W. L. and Henson, V. E. (1995). The DFT. An Owner's Manual for the Discrete Fourier Transform. SIAM, Society for Industrial & Applied Mathematics.

Brillinger, D. R. (1975). Time Series. Data Analysis and Theory. Holt, Rinehart and Winston.

Brockwell, P. and Davis, R. (1998). Time Series: Theory and Methods. Springer.

Campbell, S. D. and Diebold, F. X. (2011). Weather forecasting for weather derivatives. Journal of the American Statistical Association, 100:6–16.

Cranstoun, S. D., Ombao, H. C., Von Sachs, R., Guo, W., and Litt, B. (2002). Time-frequency spectral estimation of multichannel EEG using the Auto-SLEX method. IEEE Transactions on Biomedical Engineering, 49:988–996.

Dahlhaus, R. (1997). Fitting time series models to nonstationary processes. Annals of Statistics, 25:1–37.

Dahlhaus, R. (2012). Locally stationary processes. In Rao, T. S., Rao, S. S., and Rao, C., editors, Time Series Analysis: Methods and Applications, volume 30 of Handbook of Statistics, pages 351–413. Elsevier.

Dahlhaus, R. and Polonik, W. (2006). Nonparametric quasi-maximum likelihood estimation for Gaussian locally stationary processes. Annals of Statistics, 34:2790–2824.

Dahlhaus, R. and Subba Rao, S. (2006). Statistical inference for time-varying ARCH processes. Annals of Statistics, 34:1075–1114.

Davis, R. A., Lee, T., and Rodriguez-Yam, G. (2006). Structural break estimation for nonstationary time series models. Journal of the American Statistical Association, 101:223–239.

Davis, R. A. and Mikosch, T. (2009). The extremogram: A correlogram for extreme events. Bernoulli, 15:977–1009.

Davis, R. A., Mikosch, T., and Zhao, Y. (2013). Measures of serial extremal dependence and their estimation. Stochastic Processes and their Applications, 123:2575–2602.

Dette, H., Hallin, M., Kley, T., and Volgushev, S. (2015). Of copulas, quantiles, ranks and spectra: An $L_1$-approach to spectral analysis. Bernoulli, 21:781–831.

Fryzlewicz, P., Sapatinas, T., and Subba Rao, S. (2008). Normalised least-squares estimation in time-varying ARCH models. Annals of Statistics, 36:742–786.

Fryzlewicz, P. and Subba Rao, S. (2014). Multiple-change-point detection for auto-regressive conditional heteroscedastic processes. Journal of the Royal Statistical Society Ser. B, 76:903–924.

Giraitis, L., Kokoszka, P., and Leipus, R. (2000). Stationary ARCH models: Dependence structure and central limit theorem. Econometric Theory, 16:3–22.

Hagemann, A. (2013). Robust spectral analysis. Available at arXiv:1111.1965v2.

Hallin, M., Lefèvre, C., and Puri, M. L. (1988). On time-reversibility and the uniqueness of moving average representations for non-Gaussian stationary time series. Biometrika, 75:170–171.

Han, H., Linton, O., Oka, T., and Whang, Y.-J. (2014). The cross-quantilogram: Measuring quantile dependence and testing directional predictability between time series. Available at arXiv:1402.1937v2.

Harvey, A. C. (2010). Tracking a changing copula. Journal of Empirical Finance, 17:485–500.

Hong, Y. (1999). Hypothesis testing in time series via the empirical characteristic function: A generalized spectral density approach. Journal of the American Statistical Association, 94:1201–1220.

Hong, Y. (2000). Generalized spectral tests for serial dependence. Journal of the Royal Statistical Society Ser. B, 62:557–574.

Kahneman, D. and Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47:263–291.

Kley, T., Volgushev, S., Dette, H., and Hallin, M. (2016). Quantile spectral processes: Asymptotic analysis and inference. Bernoulli, 22:1770–1807.

Koenker, R. and Xiao, Z. (2006). Quantile autoregression. Journal of the American Statistical Association, 101:980–1006.

Kosorok, M. R. (2007). Introduction to Empirical Processes and Semiparametric Inference. Springer Science & Business Media.

Lee, J. and Subba Rao, S. (2012). The quantile spectral density and comparison based tests for nonlinear time series. Available at arXiv:1112.2759v2.

Li, T.-H. (2008). Laplace periodogram for time series analysis. Journal of the American Statistical Association, 103:757–768.

Li, T.-H. (2012). Quantile periodograms. Journal of the American Statistical Association, 107:765–776.

Li, T.-H. (2014). Quantile periodogram and time-dependent variance. Journal of Time Series Analysis, 35:322–340.

Linton, O. and Whang, Y.-J. (2007). The quantilogram: with an application to evaluating directional predictability. Journal of Econometrics, 141:250–282.

Liu, W. and Wu, W. B. (2010). Asymptotics of spectral density estimates. Econometric Theory, 26:1218–1245.

Martin, W. and Flandrin, P. (1985). Wigner-Ville spectral analysis of nonstationary processes. IEEE Transactions on Acoustics, Speech, and Signal Processing, 33:1461–1470.

Nason, G. (2013). A test for second-order stationarity and approximate confidence intervals for localized autocovariances for locally stationary time series. Journal of the Royal Statistical Society Ser. B, 75:879–904.

Nason, G. P., Von Sachs, R., and Kroisandt, G. (2000). Wavelet processes and adaptive estimation of the evolutionary wavelet spectrum. Journal of the Royal Statistical Society Ser. B, 62:271–292.

Parzen, E. (1957). On consistent estimates of the spectrum of a stationary time series. The Annals of Mathematical Statistics, 28:329–348.

Priestley, M. B. (1965). Evolutionary spectra and non-stationary processes. Journal of the Royal Statistical Society Ser. B, 27:204–237.

Priestley, M. B. (1981). Spectral Analysis and Time Series. Academic Press.

Rosenblatt, M. (1984). Asymptotic normality, strong mixing and spectral density estimates. Annals of Probability, 12:1167–1180.

Roueff, F. and Von Sachs, R. (2011). Locally stationary long memory estimation. Stochastic Processes and their Applications, 121:813–844.

Schuster, A. (1898). On the investigation of hidden periodicities with application to a supposed 26 day period of meteorological phenomena. Terrestrial Magnetism, 3:13–41.

Subba Rao, S. (2006). On some nonstationary, nonlinear random processes and their stationary approximations. Advances in Applied Probability, 38:1155–1172.

Subba Rao, T. (1970). The fitting of non-stationary time series models with time-dependent parameters. Journal of the Royal Statistical Society Ser. B, 32:312–322.

Van Bellegem, S. and Von Sachs, R. (2008). Locally adaptive estimation of evolutionary wavelet spectra. Annals of Statistics, 36:1879–1924.

Vogt, M. (2012). Nonparametric regression for locally stationary time series. Annals of Statistics, 40:2601–2633.

Yu, B. (1994). Rates of convergence for empirical processes of stationary mixing sequences. Annals of Probability, 22:94–116.

Zhou, Z. (2013). Heteroscedasticity and autocorrelation robust structural change detection. Journal of the American Statistical Association, 108:726–740.

Zhou, Z. and Wu, W. B. (2009a). Local linear quantile estimation for nonstationary time series. Annals of Statistics, 37:2696–2729.

Zhou, Z. and Wu, W. B. (2009b). Nonparametric inference of discretely sampled stable Lévy processes. Journal of Econometrics, 153:83–92.

Online Appendix

In this online appendix, we collect (Section A.1) some additional information on the spectral concept considered here, (Section A.2) some additional simulation results, (Section A.3) some further analysis of the S&P500 data, and (Sections A.4-A.6) the proofs of the main results, along with some technical details.

A.1 Connections with the Wigner-Ville spectra

A further theoretical justification for the time-varying copula spectral density kernels considered in this paper is their relation to the so-called Wigner-Ville spectrum. The Wigner-Ville spectrum (in its classical $L^2$ version) is based on the so-called Wigner distribution of a process of the form $\{X_{t,T}\}$ and has its origins in quantum mechanics. It was used later on in the signal processing community. Its properties for time-varying spectral analysis have been investigated in Martin and Flandrin (1985).

For the series of indicators we are dealing with here, the Wigner-Ville spectrum takes the form

$$W_{t_0,T}(\omega, \tau_1, \tau_2) := \sum_{s=-\infty}^{\infty} \mathrm{Cov}\Bigl( I\{X_{\lfloor t_0 + s/2 \rfloor, T} \le F^{-1}_{\lfloor t_0 + s/2 \rfloor; T}(\tau_1)\},\; I\{X_{\lfloor t_0 - s/2 \rfloor, T} \le F^{-1}_{\lfloor t_0 - s/2 \rfloor; T}(\tau_2)\} \Bigr) \frac{e^{-i\omega s}}{2\pi} \qquad (A.1)$$

(see Martin and Flandrin (1985)).

The following proposition establishes a strong relation between our time-varying copula spectral density kernels $f^{\vartheta}(\omega, \tau_1, \tau_2)$, as defined in (2.[…]) […] $W_{t_0,T}(\omega, \tau_1, \tau_2)$.

Proposition A.1.1. Let $\{X_{t,T}\}$ be locally strictly stationary, with approximating processes $\{X^{\vartheta}_t\}$, and assume that Assumption (A1) holds. If moreover the $\gamma^{\vartheta}_h(\tau_1, \tau_2)$'s are absolutely summable for any $\vartheta$ and $(\tau_1, \tau_2) \in (0, 1)^2$, then, for any fixed $\vartheta$ and $(\tau_1, \tau_2) \in (0, 1)^2$, along any sequence $t_0 = t_0(T)$ such that $t_0/T \to \vartheta$,

$$\sup_{\omega \in (-\pi, \pi]} \bigl| f^{\vartheta}(\omega, \tau_1, \tau_2) - W_{t_0,T}(\omega, \tau_1, \tau_2) \bigr| = o(1),$$

where $W_{t_0,T}$ denotes the indicator Wigner-Ville spectrum defined in (A.1).

Proof.

From the absolute summability of γ ϑh ( τ , τ ) , we obtain f ϑ ( ω, τ , τ ) = 12 π T / (cid:88) h = − T / γ ϑh ( τ , τ ) e − iωh + o (1) , while Assumption (A1) yields W t ,T ( ω, τ , τ )= 12 π T / (cid:88) h = − T / (cid:16) F (cid:98) t − h/ (cid:99) , (cid:98) t + h/ (cid:99) ; T ( F − (cid:98) t − h/ (cid:99) ; T ( τ ) , F − (cid:98) t + h/ (cid:99) ; T ( τ )) − τ τ (cid:17) e − iωh + o (1) . Writing the diﬀerence between the leading terms in f ϑ ( ω, τ , τ ) and W t ,T ( ω, τ , τ ) in termsof distribution functions yields12 π T / (cid:88) h = − T / | F (cid:98) t − h/ (cid:99) , (cid:98) t + h/ (cid:99) ; T ( F − (cid:98) t − h/ (cid:99) ; T ( τ ) , F − (cid:98) t + h/ (cid:99) ; T ( τ )) − G ϑh ( q ϑ ( τ ) , q ϑ ( τ )) |≤ π T / (cid:88) h = − T / Lg min (cid:12)(cid:12)(cid:12) hT + 1 T (cid:12)(cid:12)(cid:12) = o (1) , where the last inequality follows from Lemma A. . . The claim follows.For more information about the Wigner-Ville spectrum, its properties and applications,see Martin and Flandrin (1985).
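As a numerical illustration of the representation $f^{\vartheta}(\omega,\tau_1,\tau_2) = \frac{1}{2\pi}\sum_h \gamma^{\vartheta}_h(\tau_1,\tau_2)e^{-i\omega h}$ used in the proof above, the following Python sketch computes a lag-window estimate of a copula spectral density kernel from one stationary sample. It is only a sketch under stated assumptions and not the authors' implementation: the function names, the triangular lag window and the use of the empirical quantile in place of the true quantile are all choices made here for illustration.

```python
import numpy as np


def clipped(x, tau):
    """Centered indicator series I{X_t <= q(tau)} - tau (hypothetical helper).

    The empirical tau-quantile stands in for the true quantile (assumption).
    """
    q = np.quantile(x, tau)
    return (x <= q).astype(float) - tau


def lag_window_copula_spectrum(x, tau1, tau2, omegas, bandwidth):
    """Lag-window estimate of (1/2pi) * sum_k gamma_k(tau1, tau2) * exp(-i*omega*k).

    gamma_k pairs the tau1-indicator at time t with the tau2-indicator at time t+k.
    The triangular weights are an assumption, not the window used in the paper.
    """
    n = len(x)
    a = clipped(x, tau1)
    b = clipped(x, tau2)
    ks = np.arange(-bandwidth, bandwidth + 1)
    weights = 1.0 - np.abs(ks) / (bandwidth + 1.0)
    gammas = np.empty(len(ks))
    for i, k in enumerate(ks):
        if k >= 0:
            gammas[i] = np.sum(a[: n - k] * b[k:]) / n
        else:
            gammas[i] = np.sum(a[-k:] * b[: n + k]) / n
    phases = np.exp(-1j * np.outer(omegas, ks))
    return phases @ (weights * gammas) / (2.0 * np.pi)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample = rng.standard_normal(5000)
    grid = np.linspace(0.1, 3.0, 5)
    print(lag_window_copula_spectrum(sample, 0.1, 0.9, grid, bandwidth=10))
```

For an i.i.d. sequence only the lag-zero covariance is non-zero, so the kernel is flat in $\omega$ and equals $(\min(\tau_1,\tau_2)-\tau_1\tau_2)/(2\pi)$; this gives a simple sanity check for the sketch.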

A.2 Additional Simulations

A.2.1 Gaussian tvAR(2)

In Figure 12, we display, for a classical Gaussian time-varying AR(2) process, the same heatmaps as in Section 4.2; in particular, part (a) was obtained along the same lines as described there.

Figure 12: Heatmaps of the Gaussian time-varying AR(2) process described in Section A.2.1 and the corresponding estimators, for various window lengths. Panel (a): actual copula spectral densities (simulated); panel (b): the corresponding estimators.

The model equation, taken from Dahlhaus (2012), is
$$X_{t,T} = 1.8\cos\big(1.5 - \cos(2\pi t/T)\big)\, X_{t-1,T} - 0.81\, X_{t-2,T} + Z_t \tag{A.2}$$
with i.i.d. noise $Z_t \sim N(0,1)$; its strictly stationary approximation at time $t = \vartheta T$, $0 \le \vartheta \le 1$, is
$$X^{\vartheta}_t = 1.8\cos\big(1.5 - \cos(2\pi\vartheta)\big)\, X^{\vartheta}_{t-1} - 0.81\, X^{\vartheta}_{t-2} + \zeta_t \tag{A.3}$$
where $\zeta_t$ similarly is $N(0,1)$ white noise. This tvAR(2) process exhibits a time-varying periodicity which is clearly visible in the heat diagrams associated with the real parts of its time-varying copula-based spectral densities, displayed in the lower triangular part of Figure 12(b). The uniformly dark blue imaginary parts in the upper triangular part are a consequence of the fact that those imaginary parts actually are zero, since Gaussian processes are time-reversible [see Proposition 2.1 in Dette et al. (2015)]. As expected, no additional information can be gained from observing different quantiles (all heatmaps in the lower-triangular parts of (a) are the same), since the (bivariate) distributions of the process are Gaussian and the change over time only affects the correlation of these conditional distributions. Because of this, the (time-varying) bivariate distribution functions (and with them all quantiles) depend only on the (time-varying) correlations of the random variables, which are also fully captured by $L^2$ methods.

A.2.2 Gaussian tvARCH(1)

Figure 13 displays the same heatmaps for a time-varying ARCH(1) model of the form X t,T = (cid:113) / . t/T ) X t − ,T Z t with i.i.d. noise Z t ∼ N (0 ,

1) and its strictly stationary approximation at time t = ϑT ,0 ≤ ϑ ≤ X ϑt = (cid:113) / . ϑ ( X ϑt − ) ζ t where ζ t similarly is N (0 ,

1) white noise. In these stationary approximations, the influence of $X^{\vartheta}_{t-1}$ on the variance of $X^{\vartheta}_t$ gradually increases over time. This, quite understandably, is reflected in the diagrams associated with extreme quantiles, but is not visible in the "median ones".

Figure 13: Heatmaps of the Gaussian time-varying ARCH(1) process described in Section A.2.2 and the corresponding estimators, for various window lengths. Panel (a): actual copula spectral densities (simulated); panel (b): the corresponding estimators.

A.2.3 Influence of the series length

As suggested by a referee, we include heatmaps of estimators calculated from shorter time series. In our non-stationary setting, a smaller $T$ is essentially equivalent to a faster evolution of the features of the process under study. Figure 14 compares estimators for the Cauchy tvAR(2) and the tvQAR(1) processes (studied in Section 4.2) for series lengths $T = 2^{12} = 4096$ and $T = 2^{11} = 2048$ (one single realization), bandwidth $B_n = 10$ and window length $w = 512$. The results indicate that estimation rapidly deteriorates with decreasing $T$. While nonstationarity nevertheless remains quite significant in the Cauchy tvAR(2) case, the signal in the real parts for the slowly varying tvQAR(1) is barely visible; time-irreversibility, on the other hand, remains well detected.

Figure 14: Heatmaps of estimated spectra for the time-varying Cauchy AR(2) and QAR(1) processes, $T = 2^{12} = 4096$ and $T = 2^{11} = 2048$. Panels: (a) time-varying Cauchy AR(2) with $T = 4096$; (b) time-varying Cauchy AR(2) with $T = 2048$; (c) time-varying QAR(1) with $T = 4096$; (d) time-varying QAR(1) with $T = 2048$.

A.3 Time-varying variances and low frequency peaks in quantile spectra

As mentioned in Section 4 .

A.3.1 A tvARCH(0) model approach for S&P500 log-returns

A plot of the local variances of S&P500 returns (estimated in local windows) against time (Figure 16) suggests that those variances can be considered roughly stable over periods of at most 100 observations. This is quite small compared to the window length needed to estimate a quantile spectrum, a mismatch that could produce spurious deviations from white noise behavior in the heatmaps. To investigate whether such fast marginal changes can indeed produce the type of quantile spectral plots associated with the S&P500 data, we followed a heuristic approach inspired by Li (2014). More precisely, based on local windows of length 101 (details are provided in Section A.3.2), we first computed estimators $\hat\sigma^2_t$, $t = 1,\dots,T$, of the time-varying variances $\sigma^2_t$. With those estimated variances, we constructed an artificial series (of the tvARCH(0) form)
$$X_{t,T} = \hat\sigma_t Z_t, \tag{A.4}$$
where $\{Z_t\}_{t=1}^{T}$ denote i.i.d. draws with replacement from the S&P500 values $Y_t$ standardized by their estimated standard deviation, i.e. from $\{Y_t/\hat\sigma_t\}_{t=1}^{T}$, where $(Y_t)_{t=1}^{T}$ denote the observed S&P500 returns. To see how well the time-varying copula spectrum of the S&P500 returns can be matched by the spectrum of the process $X_{t,T}$, we simulated $J = 1000$ independent copies of $X_{t,T}$ and, for each realization, we computed local estimators of the quantile spectrum, $\hat f^{j}_{t_0,T}$, say, $j = 1,\dots,J$. Out of those $J = 1000$ realizations, we selected one that produces quantile spectra matching those of the S&P500 returns; see Section A.3.2 for details. For the sake of brevity, we restrict our comparison to the real parts for the quantile combinations $(\tau_1,\tau_2) \in \{(0.1,0.1),(0.9,0.9)\}$ and the imaginary parts corresponding to $(\tau_1,\tau_2) \in \{(0.1,0.9)\}$. The corresponding time-frequency plots are shown in Figure 15.

Figure 15: Time-frequency heatmaps for the S&P500 log-returns (left column) and one realization of a tvARCH(0) process (right column), where the parameter is estimated as the time-varying variance of the S&P500. First row: $\tau_1 = \tau_2 = 0.1$. Second row: $\tau_1 = \tau_2 = 0.9$. Third row: imaginary parts ($\tau_1 = 0.1$, $\tau_2 = 0.9$).

Comparing the first two rows of figures, corresponding to the quantile levels $(0.1,0.1)$ and $(0.9,0.9)$, shows that $X_{t,T}$ indeed produces some peaks at low frequencies. However, those peaks, in the (0 . , . . , . 9) S&P500 spectra (third row in Figure 15) shows significant deviations from white noise behavior, in particular during the periods 1965–1973 and 1995–2003. In contrast to that, the corresponding imaginary parts for the time-varying process $X_{t,T}$ are virtually indistinguishable from those of a white noise spectrum. This indicates that a tvARCH(0) model does not provide an adequate description of the joint distributions of high and low returns, and provides additional evidence that a tvARCH(0) model fails to capture important aspects of the S&P500 dynamics.

A.3.2 The "best tvARCH(0) fit" of the S&P500 data

Let us provide some details on the heuristic approach adopted in Section A.3.1. The time-varying variances $\sigma^2_t$ of the log-returns $Y_1,\dots,Y_T$ were estimated by
$$\hat\sigma^2_t = \frac{1}{2n+1}\sum_{s=1}^{T} \bar K\Big(\frac{s-t}{n}\Big)\Big(Y_t - \frac{1}{2n+1}\sum_{|l-t|\le n} Y_l\Big)^2$$
where $\bar K$ is the Parzen window multiplied with $4/3$ and $n = 50$ (accommodating fast changes that are still sufficiently smooth; see Figure 16). Note that, to compute $\hat\sigma^2_t$ for $t = 1,\dots,T$, we need 13092 observations of the S&P500 from 1962-01-02 to 2014-01-03. Those additional observations were omitted in the rest of the paper to keep all pictures consistent. For our investigation, we created $J = 1000$ artificial time series as defined in (A.4) and, for each of them, we calculated a collection of local lag-window estimators $\hat f^{j}_{t_0,T}(\omega,\tau_1,\tau_2)$, $j = 1,\dots,J$, using the same windows and bandwidths as for the log-returns of the S&P500. To select the "best match", we concentrate on the real parts of $\hat f^{j}_{t_0,T}(\omega,0.1,0.1)$ and $\hat f^{j}_{t_0,T}(\omega,0.9,0.9)$, and the imaginary part of $\hat f^{j}_{t_0,T}(\omega,0.9,0.1)$. Let $\hat f_{t_0,T}(\omega,\tau_1,\tau_2)$ stand for the local lag-window estimator of the S&P500 log-returns, and consider the $L^2$ distances
$$d_1(j) = \sum_{\omega\in\Omega}\sum_{t_0\in\mathcal{T}} \big[\Re \hat f^{j}_{t_0,T}(\omega,0.1,0.1) - \Re \hat f_{t_0,T}(\omega,0.1,0.1)\big]^2, \qquad j = 1,\dots,J.$$
Denote by $d_2(j)$ and $d_3(j)$ the same $L^2$ distances computed for the real and imaginary parts of $\hat f^{j}_{t_0,T}(\omega,0.9,0.9)$ and $\hat f^{j}_{t_0,T}(\omega,0.9,0.1)$, respectively. The "best match" was selected as the realization $j_{\min}$ minimizing the sum $d_1(j) + d_2(j) + d_3(j)$.

Figure 16: Estimated time-varying variance of the log-returns of the S&P500.
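The heuristic of Sections A.3.1 and A.3.2 (local variance estimation, resampling of standardized returns, and selection of a "best match" by a squared distance) can be sketched in Python as follows. This is an illustrative sketch rather than the procedure actually used for the figures: the function names are hypothetical, the kernel weights are normalised to sum to one, the squared term is read as the deviation of each observation in the window from the local mean, and the series is reflected at its ends instead of using additional observations. All of these are assumptions made here.

```python
import numpy as np


def parzen_window(u):
    """Parzen window on [-1, 1]; it integrates to 3/4, hence the 4/3 rescaling in the text."""
    u = np.abs(np.asarray(u, dtype=float))
    out = np.zeros_like(u)
    inner = u <= 0.5
    outer = (u > 0.5) & (u <= 1.0)
    out[inner] = 1.0 - 6.0 * u[inner] ** 2 + 6.0 * u[inner] ** 3
    out[outer] = 2.0 * (1.0 - u[outer]) ** 3
    return out


def local_variance(y, n=50):
    """Kernel-weighted variance in a window of 2n+1 points around each t."""
    y = np.asarray(y, dtype=float)
    weights = parzen_window(np.arange(-n, n + 1) / n)
    weights = weights / weights.sum()      # assumption: weights sum to one
    padded = np.pad(y, n, mode="reflect")  # assumption: reflect at the boundaries
    windows = np.lib.stride_tricks.sliding_window_view(padded, 2 * n + 1)
    local_mean = windows.mean(axis=1, keepdims=True)
    return (weights * (windows - local_mean) ** 2).sum(axis=1)


def tvarch0_surrogate(y, sigma2, rng):
    """One artificial series X_t = sigma_hat_t * Z_t as in (A.4)."""
    sigma = np.sqrt(sigma2)
    pool = np.asarray(y, dtype=float) / sigma            # standardized returns
    z = rng.choice(pool, size=len(pool), replace=True)   # i.i.d. draws with replacement
    return sigma * z


def best_match(target, candidates):
    """Index of the candidate with the smallest summed squared distance to target."""
    diffs = (np.asarray(candidates) - np.asarray(target)) ** 2
    distances = diffs.reshape(len(candidates), -1).sum(axis=1)
    return int(np.argmin(distances)), distances
```

In an actual comparison, `target` and `candidates` would hold the stacked real and imaginary parts of the local lag-window estimates entering $d_1(j)+d_2(j)+d_3(j)$; computing those estimates is outside the scope of this sketch.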

Figure 17: Log-returns of the S&P500 between 1963 and 2013 and the simulated tvARCH(0) process that was selected as the "best match".

A.3.3 Comparing the "global" tvARCH(0) and S&P500 spectra

If a tvARCH(0) approach to the study of the S&P500 data is to be adopted, one may argue that, in view of the invariance argument mentioned at the beginning of this section, the localized spectral analysis developed in this paper is not required, and that the stationary or "global" methods developed in Dette et al. (2015) are the appropriate ones. The right check for the adequacy of a tvARCH(0) model then should be based on a comparison between (estimated) stationary quantile spectra, which avoids the trouble caused by a possible mismatch between the pace of marginal changes and the chosen window length.

Accordingly, in this section, we provide a comparison of the "global" spectra, i.e. spectra computed from the complete dataset, without localization, as in Dette et al. (2015), of the S&P500 returns on the one hand, and of the process $X_{t,T}$ defined in equation (A.4) on the other hand. We simulated $J = 1000$ independent replications of the process $X_{t,T}$, and for each replication we computed the "global" lag-window estimator based on $N_{t_0,T} := \{1,\dots,T\}$, the bandwidth $B_n = 25$ and the same lag-window function as in the analysis of the S&P500 log returns. This yields a collection of estimators $(\hat f^{j})_{j=1,\dots,J}$. Next, for each frequency $\omega$ in { πj | j = 0 , . . . , } and each couple $\tau_1,\tau_2$ in $\{0.1, 0.5, 0.9\}$, we computed the 1% quantile $q^{\Re}_{\min}(\omega,\tau_1,\tau_2)$ of the $J$-tuple $(\Re\hat f^{j}(\omega,\tau_1,\tau_2))_{j=1,\dots,J}$ and the 99% quantile $q^{\Re}_{\max}(\omega,\tau_1,\tau_2)$ of the same $J$-tuple. The quantiles $q^{\Im}_{\min}(\omega,\tau_1,\tau_2)$ and $q^{\Im}_{\max}(\omega,\tau_1,\tau_2)$ were computed similarly.

Quantile spectra computed from the S&P500 dataset (in black) are depicted in Figure 18, together with the estimators $\hat f^{j}(\omega,\tau_1,\tau_2)$, $j = 1, \dots,$

10 (red lines) and, for each $(\omega,\tau_1,\tau_2)$, a gray area covering the interval $[q_{\min}, q_{\max}]$. As predicted by the analysis of Li (2014), the τ = τ = 0 . τ = τ = 0 . τ = τ = 0 . τ = τ = 0 . 9. Additionally, the estimators $\Re\hat f(\omega,\tau_1,\tau_2)$ for (τ , τ ) = (0 . , . . , . $\Im\hat f(\omega,\tau_1,\tau_2)$ for (τ , τ ) = (0 . , . , (0 . , . 9) and (0 . , . 9) lie well outside the gray "confidence" areas for a wide range of frequencies. This again indicates that the tvARCH(0) process does not provide an adequate description of the (global) dynamic features of the S&P500 returns.

Figure 18: A comparison of global lag-window estimations of the quantile spectra of the S&P500 (over the period 1962-2013; black lines) and those obtained from simulated tvARCH(0) processes (red lines, with grey-shaded pointwise confidence regions).

It is natural to wonder whether the mismatch between this simple time-varying variance model and the S&P500 data is due to a structural difference between the S&P500 dynamics back in the seventies and its more recent dynamics. To address this question, we replicated the procedure described above for the S&P500 returns in the time period 2000-2013. The corresponding results are displayed in Figure 19. While the estimators of the $(0.1,0.1)$ and $(0.9,0.9)$ spectra now (just barely) lie within the gray "confidence" areas, we still observe highly significant deviations between the imaginary part of the tvARCH(0) and S&P500 spectra for the quantile combination (τ , τ ) = (0 . , . τ , τ ) = (0 . , . cannot be explained by a model based solely on fast local changes in variance.

Figure 19: Same as Figure 18, with S&P500 observations restricted to the period 2000-2013.

Note that more sophisticated models such as time-varying ARCH and GARCH processes have been suggested to describe financial data; see Fryzlewicz and Subba Rao (2014) for a recent contribution. It would be very interesting to compare the quantile spectra of such time-varying stochastic volatility processes with those of the S&P500 returns. Preliminary comparisons indicate that the structure of the imaginary parts of the S&P500 time series cannot be explained by time-varying GARCH models.

A.4 Proofs for the main results

A.4.1 Proofs of the results in Section 5.1

We begin by some preliminary comments. Assume that X t,T and X ϑt are deﬁned on the sameprobability space. Expressing distributions function in terms of expectations and indicatorsleads to | F t ,t ; T ( x , x ) − G ϑt − t (( x , x )) | = | E ( (cid:89) ≤ j ≤ I { X tj,T ≤ x j } − (cid:89) ≤ j ≤ p I { ( X ϑtj ) ≤ x j } ) |≤ (cid:88) k =1 | E ( k − (cid:89) j =1 I { X tj,T ≤ x j } [ I { X tk,T ≤ x k } − I { ( X ϑtk ) ≤ x k } ] p (cid:89) j = k +1 I { ( X ϑtj ) ≤ x j } ) |≤ (cid:88) k =1 E ( | I { X tk,T ≤ x j } − I { ( X ϑtk ) ≤ x j } | ) . Therefore, in order to prove Lemma 5.1-5.3, it is suﬃcent to show thatsup x ∈ R E ( | I { X t,T ≤ x } − I { ( X ϑt ) ≤ x } | ) ≤ L (cid:0)(cid:12)(cid:12) tT − ϑ (cid:12)(cid:12) + 1 T (cid:1) . A.4.1.1 Proof of Lemma . M A ϑ ∈ (0 , (cid:80) ∞ j = −∞ | a ( ϑ, j ) | < ∞ , which by standard argu-ments implies strict stationarity of the process X ϑt (see for example Proposition 3 . . µ ( ϑ ) = 0. Inorder to establish distributional properties, we always can specialize the noise ζ t driving X ϑt –an arbitrary copy of the noise Z t driving X t,T —as being Z t itself. Denoting by A the σ − ﬁeldgenerated by { ξ s | s (cid:54) = 0 } , sup x ∈ R E ( | I { X t,T ≤ x } − I { ( X ϑt ) ≤ x } | ) 52 sup x ∈ R E (cid:2) E ( | I { X t,T ≤ x } − I { X ϑt ≤ x } | (cid:12)(cid:12) A ) (cid:3) ≤ sup x ∈ R E (cid:2) E ( | I { ξ t ≤ at,T (0) { x − (cid:80) j (cid:54) =0 a t,T ( j ) ξ t − j }} − I { ξ t ≤ a ( ϑ, { x − (cid:80) j (cid:54) =0 a ( ϑ,j ) ξ t − j }} | (cid:12)(cid:12) A ) (cid:3) = sup x ∈ R E [ | F ξ ( 1 a t,T (0) { x − (cid:88) j (cid:54) =0 a t,T ( j ) ξ t − j } ) − F ξ ( 1 a ( ϑ, { x − (cid:88) j (cid:54) =0 a ( ϑ, j ) ξ t − j } ) | ] ≤ E [ C | a t,T (0) − a ( ϑ, | + C || S t,T − S ϑt | ] , where S t,T := (cid:88) j (cid:54) =0 a t,T ( j ) ξ t − j , S ϑt := (cid:88) j (cid:54) =0 a ( ϑ, j ) ξ t − j , and the last inequality follows from the two dimensional mean-value theorem. 
To be moreprecise, (cid:12)(cid:12)(cid:12) F ξ (cid:18) x − vu (cid:19) − F ξ (cid:18) x − v (cid:48) u (cid:48) (cid:19) (cid:12)(cid:12)(cid:12) ≤ (cid:12)(cid:12)(cid:12) (cid:90) f ξ (cid:18) x − v t u t (cid:19) x − v t u t dt (cid:12)(cid:12)(cid:12) | u − u (cid:48) | + (cid:12)(cid:12)(cid:12) (cid:90) f ξ (cid:18) x − v t u t (cid:19) u t dt (cid:12)(cid:12)(cid:12) | v − v (cid:48) | , with u t = u + t ( u (cid:48) − u ) and v t = v + t ( v (cid:48) − v ) . From Assumption (MA2) the integrals arebounded by constants C and C which are independent of x. Straightforward calculations,under the assumptions made, lead to E [ | S t,T − S ϑt | ] = O( | t − ϑT − | + T − ) , which completes the proof. A.4.1.2 Proof of Lemma . ϑ ∈ (0 , E ( ∞ (cid:88) j =1 a j ( ϑ ) Z j ) < , (A.5)53hich is suﬃcient for the existence and uniqueness of a strictly stationary solution ( X ϑt ) with ﬁnite ﬁrst moment(see Giraitis et al. (2000) Theorem 2 . ARCH

1) yields (cid:80) j | a j ( ϑ ) | < ∞ , which implies that ( σ ϑt ) is locally strictly stationary (Proposition 3 . . σ ϑt = (cid:112) ( σ ϑt ) and set X ϑt = σ ϑt Z t . To prove localstrict stationarity, it suﬃces to bound sup x ∈ R E ( | I { X t,T ≤ x } − I { X ϑt ≤ x } | ) . Denoting by A t the σ − algebra generated by ( Z t , Z t − , . . . ) , observe that E (cid:0)(cid:12)(cid:12) I { X t,T ≤ x } − I { X ϑt ≤ x } (cid:12)(cid:12)(cid:1) = E (cid:0) E (cid:2)(cid:12)(cid:12) I { X t,T ≤ x } − I { X ϑt ≤ x } (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) A t − (cid:3)(cid:1) = E (cid:0) E (cid:2)(cid:12)(cid:12) I { Z t ≤ x/σ t,T } − I { Z t ≤ x/σ ϑt } (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) A t − (cid:3)(cid:1) ≤ E (cid:0)(cid:12)(cid:12) F ( x/σ t,T ) − F ( x/σ ϑ ) (cid:12)(cid:12)(cid:1) = E (cid:0)(cid:12)(cid:12) (cid:90) σ ϑt σ t,T xy − f ( xy − ) y − dy (cid:12)(cid:12)(cid:1) ≤ E (cid:0) C (cid:12)(cid:12) σ t,T − σ ϑt (cid:12)(cid:12)(cid:1) , where the last inequality follows from ( ARCH

2) and the fact that min( σ t,T , σ ϑt ) > ρ. Now,as Z t is independent of ( σ t,T , σ ϑt ) , we have E ( | σ t,T − ( σ ϑt ) | ) = E [ Z t ( | σ t,T − ( σ ϑt ) | )] = E ( | X t,T − ( X ϑt ) | ) ≤≤ C (cid:16)(cid:12)(cid:12) tT − ϑ (cid:12)(cid:12) + 1 T (cid:17) where the last inequality follows from Theorem 1 in Dahlhaus and Subba Rao (2006). Not-ing again that min( σ t,T , σ ϑt ) is bounded away from zero, we have, for some appropriateconstant C , E ( | σ t,T − σ ϑt | ) ≤ C E ( | σ t,T − ( σ ϑt ) | ) , which concludes the proof. A.4.1.3 Proof of Lemma . GARCH

1) implies that sup u ∈ [0 , E ( Z t ) (cid:104) (cid:80) pj =1 a j ( u ) + (cid:80) qi =1 b j ( u ) (cid:105) <

1, whichis suﬃcent for the existence of a strictly stationary solution X ϑt (see the Remark after Corol-lary 2.2 in Bougerol and Picard (1992)). Similar calculations as in the tvARCH case yield E ( | I { X t,T ≤ x } − I { X ϑt ≤ x } | ) ≤ E ( C | σ t,T − ( σ ϑt ) ), and a bound on E ( | σ t,T − ( σ ϑt ) | ) follows as inSection 5 . . (top of page 1168) of Subba Rao (2006).54 .4.2 Proof of Theorem 5.1 The proof proceeds in several steps, which we brieﬂy outline here; details are provided inSection A.6. First, we establish that the estimator ˆ q t ,T ( τ ) can be replaced with the truequantile levels τ , that is,ˆ f t ,T ( ω n , τ , τ ) = 12 π (cid:88) | k |≤ n − K n ( k ) e − iω n k × n (cid:88) t ∈ T ( k ) (cid:16) I { X t,T ≤ q ϑ ( τ ) } − τ (cid:17)(cid:16) I { X t + k,T ≤ q ϑ ( τ ) } − τ (cid:17) + o P (( B n /n ) / ) , (A.6)uniformly in ω n ∈ ˜ F n ( ε ) , where ˜ F n ( ε ) denotes the set of all Fourier frequencies in theinterval ( ε, π − ε ). Second, we prove that, uniformly again in ω n ∈ ˜ F n ( ε ),ˆ f t ,T ( ω n , τ , τ ) = ˜ f t ,T ( ω n , τ , τ ) + O P (cid:16) B n n + n T B n + n / B n T (cid:17) (A.7)where ˜ f t ,T ( ω n , τ , τ ) := 12 π n (cid:88) | k |≤ n − K n ( k ) e − iω n k (cid:88) | t − t |≤ m T − B n Y t,τ Y t + k,τ and Y t,τ := I { X t,T ≤ q ϑ ( τ ) } − F t ; T ( q ϑ ( τ )). The advantage of this representation lies in thefact that the random variables I { X t,T ≤ q ϑ ( τ ) } − F t ; T ( q ϑ ( τ )) are centered, which considerablysimpliﬁes some of the computations that follow. Next, observe that˜ f t ,T ( ω n , τ , τ ) = ˜ f t ,T ( ω, τ , τ ) + O P ( B n /n )since | ω n − ω | = O (1 /n ). 
Finally, we prove that (cid:112) B n /n  (cid:60) ˜ f t ,T ( ω, τ , τ ) − (cid:60) E ˜ f t ,T ( ω, τ , τ ) (cid:61) ˜ f t ,T ( ω, τ , τ ) − (cid:61) E ˜ f t ,T ( ω, τ , τ )  D −→ N (0 , Σ ( ω, τ , τ )) (A.8)55nd E ˜ f t ,T ( ω, τ , τ ) = f ϑ ( ω, τ , τ ) − C K ( r ) B − rn d ( r ) ω f ϑ ( ω, τ , τ )+ n T ∂ ∂u f u ( ω, G u ( q ϑ ( τ )) , G u ( q ϑ ( τ ))) (cid:12)(cid:12)(cid:12) u = ϑ (A.9)+ o ( B − rn + n /T ) + O (1 /n ) , which completes the proof of the theorem. A.5 Some probabilistic details

A.5.1 A Lemma on cumulants

Lemma A.5.1.

For an arbitrary stochastic process $(X_t)_{t\in\mathbb{Z}}$, let
$$\alpha(n) := \sup_{t\in\mathbb{Z}}\ \sup_{A\in\sigma(\dots,X_{t-1},X_t),\ B\in\sigma(X_{t+n},X_{t+n+1},\dots)} |P(A\cap B) - P(A)P(B)|.$$
Then, for any $t_1,\dots,t_p\in\mathbb{Z}$ and any $p$-tuple of Borel sets $A_1,\dots,A_p$, there exists a constant $K_p$ depending on $p$ only such that
$$\Big|\mathrm{cum}\big(I\{X_{t_1}\in A_1\},\dots,I\{X_{t_p}\in A_p\}\big)\Big| \le K_p\,\alpha\Big(\big\lfloor p^{-1}\max_{i,j}|t_i-t_j|\big\rfloor\Big).$$

Proof.

Recall that, by the definition of cumulants,
$$\big|\mathrm{cum}\big(I\{X_{t_1}\in A_1\},\dots,I\{X_{t_p}\in A_p\}\big)\big| = \Big|\sum_{\{\nu_1,\dots,\nu_R\}} (-1)^{R-1}(R-1)!\; P\Big(\bigcap_{i\in\nu_1}\{X_{t_i}\in A_i\}\Big)\cdots P\Big(\bigcap_{i\in\nu_R}\{X_{t_i}\in A_i\}\Big)\Big|, \tag{A.10}$$
where the summation runs over all partitions $\{\nu_1,\dots,\nu_R\}$ of the set $\{1,\dots,p\}$. In the case $t_1 = \dots = t_p$, the Lemma is obviously true. If at least two indices are distinct, order the indices so that $t_1\le\dots\le t_p$, choose $j$ with $\max_{i=1,\dots,p-1}(t_{i+1}-t_i) = t_{j+1}-t_j > 0$, and let $(Y_{t_{j+1}},\dots,Y_{t_p})$ be a random vector that is independent of $(X_{t_1},\dots,X_{t_j})$ and possesses the same joint distribution as $(X_{t_{j+1}},\dots,X_{t_p})$. By an elementary property of the cumulants (cf. Theorem 2.3.1 (iii) in Brillinger (1975)), we have
$$\mathrm{cum}\big(I\{X_{t_1}\in A_1\},\dots,I\{X_{t_j}\in A_j\},I\{Y_{t_{j+1}}\in A_{j+1}\},\dots,I\{Y_{t_p}\in A_p\}\big) = 0.$$
Therefore, we can write, for the cumulant of interest,
$$\Big|\mathrm{cum}\big(I\{X_{t_1}\in A_1\},\dots,I\{X_{t_p}\in A_p\}\big) - \mathrm{cum}\big(I\{X_{t_1}\in A_1\},\dots,I\{X_{t_j}\in A_j\},I\{Y_{t_{j+1}}\in A_{j+1}\},\dots,I\{Y_{t_p}\in A_p\}\big)\Big| = \Big|\sum_{\{\nu_1,\dots,\nu_R\}} (-1)^{R-1}(R-1)!\,\big[P_{\nu_1}\cdots P_{\nu_R} - Q_{\nu_1}\cdots Q_{\nu_R}\big]\Big|,$$
where the sum again runs over all partitions $\{\nu_1,\dots,\nu_R\}$ of $\{1,\dots,p\}$,
$$P_{\nu_r} := P\Big(\bigcap_{i\in\nu_r}\{X_{t_i}\in A_i\}\Big) \quad\text{and}\quad Q_{\nu_r} := P\Big(\bigcap_{i\in\nu_r,\ i\le j}\{X_{t_i}\in A_i\}\Big)\,P\Big(\bigcap_{i\in\nu_r,\ i>j}\{X_{t_i}\in A_i\}\Big),$$
$r = 1,\dots,R$, with $P(\bigcap_{i\in\emptyset}\{X_{t_i}\in A_i\}) := 1$ by convention. By the definition of $\alpha(n)$, it follows that $|P_{\nu_r} - Q_{\nu_r}| \le \alpha(t_{j+1}-t_j)$ for any partition $\nu_1,\dots,\nu_R$ and any $r = 1,\dots,R$. Thus, for every partition $\nu_1,\dots,\nu_R$,
$$|P_{\nu_1}\cdots P_{\nu_R} - Q_{\nu_1}\cdots Q_{\nu_R}| \le \sum_{r=1}^{R}|P_{\nu_r} - Q_{\nu_r}| \le R\,\alpha(t_{j+1}-t_j).$$
All together, this yields
$$\big|\mathrm{cum}\big(I\{X_{t_1}\in A_1\},\dots,I\{X_{t_p}\in A_p\}\big)\big| \le \alpha(t_{j+1}-t_j)\sum_{\{\nu_1,\dots,\nu_R\}} R!\,.$$
Noting that $p\,(t_{j+1}-t_j) \ge \max_{i_1,i_2}|t_{i_1}-t_{i_2}|$ and observing that $\alpha$ is a monotone function, we obtain
$$\big|\mathrm{cum}\big(I\{X_{t_1}\in A_1\},\dots,I\{X_{t_p}\in A_p\}\big)\big| \le K_p\,\alpha\Big(\big\lfloor p^{-1}\max_{i,j}|t_i-t_j|\big\rfloor\Big).$$

Lemma A.5.2.

Let $F$ and $G$ denote functions on the real line, with $|G(x)-G(y)| > c\,|x-y|$ for $x,y\in[a,b]$, where $c$ is some positive constant. For all $p,q\in(a,b)$ with $F(p) = G(q)$ and any $\epsilon > 0$, $\|F(\cdot)-G(\cdot)\|_\infty \le \epsilon$ implies $|p-q| \le \epsilon/c$.

Proof.

The claim follows from the fact that $c\,|p-q| < |G(p)-G(q)| = |G(p)-F(p)| \le \epsilon$.

A.5.2 A blocking technique for nonstationary β-mixing processes

In her paper, Yu (1994) constructed an independent block (IB) technique to transfer the classical tools from the i.i.d. case to the case of β-mixing stationary time series. We use the same technique here to derive results for sums of β-mixing locally stationary time series, which will be used on multiple occasions. For this purpose, let $X_{t,n}$ be a triangular array of β-mixing processes with mixing coefficient $\beta_n$. For each fixed $n$, we divide the process $X_{t,n}$ into $2\mu_n$ alternating blocks with lengths $p_n$ and $q_n$, respectively, and a remainder block of length $n-\mu_n(p_n+q_n)$. More precisely, we divide the index set into $(2\mu_n+1)$ parts
$$\Gamma_j = \{t : t_{\min}+B_n+(j-1)(p_n+q_n)+1 \le t \le t_{\min}+B_n+(j-1)(p_n+q_n)+p_n\},$$
$$\Delta_j = \{t : t_{\min}+B_n+(j-1)(p_n+q_n)+p_n+1 \le t \le t_{\min}+B_n+j(p_n+q_n)\},$$
$$R = \{t : t_{\min}+B_n+\mu_n(p_n+q_n)+1 \le t \le t_{\min}+n-B_n\},$$
and introduce the notation $X(\Gamma_j) = \{X_{i,n},\ i\in\Gamma_j\}$, $X(\Delta_j) = \{X_{i,n},\ i\in\Delta_j\}$ and $X(R) = \{X_{i,n},\ i\in R\}$, where the dependence on $n$ is omitted for the sake of brevity. We now have a sequence of alternating $X(\Gamma_j)$ and $X(\Delta_j)$ blocks, and a remainder block $X(R)$:
$$X = X(\Gamma_1), X(\Delta_1), X(\Gamma_2), \dots, X(\Gamma_{\mu_n}), X(\Delta_{\mu_n}), X(R).$$
To exploit the concept of coupling, we consider a one-dependent block sequence
$$Y = Y(\Gamma_1), Y(\Delta_1), Y(\Gamma_2), \dots, Y(\Gamma_{\mu_n}), Y(\Delta_{\mu_n}),$$
where $Y(\Gamma_j) = \{Y_i : i\in\Gamma_j\}$ and $Y(\Delta_j) = \{Y_i : i\in\Delta_j\}$, such that the sequence is independent of $X$ and each block of $Y$ has the same distribution as the corresponding block in $X$, that is, $Y(\Gamma_i) \overset{d}{=} X(\Gamma_i)$ and $Y(\Delta_i) \overset{d}{=} X(\Delta_i)$, where $\overset{d}{=}$ stands for equality in distribution.

The existence of such a sequence and the measurability issues that arise are addressed in Yu (1994).
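The index sets $\Gamma_j$, $\Delta_j$ and $R$ defined above can be generated mechanically. The short Python sketch below does so; the function name and its arguments are hypothetical, with `start` standing for $t_{\min}+B_n$ and `end` for $t_{\min}+n-B_n$, so that the blocks partition the integers `start + 1, ..., end`.

```python
def alternating_blocks(start, end, p, q, mu):
    """Split start+1..end into mu Gamma-blocks (length p), mu Delta-blocks (length q)
    and one remainder block, following the definitions of Gamma_j, Delta_j and R."""
    if mu * (p + q) > end - start:
        raise ValueError("mu * (p + q) exceeds the number of available indices")
    gammas, deltas = [], []
    for j in range(1, mu + 1):
        base = start + (j - 1) * (p + q)
        gammas.append(range(base + 1, base + p + 1))          # Gamma_j
        deltas.append(range(base + p + 1, base + p + q + 1))  # Delta_j
    remainder = range(start + mu * (p + q) + 1, end + 1)      # R
    return gammas, deltas, remainder


if __name__ == "__main__":
    gammas, deltas, remainder = alternating_blocks(start=0, end=25, p=3, q=2, mu=4)
    print([list(g) for g in gammas])
    print([list(d) for d in deltas])
    print(list(remainder))
```

Keeping the blocks as `range` objects avoids materialising the indices, which may be convenient when $n$ is large.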
The Γ- and ∆-block subsequences are denoted by X Γ , Y Γ , X ∆ and Y ∆ , respec-tively: for instance, X Γ := X (Γ ) , X (Γ ) , . . . , X (Γ µ n ) . We obtain X Γ by leaving out every other block in the original sequence, which is β -mixing,so that the dependence between the blocks in X Γ becomes weaker as the size p n of theΓ-blocks increases. The following lemma from Yu (1994) establishes an upper bound forthe diﬀerence between the Γ-block sequences from the original process and the independentblock sequence. Lemma A.5.3.

For any measurable function h on R µ n q n with (cid:107) h (cid:107) ∞ ≤ M, we have, for theblocking structure just described, (cid:12)(cid:12)(cid:12) E Q [ h ( X (∆))] − E ˜ Q [ h ( Y (∆))] (cid:12)(cid:12)(cid:12) ≤ M ( µ n − β p n and (cid:12)(cid:12) E Q [ h ( X (Γ))] − E ˜ Q [ h ( Y (Γ))] (cid:12)(cid:12)(cid:12) ≤ M ( µ n − β q n . Proof.

We only prove the ﬁrst claim, which follows as an application of Corollary 2.7 in Yu(1994) with Q being the probability distribution of the ∆ j block sequence. However, notethat the β -mixing rate of Q here is less than β p n , due to the alternating block length.Next, we consider a special case of the same blocking technique with a n := q n = p n , now applied to a sum of β -mixing random variables, namely (cid:80) nt =1 f ( X t,n ) , and link itsprobabilistic behavior to that of the sum of the independent blocks (cid:80) µ n j =1 (cid:80) i ∈ Γ j f ( Y i,n ) . Toavoid measurability issues the function f is assumed to belong to a permissable class F n of functions (for a deﬁnition see the appendix in Yu (1994)). Furthermore, for the sakeof simplicity, assume that E ( f ( X i,n )) = 0 for all f ∈ F n . The following Lemma is a slightadjustment of Lemma 4 . Lemma A.5.4.

Let F n be a sequence of permissible function classes, which are bounded bya constant M n . If a sequence ( r n ) n ∈ N is such that, for n large enough, r n µ n ≥ nM n , wehave P (cid:16) sup f ∈ F n (cid:12)(cid:12) n (cid:88) t =1 f ( X t,n ) (cid:12)(cid:12) > r n (cid:17) ≤ P (cid:16) sup f ∈ F n (cid:12)(cid:12) µ n (cid:88) j =1 (cid:88) i ∈ Γ j f ( Y i,n ) (cid:12)(cid:12) > r n (cid:17) (A.11)+ P (cid:16) sup f ∈ F n (cid:12)(cid:12) µ n (cid:88) j =1 (cid:88) i ∈ ∆ j f ( Y i,n ) (cid:12)(cid:12) > r n (cid:17) + 2 µ n β a n . Proof.

The probability in the left-hand side of ( A.

11) splits into three parts: namely, P (cid:16) sup f ∈ F n (cid:12)(cid:12) n (cid:88) t =1 f ( X t,n ) (cid:12)(cid:12) > r n (cid:17) ≤ P (cid:16) sup f ∈ F n (cid:12)(cid:12) µ n (cid:88) j =1 (cid:88) i ∈ Γ j f ( X i,n ) (cid:12)(cid:12) > r n (cid:17) + P (cid:16) sup f ∈ F n (cid:12)(cid:12) µ n (cid:88) j =1 (cid:88) i ∈ ∆ j f ( X i,n ) (cid:12)(cid:12) > r n (cid:17) + P (cid:16) sup f ∈ F n (cid:12)(cid:12) (cid:88) i ∈ R f ( X i,n ) (cid:12)(cid:12) > r n (cid:17) . M n (2 a n ) ≤ M n n/µ n . As 2 r n µ n ≥ nM n , that probability is zero.Turning to the ﬁrst part, Lemma A. . h the indicator function of the event (cid:110) sup f ∈ F n (cid:12)(cid:12) µ n (cid:88) j =1 (cid:88) i ∈ Γ j f ( X i,n ) (cid:12)(cid:12) > r n (cid:111) yields P (cid:16) sup f ∈ F n (cid:12)(cid:12) µ n (cid:88) j =1 (cid:88) i ∈ Γ j f ( X i,n ) (cid:12)(cid:12) > r n (cid:17) ≤ P (cid:16) sup f ∈ F n (cid:12)(cid:12) µ n (cid:88) j =1 (cid:88) i ∈ Γ j f ( Y i,n ) (cid:12)(cid:12) > r n (cid:17) + µ n β a n , the second term can be treated by the same arguments. The claim follows.The upper bound in Lemma A.5.4 is based on i.i.d. blocks, which allows us to use classicaltechniques. In particular, we will apply the Benett inequality to further bound the sum of β -mixing random variables. For this purpose assume that the number of functions m f ( n )contained in F n is ﬁnite, so that P (cid:16) sup f ∈ F n (cid:12)(cid:12) µ n (cid:88) j =1 (cid:88) i ∈ Γ j f ( Y i,n ) (cid:12)(cid:12)(cid:17) ≤ m f ( n ) sup f ∈ F n P (cid:16)(cid:12)(cid:12) µ n (cid:88) j =1 (cid:88) i ∈ Γ j f ( Y i,n ) (cid:12)(cid:12) > r n (cid:17) . 
Furthermore, let us assume that the variance Var( (cid:80) µ n j =1 (cid:80) i ∈ Γ j f ( Y i,n )) of the blocks is boundedby some ﬁnite V n , so that the Benett inequality yields P (cid:16)(cid:12)(cid:12) µ n (cid:88) j =1 (cid:88) i ∈ Γ j f ( Y i,n ) (cid:12)(cid:12) > r n (cid:17) ≤ exp (cid:16) − µ n V n a n M n h (cid:16) r n a n M n µ n V n (cid:17)(cid:17) , (A.12)where h ( x ) = (1 + x ) log (1 + x ) . Calculations similar to those in the proof of Lemma 6.7 inDette et al. (2015) we can bound the probability byexp (cid:16) − log 22 (cid:16) r n µ n V n ∧ r n a n M n (cid:17)(cid:17) .

We have just proven the following lemma.

Lemma A.5.5.

Let X t,n be a triangular array of β -mixing random variables and F n asequence of ﬁnite function classes with cardinality F n that fulﬁlls ( i ) F n ≤ m f ( n ) , ( ii ) sup f ∈ F n | f ( X t,n ) | ≤ M n and ( iii ) E ( f ( X )) = 0 Assume a blocking structure with block length a n := p n = q n which divides the index setinto µ n + 1 parts, where n/ − a n ≤ µ n a n ≤ n/ , a n → ∞ and µ n → ∞ , satisfying(a) µ n β a n n →∞ −−−→ , (b) r n µ n ≥ nM n and(c) Var( (cid:80) i ∈ Γ j f ( X i,n )) ∨ Var( (cid:80) i ∈ ∆ j f ( X i,n )) ≤ V n for all ≤ j ≤ µ n . If these conditions are met ,we obtain P (cid:16) sup f ∈ F n (cid:12)(cid:12) n (cid:88) t =1 f ( X t,n ) (cid:12)(cid:12) > r n (cid:17) ≤ m f ( n ) exp (cid:16) − log 22 (cid:16) r n µ n V n ∧ r n a n M n (cid:17)(cid:17) + o (1) . A.5.3 Auxiliary technical results

Lemma A.5.6.

Assume that M T → ∞ , T /M T → and t /T → ϑ ∈ (0 , . Under As-sumptions (A1)-(A4), for any bounded S ⊂ R , (cid:16)(cid:112) M T (cid:16) M T (cid:88) | t − t |≤ M T ( I { X t,T ≤ x } − F t,T ( x )) (cid:17)(cid:17) x ∈ R D −→ B in (cid:96) ∞ ( S ) where B denotes a centered Gaussian process with covariances E [ B ( s ) B ( t )] = (cid:88) k ∈ Z (cid:0) G ϑk ( x, y ) − G ϑ ( x ) G ϑ ( y ) (cid:1) . roof. In order to prove weak convergence, we need to establish asymptotic equicontinuityand convergence of ﬁnite-dimensional distributions (see Theorem 2.1 in Kosorok (2007)).Convergence of ﬁnite-dimensional distributions follows as an application of Lemma A.5.3the arguments are quite standard and omitted for the sake of brevity. To prove asymptoticequicontinuity, we apply Lemma 7.1 from Kley et al. (2016). More precisely, consider theprocess H n ( x ) := √ M T (cid:80) | t − t |≤ M T ( I { X t,T ≤ x } − F t,T ( x )) , where n denotes the cardinalityof the set { t ∈ { , . . . , T } : | t − t | ≤ M T } . Then, H n ( x ) − H n ( y ) = (cid:88) | t − t |≤ M T W t,T ( x, y )where W t,T ( x, y ) := 1 √ M T (cid:16) I { X t,T ≤ x } − F t,T ( x ) − ( I { X t,T ≤ y } − F t,T ( y ))) (cid:17) . Since E W t,T ( x, y ) = 0 for all x, y , by the deﬁnition of cumulants, we have E | H n ( x ) − H n ( y ) | = 3 (cid:16) cum (cid:16) (cid:88) | t − t |≤ M T W t,T ( x, y ) (cid:17)(cid:17) + cum (cid:16) (cid:88) | t − t |≤ M T W t,T ( x, y ) (cid:17) where cum k ( y ) := cum( y, ..., y ). Assumption (A3)(iii) implies thatcum (cid:16) (cid:88) | t − t |≤ M T W t,T ( x, y ) (cid:17) = O (1 /M T )while, under Assumption (A3)(i), there exist constants C and ˜ C such that (cid:12)(cid:12)(cid:12) cum (cid:16) (cid:88) | t − t |≤ M T W t,T ( x, y ) (cid:17)(cid:12)(cid:12)(cid:12) ≤ | x − y | + C (cid:88) s ≥ min( | x − y | , s | γ ) ≤ ˜ C | x − y | − γ − where the last equality follows by (A.22). 
Thus, there exists a constant $C > 0$ such that, for $|x - y| \ge 1/M_T^{1/2}$, we have
\[
E |H_n(x) - H_n(y)|^4 \le C |x - y|^{2 - 2\gamma^{-1}}.
\]
Now, fix $\delta > 0$ and, in the notation of Lemma 7.1 from Kley et al. (2016), let $\Psi(x) = x^4$, $d(x,y) := |x - y|^{(\gamma - 1)/(2\gamma)}$, $\bar{\eta} := (2/n)^{(\gamma - 1)/(2\gamma)}$, $G_x := H_n(x)$, and $T := S$. In particular, the packing number of the bounded set $S$ with respect to the metric $d$ can be bounded by $D(\varepsilon, d) \le C \varepsilon^{-2\gamma/(\gamma - 1)}$ for some constant $C$ independent of $\varepsilon$. This yields
\[
\sup_{x,y \in S,\, d(x,y) \le \delta} |H_n(x) - H_n(y)| \le S_1 + 2 \sup_{x,y \in S,\, d(x,y) \le \bar{\eta}} |H_n(x) - H_n(y)| \tag{A.13}
\]
where the quantity $S_1$ satisfies
\[
\big( E |S_1|^4 \big)^{1/4} \le K \Big[ \int_{\bar{\eta}/2}^{\eta} \varepsilon^{-\gamma/(2(\gamma - 1))} \, d\varepsilon + (\delta + 2\bar{\eta}) \, \eta^{-\gamma/(\gamma - 1)} \Big]. \tag{A.14}
\]
Note that $\gamma > 2$ implies $\gamma/(2(\gamma - 1)) < 1$, so that $\varepsilon^{-\gamma/(2(\gamma - 1))}$ is integrable on $[0,1]$. Choosing $\eta := \delta^{(\gamma - 1)/(2\gamma)}$ implies $\delta \eta^{-\gamma/(\gamma - 1)} = \delta^{1/2}$, hence
\[
\lim_{\delta \downarrow 0} \limsup_{n \to \infty} \big( E |S_1|^4 \big)^{1/4} = 0.
\]
Finally, note that similar arguments as in the proof of (A.18) entail
\[
\sup_{x,y \in S,\, d(x,y) \le \bar{\eta}} |H_n(x) - H_n(y)| = o_P(1). \tag{A.15}
\]
Jointly, (A.13)-(A.15) imply that, for any $\alpha > 0$,
\[
\lim_{\delta \downarrow 0} \limsup_{n \to \infty} P \Big( \sup_{x,y \in S,\, d(x,y) \le \delta} |H_n(x) - H_n(y)| \ge \alpha \Big) = 0.
\]
Since $d$ makes the index set $S$ totally bounded, condition (ii) in Theorem 2.1 in Kosorok (2007) follows. This, together with the weak convergence of the finite-dimensional distributions, completes the proof.

Lemma A.5.7.

Let $\ell_n \in \mathbb{Z}$ be a sequence such that $\omega_{\ell_n} := 2\pi \ell_n / n \to \omega \not\equiv 0 \bmod \pi$. Let $K$ be a function satisfying assumption (K) and define $K_n(k) := K(k/B_n)$, for $k \in \mathbb{Z}$, where $B_n = o(n)$. Denote by $\tilde{\mathcal{F}}_n(\varepsilon)$ the set of Fourier frequencies which are contained in $(\varepsilon, \pi - \varepsilon)$. Assume that condition (A4)(iv) holds. Then
\[
\sup_{\omega \in \tilde{\mathcal{F}}_n(\varepsilon)} \; \sup_{t \in \mathcal{N}_{t_0,T}} \; \sup_{\tau} \Big| \sum_{|k| \le B_n} K_n(k) e^{-i\omega k} \big( F_{t+k;T}(q^\vartheta(\tau)) - \tau \big) \Big| = O \Big( \frac{n}{T B_n^{d-1}} + B_n/T \Big)
\]
and
\[
\sup_{\omega \in \tilde{\mathcal{F}}_n(\varepsilon)} \Big| \sum_{|k| \le B_n} K_n(k) e^{-i\omega k} \Big| = O \Big( \frac{1}{B_n^{d-1}} \Big).
\]

Proof.

We only establish the first statement since the second one can be proved by similar arguments. Let
\[
h_{t,T}(u) := K \Big( u \frac{n}{B_n} \Big) \Big[ G^{t/T + u n/T}(q^\vartheta(\tau)) - \tau \Big], \qquad u \in [-1/2, 1/2],
\]
for $T$ large enough that $|un/B_n| \le 1$ implies $t/T + un/T \in [0,1]$. The function $h_{t,T}$ vanishes outside $[-B_n/n, B_n/n]$ and is $d$ times continuously differentiable as a function on $(-1/2, 1/2)$. Moreover,
\[
\sum_{|k| \le B_n} K \Big( \frac{k}{B_n} \Big) e^{-i\omega k} \big[ F_{t+k;T}(q^\vartheta(\tau)) - \tau \big]
= \sum_{k=-n/2}^{n/2} K \Big( \frac{k}{n} \frac{n}{B_n} \Big) \Big[ G^{t/T + \frac{k}{n} \frac{n}{T}}(q^\vartheta(\tau)) - \tau \Big] e^{-i\omega k} + O(B_n/T)
= \sum_{k=-n/2}^{n/2} h_{t,T}(k/n) e^{-i\omega k} + O(B_n/T).
\]
By Leibniz's rule, we have
\[
h^{(d)}_{t,T}(u) = \sum_{j=0}^{d-1} \binom{d}{j} \Big( \frac{n}{B_n} \Big)^{j} \Big( \frac{n}{T} \Big)^{d-j} K^{(j)} \Big( u \frac{n}{B_n} \Big) \frac{\partial^{d-j}}{\partial v^{d-j}} G^{t/T + v}(q^\vartheta(\tau)) \Big|_{v = un/T}
+ \Big( \frac{n}{B_n} \Big)^{d} K^{(d)} \Big( u \frac{n}{B_n} \Big) \Big( G^{t/T + un/T}(q^\vartheta(\tau)) - \tau \Big)
\]
so that, under the assumptions made, for some constant $C_d$ depending only on $K$, $d$, and the mapping $u \mapsto G^u(q_\tau)$,
\[
\sup_{t,T,u} |h^{(d)}_{t,T}(u)| \le C_d (n/B_n)^{d} \, \frac{n}{T}.
\]
Note that, under the assumptions of the lemma, the function $u \mapsto h_{t,T}(u)$ is twice continuously differentiable on $(-1/2, 1/2)$, so that
\[
h_{t,T}(u) = \sum_{j \in \mathbb{Z}} c_{j,t,T} \, e^{i 2\pi j u}, \qquad \text{where } c_{j,t,T} := \int_{-1/2}^{1/2} h_{t,T}(u) e^{-i 2\pi j u} \, du.
\]
Now consider a Fourier frequency $\omega_\ell = 2\pi\ell/n \in \tilde{\mathcal{F}}_n(\varepsilon)$. By the usual argument (see Briggs and Henson (1995), page 182), we have the discrete Poisson summation formula
\[
\sum_{k=-n/2}^{n/2} h_{t,T}(k/n) e^{-i\omega_\ell k} = \sum_{j \in \mathbb{Z}} c_{j,t,T} \sum_{k=-n/2}^{n/2} e^{i 2\pi k (j - \ell)/n} = n \Big( c_{\ell,t,T} + \sum_{k=1}^{\infty} ( c_{\ell + kn,t,T} + c_{\ell - kn,t,T} ) \Big).
\]
For the leading term, note that $h^{(r)}_{t,T}(u) = 0$ for $|u| > B_n/n$, so that integrating by parts yields
\[
c_{\ell,t,T} = \frac{(-1)^{d+1}}{(2\pi\ell)^{d}} \int_{-B_n/n}^{B_n/n} h^{(d)}_{t,T}(u) e^{-i 2\pi \ell u} \, du. \tag{A.16}
\]
It follows that $|c_{\ell,t,T}| \le C_d (2\pi\ell)^{-d} \frac{n}{T} (n/B_n)^{d-1} \lesssim \frac{1}{T B_n^{d-1}}$, as $\ell \asymp n$. Furthermore, by Assumption (A4)(iv) (recall that $d \ge 2$ and $\ell/n \to c \in (0,1) \bmod 1$),
\[
\Big| \sum_{k=1}^{\infty} ( c_{\ell + kn,t,T} + c_{\ell - kn,t,T} ) \Big| \lesssim \frac{n}{T} \Big( \frac{n}{B_n} \Big)^{d-1} \sum_{k=1}^{\infty} \Big( \frac{1}{(\ell + kn)^{d}} + \frac{1}{(\ell - kn)^{d}} \Big)
= \frac{1}{T B_n^{d-1}} \sum_{k=1}^{\infty} \Big( \frac{1}{(\ell/n + k)^{d}} + \frac{1}{(\ell/n - k)^{d}} \Big) \lesssim \frac{1}{T B_n^{d-1}}.
\]
Note that all the bounds above hold uniformly in $\ell \asymp n$. This completes the proof of the lemma.

A.6 Details for the proof of Theorem 5.1

A.6.1 Proof of (A.6)

Define
\[
\hat{F}_{t_0,t_0+k;T}(x,y) := \frac{1}{n} \sum_{t \in T(k)} I\{X_{t,T} \le x, X_{t+k,T} \le y\},
\]
and let
\[
r_{n,1}(k) := \hat{F}_{t_0,t_0+k;T}(\hat{q}_{t_0,T}(\tau_1), \hat{q}_{t_0,T}(\tau_2)) - \hat{F}_{t_0,t_0+k;T}(q^\vartheta(\tau_1), q^\vartheta(\tau_2)) - \frac{1}{n} \sum_{t \in T(k)} \Big( F_{t,t+k;T}(\hat{q}_{t_0,T}(\tau_1), \hat{q}_{t_0,T}(\tau_2)) - F_{t,t+k;T}(q^\vartheta(\tau_1), q^\vartheta(\tau_2)) \Big),
\]
\[
r_{n,2}(k) := \frac{1}{n} \sum_{t \in T(k)} \Big[ F_{t,t+k;T}(\hat{q}_{t_0,T}(\tau_1), \hat{q}_{t_0,T}(\tau_2)) - F_{t;T}(\hat{q}_{t_0,T}(\tau_1)) F_{t+k;T}(\hat{q}_{t_0,T}(\tau_2)) - \Big( F_{t,t+k;T}(q^\vartheta(\tau_1), q^\vartheta(\tau_2)) - F_{t;T}(q^\vartheta(\tau_1)) F_{t+k;T}(q^\vartheta(\tau_2)) \Big) \Big],
\]
\[
r_{n,3}(k) := \frac{1}{n} \sum_{t \in T(k)} \Big( F_{t;T}(\hat{q}_{t_0,T}(\tau_1)) F_{t+k;T}(\hat{q}_{t_0,T}(\tau_2)) - F_{t;T}(q^\vartheta(\tau_1)) F_{t+k;T}(q^\vartheta(\tau_2)) \Big),
\]
\[
r_{n,4}(k) := \frac{\tau_1}{n} \sum_{t \in T(k)} \Big( I\{X_{t+k,T} \le \hat{q}_{t_0,T}(\tau_2)\} - \tau_2 \Big) + \frac{\tau_2}{n} \sum_{t \in T(k)} \Big( I\{X_{t,T} \le \hat{q}_{t_0,T}(\tau_1)\} - \tau_1 \Big) - \frac{\tau_1}{n} \sum_{t \in T(k)} \Big( I\{X_{t+k,T} \le q^\vartheta(\tau_2)\} - \tau_2 \Big) - \frac{\tau_2}{n} \sum_{t \in T(k)} \Big( I\{X_{t,T} \le q^\vartheta(\tau_1)\} - \tau_1 \Big).
\]
By the definition of $K_n$,
\[
2\pi \hat{f}_{t_0,T}(\omega, \tau_1, \tau_2) - \sum_{|k| \le n-1} K_n(k) e^{-i\omega k} \frac{1}{n} \sum_{t \in T(k)} \Big( I\{X_{t,T} \le q^\vartheta(\tau_1)\} - \tau_1 \Big) \Big( I\{X_{t+k,T} \le q^\vartheta(\tau_2)\} - \tau_2 \Big)
= \sum_{|k| \le B_n} K_n(k) e^{-i\omega k} \Big( r_{n,1}(k) + r_{n,2}(k) + r_{n,3}(k) + r_{n,4}(k) \Big) =: R_{n,1} + R_{n,2} + R_{n,3} + R_{n,4}, \text{ say.}
\]
To prove (A.6) it is sufficient to establish the following statements:
\[
\max\big( |q^\vartheta(\tau_1) - \hat{q}_{t_0,T}(\tau_1)|, |q^\vartheta(\tau_2) - \hat{q}_{t_0,T}(\tau_2)| \big) = O_P(T^{-2/5}), \tag{A.17}
\]
\[
\sup_{k} \sup_{x \in X} \sup_{\|y\| \le \varepsilon_n} \Big| \hat{F}_{t_0,t_0+k;T}(x) - \hat{F}_{t_0,t_0+k;T}(x + y) - \frac{1}{n} \sum_{t \in T(k)} \big[ F_{t,t+k;T}(x) - F_{t,t+k;T}(x + y) \big] \Big| = O_P(\rho_n(\varepsilon_n)), \tag{A.18}
\]
for any $\varepsilon_n = o(1)$ and any bounded set $X \subset \mathbb{R}^2$, with $\|v\|$ denoting the maximum norm of the vector $v$, and
\[
\sup_{x \in Z} \sup_{|y| \le \varepsilon_n} \Big| \frac{1}{n} \sum_{|t - t_0| \le m_T} \Big( I\{X_{t,T} \le x\} - I\{X_{t,T} \le x + y\} - F_{t;T}(x) + F_{t;T}(x + y) \Big) \Big| = O_P(\rho_n(\varepsilon_n)) \tag{A.19}
\]
for any $\varepsilon_n = o(1)$ and any bounded set $Z \subset \mathbb{R}$, where $\rho_n$ is defined in Assumption (A2). We first analyze the asymptotic behavior of the four remainder terms $R_{n,1}, R_{n,2}, R_{n,3}, R_{n,4}$, then turn to the proofs for (A.17)-(A.19).

Discussion of remainder term $R_{n,1}$. From (A.17) and (A.18), we obtain $\sup_k |r_{n,1}(k)| = O_P(\rho_n(T^{-2/5}))$, hence under (A3)
\[
|R_{n,1}| = o_P(B_n^{1/2} n^{-1/2}). \tag{A.20}
\]

Discussion of remainder term $R_{n,2}$. Under (A3)(i) and (A4)(i), $|r_{n,2}(k)| = O(\min(|k|^{-\gamma}, T^{-2/5}))$, and thus
\[
|R_{n,2}| \le \sum_{|k| \le B_n} O(\min(|k|^{-\gamma}, T^{-2/5})) = O(T^{-(1 - \gamma^{-1}) 2/5}) = o(B_n^{1/2} n^{-1/2}). \tag{A.21}
\]
To see this, note that, for $\varepsilon \to 0$,
\[
\sum_{k \ge 1} \min(k^{-\gamma}, \varepsilon) \le \sum_{1 \le k \le \varepsilon^{-1/\gamma}} \varepsilon + \sum_{k \ge \varepsilon^{-1/\gamma}} k^{-\gamma} = O(\varepsilon^{1 - 1/\gamma}) + O(\varepsilon^{(-1/\gamma)(1 - \gamma)}) = O(\varepsilon^{1 - \gamma^{-1}}). \tag{A.22}
\]

Discussion of remainder term $R_{n,3}$.
Start by observing that
\[
\frac{1}{n} \sum_{t \in T(k)} F_{t;T}(x) F_{t+k;T}(y) = \frac{1}{n} \sum_{|t - t_0| \le m_T} F_{t;T}(x) F_{t+k;T}(y) + O(k/n)
= \frac{1}{n} \sum_{|t - t_0| \le m_T} G^{t/T}(x) G^{(t+k)/T}(y) + O(k/n) + O(1/T)
\]
\[
= \frac{1}{n} \sum_{|t - t_0| \le m_T} G^{t/T}(x) G^{t/T}(y) + O(k/T) + O(k/n) + O(1/T)
= \frac{T}{2 m_T} \int_{-m_T/T}^{m_T/T} G^{\vartheta + u}(x) G^{\vartheta + u}(y) \, du + O(k/n),
\]
where we have used a first-order Taylor expansion of the function $u \mapsto G^u(x)$. This yields
\[
r_{n,3}(k) = \frac{T}{2 m_T} \int_{-m_T/T}^{m_T/T} \Big[ G^{\vartheta + u}(\hat{q}_{t_0,T}(\tau_1)) G^{\vartheta + u}(\hat{q}_{t_0,T}(\tau_2)) - G^{\vartheta + u}(q^\vartheta(\tau_1)) G^{\vartheta + u}(q^\vartheta(\tau_2)) \Big] du + O(B_n/n),
\]
uniformly in $|k| \le B_n$. Observe that, by Lemma A.5.7 under condition (K), we have
\[
\sup_{\omega \in \tilde{\mathcal{F}}_n(\varepsilon)} \Big| \sum_{|k| \le n-1} K_n(k) e^{-ik\omega} \Big| = O(1).
\]
Hence,
\[
R_{n,3} = \Big( \sum_{|k| \le B_n} K_n(k) e^{-i\omega k} \Big) \frac{T}{2 m_T} \int_{-m_T/T}^{m_T/T} \Big[ G^{\vartheta + u}(\hat{q}_{t_0,T}(\tau_1)) G^{\vartheta + u}(\hat{q}_{t_0,T}(\tau_2)) - G^{\vartheta + u}(q^\vartheta(\tau_1)) G^{\vartheta + u}(q^\vartheta(\tau_2)) \Big] du + O(B_n/n)
\]
\[
= O\Big( \max\big( |q^\vartheta(\tau_1) - \hat{q}_{t_0,T}(\tau_1)|, |q^\vartheta(\tau_2) - \hat{q}_{t_0,T}(\tau_2)| \big) \Big) + O(B_n/n)
\]
uniformly in $\omega \in \tilde{\mathcal{F}}_n(\varepsilon)$, almost surely. Recalling (A.17) and Assumption (A2) we thus obtain
\[
|R_{n,3}| = o_P(B_n^{1/2} n^{-1/2}). \tag{A.23}
\]

Discussion of remainder term $R_{n,4}$. Observe that, uniformly in $|k| \le B_n$ and $y \in \mathbb{R}$, we have
\[
\frac{1}{n} \sum_{t \in T(k)} I\{X_{t,T} \le y\} = \frac{1}{n} \sum_{|t - t_0| \le m_T} I\{X_{t,T} \le y\} + O_P(B_n/n), \qquad
\frac{1}{n} \sum_{t \in T(k)} I\{X_{t+k,T} \le y\} = \frac{1}{n} \sum_{|t - t_0| \le m_T} I\{X_{t,T} \le y\} + O_P(B_n/n).
\]
Thus, uniformly in $|k| \le B_n$, $r_{n,4}(k) = D_n + O_P(B_n/n)$, where
\[
D_n := \sum_{|t - t_0| \le m_T} \Big[ \frac{\tau_1}{n} \Big( I\{X_{t,T} \le \hat{q}_{t_0,T}(\tau_2)\} - I\{X_{t,T} \le q^\vartheta(\tau_2)\} \Big) + \frac{\tau_2}{n} \Big( I\{X_{t,T} \le \hat{q}_{t_0,T}(\tau_1)\} - I\{X_{t,T} \le q^\vartheta(\tau_1)\} \Big) \Big]
\]
does not depend on $k$.
In particular, by Lemma A.5.7 this implies
\[
|R_{n,4}| \le O_P(B_n/n) + |D_n| \sup_{\omega \in \tilde{\mathcal{F}}_n(\varepsilon)} \Big| \sum_{|k| \le n-1} K_n(k) e^{-ik\omega} \Big| = O_P(B_n/n) + |D_n| \, O(B_n^{-1}).
\]
To conclude with $R_{n,4}$, note that combining (A.17), (A.19), and Assumption (A4)(i) we obtain $|D_n| = O_P(\rho_n(T^{-2/5})) + O_P(T^{-2/5})$. Together with (A2), this entails $R_{n,4} = o_P(B_n^{1/2}/n^{1/2})$ which, combined with (A.20)-(A.23), yields (A.6). It remains to establish (A.17)-(A.19).

Proof of (A.17). Letting $M_T = T^{4/5}$ in Lemma A.5.6, we obtain the weak convergence of $\sqrt{T^{4/5}} \, ( \tilde{F}_{t_0;T}(x) - \bar{F}(x) )$, where
\[
\bar{F}(x) := \frac{1}{2 T^{4/5}} \sum_{|t - t_0| \le T^{4/5}} F_{t;T}(x) = \frac{1}{2 T^{4/5}} \sum_{|t - t_0| \le T^{4/5}} G^{t/T}(x) + O(1/T),
\]
to a centered Gaussian process with almost surely continuous sample paths. Next, observe that, uniformly in $x$,
\[
\bar{F}(x) = \frac{T}{2 T^{4/5}} \int_{-T^{4/5}/T}^{T^{4/5}/T} G^{\vartheta + u}(x) \, du + O(1/T) = G^\vartheta(x) + O((T^{4/5}/T)^2) + O(1/T),
\]
where we have used a second-order Taylor expansion of the function $u \mapsto G^{\vartheta + u}(x)$. The claim (statement (A.17)) follows by compact differentiability of the quantile mapping, see Lemma 12.8 in Kosorok (2007).

Proof of (A.18) and (A.19)

Statement (A.19) can be established by similar arguments as (A.18), and its proof is omitted for the sake of brevity. Let $x = (x_1, x_2)$, $y = (y_1, y_2)$, and define
\[
n W_{t,k}(x,y) := I\{X_{t,T} \le x_1, X_{t+k,T} \le x_2\} - I\{X_{t,T} \le x_1 + y_1, X_{t+k,T} \le x_2 + y_2\} - P(X_{t,T} \le x_1, X_{t+k,T} \le x_2) + P(X_{t,T} \le x_1 + y_1, X_{t+k,T} \le x_2 + y_2).
\]
With this notation, we have
\[
\sum_{t \in T(k)} W_{t,k}(x,y) = \hat{F}_{t_0,t_0+k;T}(x_1, x_2) - \hat{F}_{t_0,t_0+k;T}(x_1 + y_1, x_2 + y_2) - \frac{1}{n} \sum_{t \in T(k)} \big[ F_{t,t+k;T}(x_1, x_2) - F_{t,t+k;T}(x_1 + y_1, x_2 + y_2) \big].
\]
Cover the bounded set $\{(x,y) : x \in X, \|y\| \le \varepsilon_n\}$ with $O(n^4)$ spheres of radius $1/n$ and centers $(v,w)$ such that $\|w\| \le \varepsilon_n$, and denote the set of resulting centers by $\mathcal{Z}$. Observe that there exists a constant $C$ independent of $k$ such that
\[
\sup_{\|(v,w) - (x,y)\| \le 1/n} |W_{t,k}(v,w) - W_{t,k}(x,y)| \le n^{-1} \Big( I\{|X_{t,T} - v_1| \le 1/n\} + I\{|X_{t+k,T} - v_2| \le 1/n\} + I\{|X_{t,T} - v_1 - w_1| \le 2/n\} + I\{|X_{t+k,T} - v_2 - w_2| \le 2/n\} + C \Big) =: V_{t,k}(v,w).
\]
Therefore,
\[
\sup_{x \in X} \sup_{\|y\| < \varepsilon_n} \Big| \sum_{t \in T(k)} W_{t,k}(x,y) \Big| \le \max_{(v,w) \in \mathcal{Z}} \Big| \sum_{t \in T(k)} W_{t,k}(v,w) \Big| + \max_{(v,w) \in \mathcal{Z}} \Big| \sum_{t \in T(k)} V_{t,k}(v,w) \Big|.
\]
We now use blocking to show that both terms on the right-hand side are of order $O_P(\rho_n(\varepsilon_n))$, uniformly in $k$. Since the random variables $X_{t,T}$ are $\beta$-mixing, so are the random variables $W_{t,k}$ and $V_{t,k}$, and the $\beta$-mixing coefficients $\beta^{[W]}_j$ of $W_{t,k}$ are bounded by $\beta^{[X]}_{0 \vee (j - B_n)}$. The same holds for the $\beta$-mixing coefficients of $V_{t,k}$.
Furthermore, with $\mathring{V}_{t,k} := V_{t,k} - E(V_{t,k})$, it follows that
\[
\big| \{ W_{t,k}(v,w) \mid (v,w) \in \mathcal{Z} \} \big| = \big| \{ \mathring{V}_{t,k}(v,w) \mid (v,w) \in \mathcal{Z} \} \big| = O(n^4), \qquad
\max_{(v,w) \in \mathcal{Z}} |W_{t,k}(v,w)| \le n^{-1}, \qquad
\max_{(v,w) \in \mathcal{Z}} |V_{t,k}(v,w)| = O(n^{-1}),
\]
and $E(W_{t,k}(v,w)) = E(\mathring{V}_{t,k}(v,w)) = 0$, so that the classes $\{ W_{t,k}(v,w) \mid (v,w) \in \mathcal{Z} \}$ and $\{ \mathring{V}_{t,k}(v,w) \mid (v,w) \in \mathcal{Z} \}$ satisfy conditions (i)-(iii) in Lemma A.5.5 with $m_f(n) = O(n^4)$ and $M_n = 1/n$. Set
\[
a_n = \big\lceil ( n^{1/(\delta+1)} \vee k_n ) \log(n) \big\rceil, \qquad \mu_n = \Big\lfloor \frac{n}{2 a_n} \Big\rfloor \qquad \text{and} \qquad r_n = \rho_n(\varepsilon_n),
\]
so that conditions (a) and (b) of that lemma are satisfied as well, for $n$ large enough, by the random variables $(W_{t,k})_{t \in T(k)}$ and $(V_{t,k})_{t \in T(k)}$, for any $k$. A Taylor expansion yields
\[
\sup_{t,k,(v,w) \in \mathcal{Z}} | E\, W_{t,k}(v,w) W_{s,k}(v,w) | = O(\varepsilon_n / n^2)
\]
for any $s, t$, and
\[
\sup_{t,k,(v,w) \in \mathcal{Z}} | E\, W_{t,k}(v,w) W_{s,k}(v,w) | = O(\varepsilon_n^2 / n^2)
\]
for any $s, t$ such that $t$, $t+k$, $s$ and $s+k$ are four distinct indices. Note that, for a given $k$, there exist $O(a_n)$ pairs $(s,t)$ with $t_1 \le s, t \le t_2$ such that at least two of the four indices $(t, t+k, s, s+k)$ coincide. Thus, for sufficiently large $n$ and all $t_2 - t_1 = a_n$,
\[
\sup_{k \le B_n} \sup_{(v,w) \in \mathcal{Z}} \mathrm{Var} \Big( \sum_{t = t_1}^{t_2} W_{t,k}(v,w) \Big) \le c \Big( \frac{a_n}{n^2} \big( \varepsilon_n + a_n \varepsilon_n^2 \big) \Big). \tag{A.24}
\]
Applying Lemma A.5.5 to the triangular array $\{ W_{t,k}(v,w) \}$ yields
\[
P \Big( \sup_{k \le B_n} \sup_{(v,w) \in \mathcal{Z}} \Big| \sum_{t \in T(k)} W_{t,k}(v,w) \Big| > D \rho_n(\varepsilon_n) \Big) \le O(n^4) \exp \Big( - \frac{\log(2)}{2} \Big( \frac{D^2 \rho_n^2(\varepsilon_n)}{\mu_n V_n} \wedge \frac{D \rho_n(\varepsilon_n)}{2 a_n n^{-1}} \Big) \Big),
\]
where $V_n := c \big( \frac{a_n}{n^2} ( \varepsilon_n + a_n \varepsilon_n^2 ) \big)$ and $D$ is an arbitrary constant.
Now, the definition of $\rho_n(\varepsilon_n)$ implies that $D$ can be chosen in such a way that the right-hand side of the above inequality tends to zero for $n \to \infty$, i.e., for $D$ sufficiently large,
\[
P \Big( \sup_{k \le B_n} \sup_{(v,w) \in \mathcal{Z}} \Big| \sum_{t \in T(k)} W_{t,k}(v,w) \Big| > D \rho_n(\varepsilon_n) \Big) = o(1).
\]
The same analysis as before yields
\[
\sup_{k \le B_n} \sup_{(v,w) \in \mathcal{Z}} \mathrm{Var} \Big( \sum_{t = t_1}^{t_2} \mathring{V}_{t,k}(v,w) \Big) = O \Big( \frac{a_n}{n^3} + \frac{a_n^2}{n^4} \Big) = O \Big( \frac{a_n}{n^3} \Big), \qquad t_2 - t_1 = a_n;
\]
yet another application of Lemma A.5.5 entails, for a suitable constant $D$,
\[
P \Big( \sup_{k \le B_n} \sup_{(v,w) \in \mathcal{Z}} \Big| \sum_{t \in T(k)} V_{t,k}(v,w) \Big| > D \rho_n(\varepsilon_n) \Big) = o(1).
\]
This completes the proof of (A.6).
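The elementary bound (A.22), $\sum_{k \ge 1} \min(k^{-\gamma}, \varepsilon) = O(\varepsilon^{1 - \gamma^{-1}})$, which controls $R_{n,2}$ above, can be checked numerically. The following is a minimal sketch and not part of the original argument: the function names `min_sum` and `ratio`, the truncation point `K`, and the chosen values of $\gamma$ and $\varepsilon$ are illustrative assumptions. Splitting the sum at $\varepsilon^{-1/\gamma}$ as in (A.22) suggests that the ratio of the sum to $\varepsilon^{1 - 1/\gamma}$ should stay close to $\gamma/(\gamma - 1)$ for small $\varepsilon$.

```python
import math


def min_sum(eps: float, gamma: float) -> float:
    """Approximate sum_{k>=1} min(k**-gamma, eps) for gamma > 1 and 0 < eps <= 1.

    Hypothetical helper: the head of the sum is exact, the far tail is
    replaced by an integral upper bound.
    """
    # min(k**-gamma, eps) equals eps as long as k <= eps**(-1/gamma).
    k0 = math.floor(eps ** (-1.0 / gamma))
    head = k0 * eps
    # Remaining terms are k**-gamma: sum them explicitly up to K ...
    K = max(10 * k0, 10_000)
    tail = sum(k ** -gamma for k in range(k0 + 1, K + 1))
    # ... and bound sum_{k>K} k**-gamma by the integral of x**-gamma over (K, inf).
    tail += K ** (1.0 - gamma) / (gamma - 1.0)
    return head + tail


def ratio(eps: float, gamma: float) -> float:
    """Ratio of the sum in (A.22) to the claimed rate eps**(1 - 1/gamma)."""
    return min_sum(eps, gamma) / eps ** (1.0 - 1.0 / gamma)


if __name__ == "__main__":
    for gamma in (1.5, 2.0, 3.0):
        for eps in (1e-2, 1e-3, 1e-4, 1e-5):
            print(f"gamma={gamma}, eps={eps:g}, ratio={ratio(eps, gamma):.4f}")
```

If the ratio remains bounded as $\varepsilon$ decreases, this is consistent with the rate in (A.22); in the proof the bound is applied with $\varepsilon = T^{-2/5}$.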

A.6.2 Proof of (A.7)

First, note that
\[
\sum_{|k| \le n-1} K_n(k) e^{-i\omega k} \frac{1}{n} \Big[ \sum_{t \in T(k)} \Big( I\{X_{t,T} \le q^\vartheta(\tau_1)\} - \tau_1 \Big) \Big( I\{X_{t+k,T} \le q^\vartheta(\tau_2)\} - \tau_2 \Big) - \sum_{|t - t_0| \le m_T - B_n} \Big( I\{X_{t,T} \le q^\vartheta(\tau_1)\} - \tau_1 \Big) \Big( I\{X_{t+k,T} \le q^\vartheta(\tau_2)\} - \tau_2 \Big) \Big] = O_P(B_n/n) = o_P(\sqrt{B_n/n}).
\]
By simple algebra, we obtain
\[
\frac{1}{n} \sum_{|t - t_0| \le m_T - B_n} \Big[ \Big( I\{X_{t,T} \le q^\vartheta(\tau_1)\} - F_{t;T}(q^\vartheta(\tau_1)) \Big) \Big( I\{X_{t+k,T} \le q^\vartheta(\tau_2)\} - F_{t+k;T}(q^\vartheta(\tau_2)) \Big) - \Big( I\{X_{t,T} \le q^\vartheta(\tau_1)\} - \tau_1 \Big) \Big( I\{X_{t+k,T} \le q^\vartheta(\tau_2)\} - \tau_2 \Big) \Big]
\]
\[
= \frac{1}{n} \sum_{|t - t_0| \le m_T - B_n} \Big[ \Big( I\{X_{t,T} \le q^\vartheta(\tau_1)\} - \tau_1 \Big) \Big( \tau_2 - F_{t+k;T}(q^\vartheta(\tau_2)) \Big) + \Big( I\{X_{t+k,T} \le q^\vartheta(\tau_2)\} - F_{t+k;T}(q^\vartheta(\tau_2)) \Big) \Big( \tau_1 - F_{t;T}(q^\vartheta(\tau_1)) \Big) \Big] =: a_{1,n} + a_{2,n}.
\]
Let $A_{i,n} := \sum_{|k| \le n-1} K_n(k) e^{-i\omega k} a_{i,n}$: the proof consists in showing that $E|A_{i,n}|^2 = o(B_n/n)$, $i = 1, 2$. We have
\[
E|A_{1,n}|^2 = E \Big| \sum_{|k| \le n-1} K_n(k) e^{-i\omega k} \frac{1}{n} \sum_{|t - t_0| \le m_T - B_n} \Big( I\{X_{t,T} \le q^\vartheta(\tau_1)\} - \tau_1 \Big) \Big( \tau_2 - F_{t+k;T}(q^\vartheta(\tau_2)) \Big) \Big|^2
\]
\[
= \frac{1}{n^2} \sum_{|t_1 - t_0| \le m_T - B_n} \; \sum_{|t_2 - t_0| \le m_T - B_n} E \Big[ \Big( I\{X_{t_1,T} \le q^\vartheta(\tau_1)\} - \tau_1 \Big) \Big( I\{X_{t_2,T} \le q^\vartheta(\tau_1)\} - \tau_1 \Big) \Big]
\times \sum_{|k| \le n-1} K_n(k) e^{-i\omega k} \Big( \tau_2 - F_{t_1+k;T}(q^\vartheta(\tau_2)) \Big) \tag{A.25}
\]
\[
\times \sum_{|k| \le n-1} K_n(k) e^{i\omega k} \Big( \tau_2 - F_{t_2+k;T}(q^\vartheta(\tau_2)) \Big);
\]
in view of Lemma A.5.7 and the fact that
\[
E \big[ ( I\{X_{t_1,T} \le q^\vartheta(\tau_1)\} - \tau_1 )( I\{X_{t_2,T} \le q^\vartheta(\tau_1)\} - \tau_1 ) \big]
= F_{t_1,t_2;T}(q^\vartheta(\tau_1), q^\vartheta(\tau_1)) - \tau_1 F_{t_1;T}(q^\vartheta(\tau_1)) - \tau_1 F_{t_2;T}(q^\vartheta(\tau_1)) + \tau_1^2
\]
\[
= \mathrm{cum}( I\{X_{t_1,T} \le q^\vartheta(\tau_1)\}, I\{X_{t_2,T} \le q^\vartheta(\tau_1)\} ) + F_{t_1;T}(q^\vartheta(\tau_1)) F_{t_2;T}(q^\vartheta(\tau_1)) - \tau_1 F_{t_1;T}(q^\vartheta(\tau_1)) - \tau_1 F_{t_2;T}(q^\vartheta(\tau_1)) + \tau_1^2
\]
\[
= \mathrm{cum}( I\{X_{t_1,T} \le q^\vartheta(\tau_1)\}, I\{X_{t_2,T} \le q^\vartheta(\tau_1)\} ) + O(n^2/T^2),
\]
the right-hand side of (A.25) is bounded by
\[
\frac{1}{n^2} \sum_{\substack{|t_1 - t_0| \le m_T - B_n \\ |t_2 - t_0| \le m_T - B_n}} \Big[ \mathrm{cum}( I\{X_{t_1,T} \le q^\vartheta(\tau_1)\}, I\{X_{t_2,T} \le q^\vartheta(\tau_1)\} ) + O \Big( \Big[ \frac{n}{T} \Big]^2 \Big) \Big] \, O \Big( \Big[ \frac{n}{T B_n^{d-1}} + B_n/T \Big]^2 \Big)
= O \big( n^{-1} + n^2/T^2 \big) \, O \Big( \Big[ \frac{n}{T B_n^{d-1}} + B_n/T \Big]^2 \Big).
\]
Turning to $A_{2,n}$, note that
\[
E|A_{2,n}|^2 = E \Big| \sum_{|k| \le n-1} K_n(k) e^{-i\omega k} \times \frac{1}{n} \sum_{|t - t_0| \le m_T - B_n} \Big( I\{X_{t+k,T} \le q^\vartheta(\tau_2)\} - F_{t+k;T}(q^\vartheta(\tau_2)) \Big) \Big( \tau_1 - F_{t;T}(q^\vartheta(\tau_1)) \Big) \Big|^2
\]
\[
= \frac{1}{n^2} \sum_{\substack{|k_1| \le n-1 \\ |k_2| \le n-1}} K_n(k_1) K_n(k_2) e^{-i\omega(k_1 - k_2)} \sum_{\substack{|t_1 - t_0| \le m_T - B_n \\ |t_2 - t_0| \le m_T - B_n}} \Big[ \Big( \tau_1 - F_{t_1;T}(q^\vartheta(\tau_1)) \Big) \Big( \tau_1 - F_{t_2;T}(q^\vartheta(\tau_1)) \Big) \, \mathrm{cum} \Big( I\{X_{t_1+k_1,T} \le q^\vartheta(\tau_2)\}, I\{X_{t_2+k_2,T} \le q^\vartheta(\tau_2)\} \Big) \Big]
\]
\[
\le \frac{1}{n^2} \sum_{\substack{|k_1| \le n-1 \\ |k_2| \le n-1}} \; \sum_{\substack{|t_1 - t_0| \le m_T - B_n \\ |t_2 - t_0| \le m_T - B_n}} O \Big( \frac{n^2}{T^2} \Big) \Big| \mathrm{cum} \Big( I\{X_{t_1+k_1,T} \le q^\vartheta(\tau_2)\}, I\{X_{t_2+k_2,T} \le q^\vartheta(\tau_2)\} \Big) \Big|
\]
\[
\le O(1/T^2) \sum_{|t_1 - t_0| \le m_T - B_n} \; \sum_{|k_1| \le B_n} O(B_n) \sum_{m \in \mathbb{Z}} \kappa(m) = O(n B_n^2 / T^2) = o(B_n/n),
\]
where the second inequality follows from the fact that, for each fixed value of $t_1$, $k_1$ and each $m \in \mathbb{Z}$, there are at most $O(B_n)$ values of $k_2$, $t_2$ such that $t_1 + k_1 - t_2 - k_2 = m$, and from Assumption (A3)(iii), which implies that the sum over $m$ is finite.

A.6.3 Proof of (A.8)

To start with, let us state the following lemma.

Lemma A.6.1.

For any $a_n \to \infty$ such that $a_n/n = o(1)$, $B_n/a_n = o(1)$ we have, for all $\omega_1, \omega_2$ in $\{\omega, -\omega\}$,
\[
\sup_{|s - t_0| \le m_T} \Big| E \Big[ \sum_{t_1 = s}^{s + a_n} W_{t_1,T}(\omega_1) \sum_{t_2 = s}^{s + a_n} W_{t_2,T}(\omega_2) \Big] - 2\pi \frac{a_n B_n}{n^2} \int K^2(u) \, du \, \Big( I\{\omega_1 = \omega_2\} f^\vartheta(\omega_1, \tau_1, \tau_2) f^\vartheta(-\omega_1, \tau_2, \tau_1) + I\{\omega_1 = -\omega_2\} f^\vartheta(\omega_1, \tau_1, \tau_1) f^\vartheta(\omega_2, \tau_2, \tau_2) \Big) \Big| = o(B_n a_n / n^2).
\]

Proof.

Observe that
\[
E \Big[ \sum_{t_1 = s}^{s + a_n} W_{t_1,T}(\omega_1) \sum_{t_2 = s}^{s + a_n} W_{t_2,T}(\omega_2) \Big]
= \frac{1}{4\pi^2} \sum_{|k_1| \le B_n} \sum_{|k_2| \le B_n} K_n(k_1) K_n(k_2) e^{-i(k_1 \omega_1 + k_2 \omega_2)} \frac{1}{n^2} \sum_{t_1 = s}^{s + a_n} \sum_{t_2 = s}^{s + a_n} \mathrm{Cov} \big( Y_{t_1,\tau_1} Y_{t_1 + k_1,\tau_2}, \, Y_{t_2,\tau_1} Y_{t_2 + k_2,\tau_2} \big)
\]
\[
= \frac{1}{4\pi^2} \sum_{|k_1| \le B_n} \sum_{|k_2| \le B_n} K_n(k_1) K_n(k_2) e^{-i(k_1 \omega_1 + k_2 \omega_2)} \times \frac{1}{n^2} \sum_{t_1 = s}^{s + a_n} \sum_{t_2 = s}^{s + a_n} \Big[ \mathrm{cum} \big( Y_{t_1,\tau_1}, Y_{t_1 + k_1,\tau_2}, Y_{t_2,\tau_1}, Y_{t_2 + k_2,\tau_2} \big)
+ \mathrm{cum} \big( Y_{t_1,\tau_1}, Y_{t_2,\tau_1} \big) \mathrm{cum} \big( Y_{t_1 + k_1,\tau_2}, Y_{t_2 + k_2,\tau_2} \big)
+ \mathrm{cum} \big( Y_{t_1,\tau_1}, Y_{t_2 + k_2,\tau_2} \big) \mathrm{cum} \big( Y_{t_1 + k_1,\tau_2}, Y_{t_2,\tau_1} \big) \Big]
=: C_{1,n} + D_{1,n} + D_{2,n}.
\]
For $C_{1,n}$, note that
\[
|C_{1,n}| \le \frac{1}{4\pi^2} \frac{\|K_n\|_\infty^2}{n^2} \sum_{|k_1| \le B_n} \sum_{|k_2| \le B_n} \sum_{t_1 = s}^{s + a_n} \sum_{t_2 = s}^{s + a_n} \big| \mathrm{cum} \big( Y_{t_1,\tau_1}, Y_{t_1 + k_1,\tau_2}, Y_{t_2,\tau_1}, Y_{t_2 + k_2,\tau_2} \big) \big|
\le \frac{1}{4\pi^2} \frac{\|K_n\|_\infty^2}{n^2} \sum_{t_1 = s}^{s + a_n} \sum_{t_2, t_3, t_4 = 1}^{T} \big| \mathrm{cum} \big( Y_{t_1,\tau_1}, Y_{t_2,\tau_2}, Y_{t_3,\tau_1}, Y_{t_4,\tau_2} \big) \big| = O(a_n/n^2)
\]
since the inner sum is bounded by Assumption (A3)(iii) with $p = 4$, uniformly in $t_1$. For the second inequality, note that $(t_1, k_1, t_2, k_2) \mapsto (t_1, t_1 + k_1, t_2, t_2 + k_2)$ is injective, as it has $(s_1, s_2, s_3, s_4) \mapsto (s_1, s_2 - s_1, s_3, s_4 - s_3)$ as an inverse.

Next, define $Y^\vartheta_{t,\tau} := I\{X^\vartheta_t \le q^\vartheta(\tau)\} - \tau$ and
\[
D^\vartheta_{1,n} := \frac{1}{4\pi^2} \sum_{|k_1| \le B_n} \sum_{|k_2| \le B_n} K_n(k_1) K_n(k_2) e^{-i(k_1 \omega_1 + k_2 \omega_2)} \times \frac{1}{n^2} \sum_{t_1 = s}^{s + a_n} \sum_{t_2 = s}^{s + a_n} \mathrm{cum} \big( Y^\vartheta_{t_1,\tau_1}, Y^\vartheta_{t_2,\tau_1} \big) \mathrm{cum} \big( Y^\vartheta_{t_1 + k_1,\tau_2}, Y^\vartheta_{t_2 + k_2,\tau_2} \big),
\]
\[
D^\vartheta_{2,n} := \frac{1}{4\pi^2} \sum_{|k_1| \le B_n} \sum_{|k_2| \le B_n} K_n(k_1) K_n(k_2) e^{-i(k_1 \omega_1 + k_2 \omega_2)} \times \frac{1}{n^2} \sum_{t_1 = s}^{s + a_n} \sum_{t_2 = s}^{s + a_n} \mathrm{cum} \big( Y^\vartheta_{t_1,\tau_1}, Y^\vartheta_{t_2 + k_2,\tau_2} \big) \mathrm{cum} \big( Y^\vartheta_{t_1 + k_1,\tau_2}, Y^\vartheta_{t_2,\tau_1} \big).
\]
After some computation, in view of local stationarity, there exists a constant $C$ such that, uniformly in $\tau_1, \tau_2, t_1, t_2, k_1, k_2$,
\[
\Big| \mathrm{cum} \big( Y_{t_1,\tau_1}, Y_{t_2,\tau_1} \big) \mathrm{cum} \big( Y_{t_1 + k_1,\tau_2}, Y_{t_2 + k_2,\tau_2} \big) - \mathrm{cum} \big( Y^\vartheta_{t_1,\tau_1}, Y^\vartheta_{t_2,\tau_1} \big) \mathrm{cum} \big( Y^\vartheta_{t_1 + k_1,\tau_2}, Y^\vartheta_{t_2 + k_2,\tau_2} \big) \Big| \le C n / T.
\]
Note that
\[
\sup_{t_1, k_1} \sum_{t_2 = s}^{s + a_n} \kappa(t_1 - t_2) \sum_{|k_2| \le B_n} \kappa(t_1 + k_1 - (t_2 + k_2)) < \infty,
\]
which implies that
\[
\sup_{t_1, k_1} \sum_{t_2 = s}^{s + a_n} \sum_{|k_2| \le B_n} \min \Big( \frac{nC}{T}, \, \kappa(t_1 - t_2) \kappa(t_1 + k_1 - (t_2 + k_2)) \Big) = o(1).
\]
This, along with assumption (A3)(iii), yields
\[
|D_{1,n} - D^\vartheta_{1,n}| \le \frac{1}{4\pi^2 n^2} \sum_{t_1 = s}^{s + a_n} \sum_{|k_1| \le B_n} \sum_{t_2 = s}^{s + a_n} \sum_{|k_2| \le B_n} \min \Big( \frac{nC}{T}, \, \kappa(t_1 - t_2) \kappa(t_1 + k_1 - (t_2 + k_2)) \Big) = o(a_n B_n / n^2).
\]
A similar argument shows that $|D_{2,n} - D^\vartheta_{2,n}| = o(a_n B_n / n^2)$. Summarizing, we have shown that
\[
E \Big[ \sum_{t_1 = s}^{s + a_n} W_{t_1,T}(\omega_1) \sum_{t_2 = s}^{s + a_n} W_{t_2,T}(\omega_2) \Big] = D^\vartheta_{1,n} + D^\vartheta_{2,n} + o(a_n B_n / n^2).
\]
Now, arguments similar to the ones used to show that $C_{1,n} = O(a_n/n^2)$ yield
\[
D^\vartheta_{1,n} + D^\vartheta_{2,n} = E \Big[ \sum_{t_1 = s}^{s + a_n} W^\vartheta_{t_1}(\omega_1) \sum_{t_2 = s}^{s + a_n} W^\vartheta_{t_2}(\omega_2) \Big] + o(a_n B_n / n^2),
\]
where $W^\vartheta_t(\omega) := n^{-1} \sum_{|k| \le n-1} K_n(k) e^{-i\omega k} ( Y^\vartheta_{t,\tau_1} Y^\vartheta_{t+k,\tau_2} - E[ Y^\vartheta_{t,\tau_1} Y^\vartheta_{t+k,\tau_2} ] )$. Let
\[
h^\vartheta(\omega, \tau_1, \tau_2) := \frac{1}{2\pi a_n} \sum_{|k| \le n-1} K_n(k) e^{-i\omega k} \sum_{t \in S_k(s, a_n)} \big( Y^\vartheta_{t,\tau_1} Y^\vartheta_{t+k,\tau_2} - E[ Y^\vartheta_{t,\tau_1} Y^\vartheta_{t+k,\tau_2} ] \big).
\]
Proceeding as in Rosenblatt (1984), pp. 1173-1174, we have that
\[
\mathrm{Var} \Big( \sum_{t = s}^{s + a_n} W^\vartheta_t(\omega) - \frac{a_n}{n} h^\vartheta(\omega, \tau_1, \tau_2) \Big) = O \Big( \frac{B_n^2}{n^2} \Big) \tag{A.26}
\]
uniformly in $|s - t_0| \le m_T$, where $S_k(s, a_n) := \{ t : s \le t \le s + a_n, \; s \le t + k \le s + a_n \}$. Now, $h^\vartheta(\omega, \tau_1, \tau_2)$ is the usual lag-window estimator (centered by its expectation) of the cross-spectrum between $(Y_{t,\tau_1})_{s \le t \le s + a_n}$ and $(Y_{t,\tau_2})_{s \le t \le s + a_n}$. Thus, classical results from spectral density estimation yield
\[
E[ h^\vartheta(\omega_1, \tau_1, \tau_2) h^\vartheta(\omega_2, \tau_1, \tau_2) ] = 2\pi \frac{B_n}{a_n} \int K^2(u) \, du \, \Big( I\{\omega_1 = \omega_2\} f^\vartheta(\omega_1, \tau_1, \tau_2) f^\vartheta(\omega_1, \tau_1, \tau_2) + I\{\omega_1 = -\omega_2\} f^\vartheta(\omega_1, \tau_1, \tau_1) f^\vartheta(-\omega_1, \tau_2, \tau_2) \Big) + o \Big( \frac{B_n}{a_n} \Big). \tag{A.27}
\]
This, combined with (A.26) and the fact that $W^\vartheta_{t,T}(\omega)$ and $h^\vartheta(\omega, \tau_1, \tau_2)$ are centered, entails
\[
E \Big[ \sum_{t_1 = s}^{s + a_n} W^\vartheta_{t_1}(\omega_1) \sum_{t_2 = s}^{s + a_n} W^\vartheta_{t_2}(\omega_2) \Big] - \frac{a_n^2}{n^2} E[ h^\vartheta(\omega_1, \tau_1, \tau_2) h^\vartheta(\omega_2, \tau_1, \tau_2) ] = O \Big( \frac{B_n^{3/2} a_n^{1/2} + B_n^2}{n^2} \Big).
\]
Since $B_n = o(a_n)$ by assumption, this with (A.27) completes the proof of Lemma A.6.1.

Next, observe that
\[
\tilde{f}_{t_0,T}(\omega, \tau_1, \tau_2) - E \tilde{f}_{t_0,T}(\omega, \tau_1, \tau_2) = \frac{1}{2\pi} \sum_{|t - t_0| \le m_T - B_n} \frac{1}{n} \sum_{|k| \le n-1} K_n(k) e^{-i\omega k} \big( Y_{t,\tau_1} Y_{t+k,\tau_2} - E[ Y_{t,\tau_1} Y_{t+k,\tau_2} ] \big) =: \sum_{|t - t_0| \le m_T - B_n} W_{t,T}(\omega).
\]

By construction, the random variables $W := \{ W_{t,T}(\omega) \}_{|t - t_0| \le m_T - B_n}$ form a triangular array of $\beta$-mixing random variables with $\beta$-mixing coefficients
\[
\beta^{[W]}(j) \le \beta^{[X]}(0 \vee j - B_n).
\]
To establish the central limit theorem, we will apply the blocking technique from Section A.5 with block lengths $p_n$, $q_n$. Choose $p_n$, $q_n$ such that
\[
q_n/p_n \to 0, \qquad B_n/q_n \to 0, \qquad \text{and} \qquad p_n/n \to 0. \tag{A.28}
\]
Now decompose
\[
\tilde{f}_{t_0,T}(\omega, \tau_1, \tau_2) - E \tilde{f}_{t_0,T}(\omega, \tau_1, \tau_2) = \sum_{j=1}^{\mu_n} \sum_{t \in \Gamma_j} W_{t,T}(\omega) + \sum_{j=1}^{\mu_n} \sum_{t \in \Delta_j} W_{t,T}(\omega) + \sum_{t \in R} W_{t,T}(\omega) =: S_n^{\Gamma} + S_n^{\Delta} + S_n^{R}, \text{ say.}
\]
By construction, $S_n^R$ contains at most $O(p_n + q_n)$ summands. Lemma A.6.1 thus implies that
\[
\mathrm{Var} \Big( \sum_{t \in R} W_{t,T}(\omega) \Big) = O \Big( \frac{(p_n + q_n) B_n}{n^2} \Big) = o(B_n/n),
\]
and, therefore, $S_n^R = o_P(B_n^{1/2}/n^{1/2})$. Next, observe that, by Lemma A.5.3,
\[
P \Big( n^{1/2} B_n^{-1/2} \big| S_n^{\Delta} \big| \ge \varepsilon \Big) \le P \Big( n^{1/2} B_n^{-1/2} \Big| \sum_{j=1}^{\mu_n} \sum_{t \in \Delta_j} \xi_{t,T}(\omega) \Big| \ge \varepsilon \Big) + (\mu_n - 1) \beta^{[W]}_{p_n}.
\]
The second term on the right-hand side of the above expression converges to zero by the assumptions on $p_n$ and $\beta^{[X]}$. To show that the first term also converges to zero, observe that, by construction, $E \xi_{t,T}(\omega) = 0$. The definition of $\xi_{t,T}(\omega)$, combined with Lemma A.6.1 and $q_n/p_n = o(1)$, yields
\[
\mathrm{Var} \Big( \frac{n^{1/2}}{B_n^{1/2}} \sum_{j=1}^{\mu_n} \sum_{t \in \Delta_j} \xi_{t,T}(\omega) \Big) = \frac{n}{B_n} \sum_{j=1}^{\mu_n} \mathrm{Var} \Big( \sum_{t \in \Delta_j} W_{t,T}(\omega) \Big) = \frac{n}{B_n} O \Big( \frac{\mu_n B_n q_n}{n^2} \Big) = o(1).
\]
Thus it remains to show that $\frac{n^{1/2}}{B_n^{1/2}} S_n^{\Gamma}$ converges in distribution. To this end, observe that, for any measurable set $A$, we have, by Lemma A.5.3 and the assumptions on $\beta^{[W]}$,
\[
P \Big( \frac{n^{1/2}}{B_n^{1/2}} S_n^{\Gamma} \in A \Big) = P \Big( \frac{n^{1/2}}{B_n^{1/2}} \sum_{j=1}^{\mu_n} \sum_{t \in \Gamma_j} \xi_{t,T}(\omega) \in A \Big) + o(1).
\]
Thus, it suffices to establish the weak convergence of $\frac{n^{1/2}}{B_n^{1/2}} \sum_{j=1}^{\mu_n} \sum_{t \in \Gamma_j} \xi_{t,T}(\omega)$. To do so, consider the triangular array of independent random vectors
\[
\Big( \frac{n^{1/2}}{B_n^{1/2}} \sum_{t \in \Gamma_j} \big( \Re \xi_{t,T}(\omega), \Im \xi_{t,T}(\omega) \big)^{T} \Big)_{j = 1, \dots, \mu_n}.
\]
Applying the Cramér-Wold device, let us show that for any $\lambda_1, \lambda_2 \in \mathbb{R}$ such that $|\lambda_1| + |\lambda_2| \ne 0$, the triangular array of independent random variables
\[
\Big( \frac{n^{1/2}}{B_n^{1/2}} \sum_{t \in \Gamma_j} \lambda_1 \Re \xi_{t,T}(\omega) + \lambda_2 \Im \xi_{t,T}(\omega) \Big)_{j = 1, \dots, \mu_n}
\]
satisfies the Lyapunov condition. By construction $(W_{t,T}(\omega))_{t \in \Gamma_j} \stackrel{d}{=} (\xi_{t,T}(\omega))_{t \in \Gamma_j}$, so that
\[
E \Big[ \Big( \sum_{t \in \Gamma_j} \Re \xi_{t,T}(\omega) \Big)^4 \Big] = E \Big[ \Big( \sum_{t \in \Gamma_j} \Re W_{t,T}(\omega) \Big)^4 \Big] = 3 \Big( \mathrm{Var} \Big( \sum_{t \in \Gamma_j} \Re W_{t,T}(\omega) \Big) \Big)^2 + \sum_{t_1, \dots, t_4 \in \Gamma_j} \mathrm{cum}( \Re W_{t_1,T}(\omega), \dots, \Re W_{t_4,T}(\omega) ).
\]
A similar representation holds for the imaginary parts of $W_{t,T}$. Similar arguments as on pages 1177-1178 of Rosenblatt (1984) show that
\[
\sup_j \sum_{t_1, \dots, t_4 \in \Gamma_j} \Big( | \mathrm{cum}( \Re W_{t_1,T}(\omega), \dots, \Re W_{t_4,T}(\omega) ) | + | \mathrm{cum}( \Im W_{t_1,T}(\omega), \dots, \Im W_{t_4,T}(\omega) ) | \Big) = O(q_n B_n^3 / n^4). \tag{A.29}
\]
To verify this, note that, exactly as in Rosenblatt (1984), the cumulants in (A.29) can be expressed in terms of cumulants of the random variables $Y_{t,\tau_j}$, $j = 1, 2$, $t \in \Gamma_j$ by summation over indecomposable partitions. Apply (A3)(iii) to bound those cumulants uniformly, then follow the same arguments as in Rosenblatt (1984) to bound the sums.
Then (A.29) entails, for any $\lambda_1, \lambda_2 \in \mathbb{R}$,
\[
\sum_{j=1}^{\mu_n} E \Big[ \Big( \sum_{t \in \Gamma_j} \lambda_1 \Re \xi_{t,T}(\omega) + \lambda_2 \Im \xi_{t,T}(\omega) \Big)^4 \Big] = O(\mu_n q_n^2 B_n^2 / n^4)
\]
and, by Lemma A.6.1, for any $\lambda_1, \lambda_2 \in \mathbb{R}$ with $|\lambda_1| + |\lambda_2| \ne 0$,
\[
\Big( \sum_{j=1}^{\mu_n} \mathrm{Var} \Big( \sum_{t \in \Gamma_j} \lambda_1 \Re \xi_{t,T}(\omega) + \lambda_2 \Im \xi_{t,T}(\omega) \Big) \Big)^2 \ge c(\lambda_1, \lambda_2) \, \mu_n^2 q_n^2 B_n^2 / n^4
\]
for some $c(\lambda_1, \lambda_2) > 0$ and all sufficiently large $n$. Thus the conditions of Lyapunov's central limit theorem are satisfied as $\mu_n \to \infty$. This completes the proof of (A.8).

A.6.4 Proof of (A.9)

The proof of (A.9) relies on the following lemma (see Priestley (1981), page 459 for similar arguments).

Lemma A.6.2.

Uniformly in $|u - \vartheta| \le n/T$ and $x, y$ in a neighborhood of $\tau_1, \tau_2$, we have
\[
\frac{1}{2\pi} \sum_{|k| \le n-1} K_n(k) e^{-i\omega k} \gamma^u_k(x,y) = f^u(\omega, x, y) - C_K(r) B_n^{-r} \, \mathfrak{d}^{(r)}_\omega f^u(\omega, x, y) + o(B_n^{-r}).
\]

Proof.

Choose some $L_n \to \infty$ such that $L_n/B_n \to 0$. Then,
\[
\frac{1}{2\pi} \sum_{|k| \le n-1} K_n(k) e^{-i\omega k} \gamma^u_k(x,y) - f^u(\omega, x, y)
= \frac{B_n^{-r}}{2\pi} \sum_{|k| \le L_n} \frac{K(k/B_n) - 1}{|k/B_n|^r} |k|^r e^{-i\omega k} \gamma^u_k(x,y)
+ \frac{B_n^{-r}}{2\pi} \sum_{B_n \ge |k| > L_n} \frac{K(k/B_n) - 1}{|k/B_n|^r} |k|^r e^{-i\omega k} \gamma^u_k(x,y)
- \frac{B_n^{-r}}{2\pi} \sum_{|k| > B_n} e^{-i\omega k} \frac{B_n^r}{|k|^r} |k|^r \gamma^u_k(x,y).
\]
By Assumptions (K) and (A3)(ii), $\sup_v \frac{|K(v) - 1|}{|v|^r}$ and $\sum_{k \in \mathbb{Z}} |k|^r |\gamma^u_k(x,y)|$ are bounded. Therefore, the last term in the above expression is
\[
O(B_n^{-r}) \sum_{|k| > B_n} |k|^r |\gamma^u_k(x,y)| = o(B_n^{-r}),
\]
and the second term is
\[
\Big| \frac{B_n^{-r}}{2\pi} \sum_{B_n \ge |k| > L_n} \frac{K(k/B_n) - 1}{|k/B_n|^r} |k|^r e^{-i\omega k} \gamma^u_k(x,y) \Big| \le O(B_n^{-r}) \sup_v \frac{|K(v) - 1|}{|v|^r} \sum_{B_n \ge |k| > L_n} |k|^r |\gamma^u_k(x,y)| = o(B_n^{-r}),
\]
since $L_n \to \infty$. Finally, for the first term, observe that
\[
\frac{1}{2\pi} B_n^{-r} \sum_{|k| \le L_n} \frac{K(k/B_n) - 1}{|k/B_n|^r} |k|^r e^{-i\omega k} \gamma^u_k(x,y) + C_K(r) B_n^{-r} \, \mathfrak{d}^{(r)}_\omega f^u(\omega, x, y) \tag{A.30}
\]
\[
= \frac{1}{2\pi} B_n^{-r} \sum_{|k| \le L_n} \Big( \frac{K(k/B_n) - 1}{|k/B_n|^r} + C_K(r) \Big) |k|^r e^{-i\omega k} \gamma^u_k(x,y) + \frac{1}{2\pi} C_K(r) B_n^{-r} \sum_{|k| > L_n} |k|^r e^{-i\omega k} \gamma^u_k(x,y).
\]
The first term on the right-hand side of (A.30) is of order $o(B_n^{-r})$ since, by Assumption (K), $L_n/B_n \to 0$ implies $\frac{K(k/B_n) - 1}{|k/B_n|^r} \to -C_K(r)$ and $|k|^r |\gamma^u_k(x,y)|$ is absolutely summable, while the second term is $o(B_n^{-r})$ since $L_n \to \infty$ and $|k|^r |\gamma^u_k(x,y)|$ is absolutely summable. Note that, under the assumptions made, all arguments hold uniformly in $u, x, y$. This completes the proof.

We can now prove (A.9). First, note that
\[
E \tilde{f}_{t_0,T}(\omega, \tau_1, \tau_2) = \frac{1}{2\pi n} \sum_{|k| \le n-1} K_n(k) e^{-i\omega k} \times \sum_{|t - t_0| \le m_T - B_n} \gamma^{t/T}_k \big( G^{t/T}(q^\vartheta(\tau_1)), G^{t/T}(q^\vartheta(\tau_2)) \big) + O(B_n/T).
\]
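Next, observe that by Lemma A.6.2 and the continuity of $(u, x, y) \mapsto \mathfrak{d}^{(r)}_\omega f^u(\omega, x, y)$,
\[
\sup_{|t - t_0| \le m_T - B_n} \Big| \mathfrak{d}^{(r)}_\omega f^{t/T} \big( \omega, G^{t/T}(q^\vartheta(\tau_1)), G^{t/T}(q^\vartheta(\tau_2)) \big) - \mathfrak{d}^{(r)}_\omega f^\vartheta(\omega, \tau_1, \tau_2) \Big| = o(1)
\]
since $G^u(q^\vartheta(\tau)) \to \tau$ for $u \to \vartheta$. Thus
\[
E \tilde{f}_{t_0,T}(\omega, \tau_1, \tau_2) = - C_K(r) B_n^{-r} \, \mathfrak{d}^{(r)}_\omega f^\vartheta(\omega, \tau_1, \tau_2) + \frac{1}{n} \sum_{|t - t_0| \le m_T - B_n} f^{t/T} \big( \omega, G^{t/T}(q^\vartheta(\tau_1)), G^{t/T}(q^\vartheta(\tau_2)) \big) + o(B_n^{-r}).
\]
On the other hand,
\[
\frac{1}{n} \sum_{|t - t_0| \le m_T - B_n} f^{t/T} \big( \omega, G^{t/T}(q^\vartheta(\tau_1)), G^{t/T}(q^\vartheta(\tau_2)) \big)
= \frac{T}{2 m_T} \int_{-m_T/T}^{m_T/T} f^{\vartheta + u} \big( \omega, G^{\vartheta + u}(q^\vartheta(\tau_1)), G^{\vartheta + u}(q^\vartheta(\tau_2)) \big) \, du + O(1/n)
\]
\[
= f^\vartheta(\omega, q^\vartheta(\tau_1), q^\vartheta(\tau_2)) + \frac{n^2}{24 T^2} \frac{\partial^2}{\partial u^2} f^u \big( \omega, G^u(q^\vartheta(\tau_1)), G^u(q^\vartheta(\tau_2)) \big) \Big|_{u = \vartheta} + O(1/n) + o(n^2/T^2).
\]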