Change-point tests for the tail parameter of Long Memory Stochastic Volatility time series

Annika Betken*, Faculty of Mathematics, Ruhr-Universität Bochum
Davide Giraudo*, Faculty of Mathematics, Ruhr-Universität Bochum
Rafał Kulik, Department of Mathematics and Statistics, University of Ottawa

June 5, 2020
Abstract
We consider a change-point test based on the Hill estimator to test for structural changes in the tail index of Long Memory Stochastic Volatility time series. In order to determine the asymptotic distribution of the corresponding test statistic, we prove a uniform reduction principle for the tail empirical process in a two-parameter Skorohod space. It is shown that such a process displays a dichotomous behavior according to an interplay between the Hurst parameter, i.e., a parameter characterizing the dependence in the data, and the tail index. Our theoretical results are accompanied by simulation studies and the analysis of financial time series with regard to structural changes in the tail index.
Keywords: stochastic volatility; long-range dependence; change-point tests; tail empirical process; heavy tails; chaining

* Research supported by the German National Academic Foundation and Collaborative Research Center SFB 823
"Statistical modelling of nonlinear dynamic processes".

1 Introduction and motivation
The tail behavior of the marginal distribution of time series is of major relevance for statistics in applied sciences such as econometrics and hydrology, where heavy-tailed data occurs frequently. More precisely, time series from finance, such as the log returns of exchange rates and stock market indices, display heavy tails; see Mandelbrot (1963). Furthermore, drastic events like the financial crisis in 2008 substantiate the importance of studying time series models that underlie financial data. Against this background, the identification of changes in the tail behavior of data-generating stochastic processes, which result in an increase or decrease in the probability of extreme events, is of utmost interest. In particular, the analysis of the tail behavior of financial data may pave the way for a corresponding adjustment of risk management for capital investments and may, therefore, prevent huge capital losses. Indeed, there is empirical evidence that the tail behavior of financial time series may change over time: Quintos et al. (2001) identify changes in the tail of Asian stock market indices, Galbraith and Zernov (2004) find evidence for changes in the tail behavior of returns on U.S. equities, and Werner and Upper (2004) detect structural breaks in high-frequency data of Bund future returns.
Let $X_j$, $j \in \mathbb{N}$, be a stationary time series whose marginal tail distribution function $\bar F$ is regularly varying with index $-\alpha$, $\alpha > 0$, i.e., $P(X_1 > x) = x^{-\alpha} L(x)$, where $L$ is slowly varying at infinity. Since the tail behavior of $X_j$, $j \in \mathbb{N}$, is primarily determined by the value of the tail index $\alpha$, identifying a change in the tail of data-generating processes corresponds to testing for a change-point in this parameter. In particular, this means that, given a set of observations $X_1, \ldots, X_n$ with $P(X_j > x) = x^{-\alpha_j} L(x)$, $j = 1, \ldots, n$, we aim at deciding on the testing problem $(H, A)$ with

$$H: \alpha_1 = \cdots = \alpha_n \quad \text{and} \quad A: \alpha_1 = \cdots = \alpha_k \neq \alpha_{k+1} = \cdots = \alpha_n \ \text{for some } k \in \{1, \ldots, n-1\}.$$

Test decisions for this problem naturally rest on estimators of the tail index $\alpha$. For some general results on tail index estimation see Drees (1998a) and Drees (1998b). In this article, we focus on two estimators that are motivated by the fact that for a random variable $X$ with tail index $\alpha$

$$\lim_{u \to \infty} \mathrm{E}\left[\log\left(\tfrac{X}{u}\right) \,\Big|\, X > u\right] = \lim_{u \to \infty} \frac{\mathrm{E}\left[\log\left(\tfrac{X}{u}\right) \mathbb{1}\{X > u\}\right]}{P(X > u)} = \frac{1}{\alpha} =: \gamma.$$

When we are given a set of observations $X_1, \ldots, X_n$, an approximation of the unknown distribution of $X$ by its empirical analogue gives the following estimator for the tail index:

$$\widehat\gamma := \frac{1}{\sum_{j=1}^{n} \mathbb{1}\{X_j > u_n\}} \sum_{j=1}^{n} \log\left(\frac{X_j}{u_n}\right) \mathbb{1}\{X_j > u_n\}, \tag{1}$$

where $u_n$, $n \in \mathbb{N}$, is a sequence with $u_n \to \infty$ and $n \bar F(u_n) \to \infty$. Replacing the deterministic levels $u_n$ in the formula for $\widehat\gamma$ by $X_{n:n-k_n}$ for some $k_n$, $1 \leq k_n \leq n-1$, where $X_{n:n} \geq X_{n:n-1} \geq \ldots \geq X_{n:1}$ are the order statistics of the sample $X_1, \ldots, X_n$, yields the Hill estimator

$$\widehat\gamma^{\mathrm{Hill}} = \frac{1}{k_n} \sum_{i=1}^{k_n} \log\left(\frac{X_{n:n-i+1}}{X_{n:n-k_n}}\right).$$

As the most popular estimator for the tail index, established in Hill (1975), the Hill estimator has been widely studied in the literature. Its limiting distribution was obtained under various model assumptions, including linear processes (Resnick and Stărică (1997)), $\beta$-mixing processes (Drees (2000)), and Long Memory Stochastic Volatility models (Kulik and Soulier (2011)). The first article that establishes a theory for change-point tests based on the Hill estimator seems to be Quintos et al. (2001). While Quintos et al. (2001) consider independent, identically distributed observations as well as ARCH- and GARCH-type processes, Kim and Lee (2011) and Kim and Lee (2012) extend their results to $\beta$-mixing processes and residual-based change-point tests for AR($p$) processes with heavy-tailed innovations. In contrast, we study change-point tests for the tail index of Long Memory Stochastic Volatility time series based on the two estimators $\widehat\gamma$ and $\widehat\gamma^{\mathrm{Hill}}$. In fact, our results are the first to consider the change-point problem for stochastic volatility models and time series with long-range dependence.

To motivate the design of test statistics for deciding on the change-point problem $(H, A)$, we temporarily assume that the change-point location is known, i.e., for a given $k \in \{1, \ldots, n-1\}$ we consider the testing problem $(H, A_k)$ with

$$A_k: \alpha_1 = \cdots = \alpha_k \neq \alpha_{k+1} = \cdots = \alpha_n.$$

To decide on $(H, A_k)$, we compare an estimation $\widehat\gamma_k$ of the tail index based on the observations $X_1, \ldots, X_k$ to an estimation $\widehat\gamma_n$ of the tail index based on the whole sample $X_1, \ldots, X_n$. This idea leads to studying the following test statistic:

$$\Gamma_{k,n} = \frac{k}{n} \left| \frac{\widehat\gamma_k}{\widehat\gamma_n} - 1 \right|.$$

Under the assumption that the change-point location is unknown under the alternative, it seems natural to consider the statistic $\Gamma_{k,n}$ for every potential change-point location $k$ and to decide in favor of the alternative hypothesis $A$ if the maximum of its values exceeds a predefined threshold. As a result, a change-point test for the testing problem $(H, A)$ that rests upon the estimator $\widehat\gamma$ defined by (1) bases test decisions on the values of the statistic

$$\Gamma_n := \sup_{t \in [t_0, 1]} t \left| \frac{\widehat\gamma_{\lfloor nt \rfloor}}{\widehat\gamma_n} - 1 \right| \tag{2}$$

with $t_0 \in (0,1)$ and with the sequential version of $\widehat\gamma$ defined by

$$\widehat\gamma_{\lfloor nt \rfloor} := \frac{1}{\sum_{j=1}^{\lfloor nt \rfloor} \mathbb{1}\{X_j > u_n\}} \sum_{j=1}^{\lfloor nt \rfloor} \log\left(\frac{X_j}{u_n}\right) \mathbb{1}\{X_j > u_n\}. \tag{3}$$

Likewise, a test statistic based on the Hill estimator is given by

$$\widetilde\Gamma_n := \sup_{t \in [t_0, 1]} t \left| \frac{\widehat\gamma^{\mathrm{Hill}}(t)}{\widehat\gamma^{\mathrm{Hill}}(1)} - 1 \right|$$

with the sequential version of $\widehat\gamma^{\mathrm{Hill}}$ defined by

$$\widehat\gamma^{\mathrm{Hill}}(t) := \frac{1}{\lfloor k_n t \rfloor} \sum_{i=1}^{\lfloor k_n t \rfloor} \log\left(\frac{X_{\lfloor nt \rfloor : \lfloor nt \rfloor - i + 1}}{X_{\lfloor nt \rfloor : \lfloor nt \rfloor - k_{\lfloor nt \rfloor}}}\right).$$

In this context, the most comprehensive theory for change-point tests is presented in Hoga (2017). The author considers a number of test statistics based on different tail index estimators and derives their asymptotic distributions under the assumption of $\beta$-mixing data-generating processes.

In the following, we derive the asymptotic distribution of both estimators, i.e., $\widehat\gamma_{\lfloor nt \rfloor}$ and $\widehat\gamma^{\mathrm{Hill}}(t)$, and the corresponding test statistics, i.e., $\Gamma_n$ and $\widetilde\Gamma_n$, under the hypothesis of stationary time series data. For this purpose, we first prove a limit theorem for the tail empirical process of Long Memory Stochastic Volatility time series in two parameters. This limit theorem does not necessarily relate to the change-point context. It can therefore be considered of independent interest and, thus, as the main theoretical result of our work. Our theoretical results are accompanied by simulation studies. As an empirical application of our tests, we consider Standard & Poor's 500 daily closing index covering the period from January 2008 to December 2008, the year of the financial crisis.
We identify a change in the data at exactly one day after Lehman Brothers filed for bankruptcy protection, an event which is thought to have played a major role in the unfolding of the crisis in 2007-2008.

In order to derive the limit distribution of the tail estimators $\widehat\gamma_{\lfloor nt \rfloor}$ and $\widehat\gamma^{\mathrm{Hill}}(t)$, parametrized in $t$, and the corresponding test statistics $\Gamma_n$ and $\widetilde\Gamma_n$, it is crucial to note that

$$\widehat\gamma_{\lfloor nt \rfloor} = \frac{1}{\sum_{j=1}^{\lfloor nt \rfloor} \mathbb{1}\{X_j > u_n\}} \sum_{j=1}^{\lfloor nt \rfloor} \log\left(\frac{X_j}{u_n}\right) \mathbb{1}\{X_j > u_n\} = \frac{1}{\widetilde T_n(1, t)} \int_1^\infty s^{-1} \widetilde T_n(s, t) \, ds, \tag{4}$$

where

$$\widetilde T_n(s, t) = \frac{1}{n \bar F(u_n)} \sum_{j=1}^{\lfloor nt \rfloor} \mathbb{1}\{X_j > u_n s\}.$$

As a result, asymptotics of the considered statistics can be derived from a limit theorem for the two-parameter tail empirical process

$$e_n(s, t) := \left\{\widetilde T_n(s, t) - T(s, t)\right\}, \quad s \in [1, \infty], \ t \in [0, 1], \tag{5}$$

where $T(s, t)$ does not correspond to the mean of $\widetilde T_n(s, t)$, but rather to the limit of that mean, i.e., to

$$T(s, t) := t s^{-\alpha}. \tag{6}$$

Among others, the tail empirical process in one parameter, i.e., $e_n(s, 1)$, $s \in [1, \infty]$, has previously been studied in Mason (1988), Einmahl (1990), and Einmahl (1992) for independent, identically distributed observations, in Rootzén (2009) for absolutely regular processes, and in Kulik and Soulier (2011) for Long Memory Stochastic Volatility time series. For the latter, the convergence of the two-parameter tail empirical process will be discussed in Section 2.2.

1.3 Long Memory Stochastic Volatility model

A phenomenon that is often encountered in the context of financial time series corresponds to the observation that the observations seem to be uncorrelated, whereas their absolute values or higher moments tend to be highly correlated. Another characteristic of financial time series is the occurrence of heavy tails. In particular, the distribution of the considered data often exhibits tails that are heavier than those of a normal distribution. The previously described features of financial data can be covered by stochastic volatility models.
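Before turning to the model, the two estimators introduced above can be made concrete in code. The following minimal sketch (not part of the original text; a NumPy environment and all names are our assumptions) implements (1) with a deterministic level $u$ and the Hill estimator, and checks both on an exact Pareto sample, for which $\gamma = 1/\alpha$:

```python
import numpy as np

def gamma_threshold(x, u):
    """Estimator (1): average log-excess over a deterministic level u."""
    exc = x[x > u]
    return np.mean(np.log(exc / u))

def gamma_hill(x, k):
    """Hill estimator: the level is replaced by the (k+1)-th largest order statistic."""
    xs = np.sort(x)
    return np.mean(np.log(xs[-k:] / xs[-k - 1]))

# sanity check on an exact Pareto sample, where gamma = 1/alpha
rng = np.random.default_rng(0)
alpha = 2.0
x = rng.pareto(alpha, size=100_000) + 1.0  # standard Pareto: P(X > t) = t^(-alpha), t >= 1
print(gamma_threshold(x, u=10.0), gamma_hill(x, k=2000))  # both close to 0.5 = 1/alpha
```

Both estimators average the same log-excesses; they differ only in whether the threshold is deterministic ($u_n$) or data-driven ($X_{n:n-k_n}$).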
Stochastic volatility model
The Long Memory Stochastic Volatility model that is taken as a basis of the theoretical results established in this article can be considered as a generalization of stochastic volatility models considered, for example, in Taylor (1986). Initially, this model had been introduced by Breidt et al. (1998) and, independently, by Harvey (2002). An overview of stochastic volatility models with long-range dependence and their basic properties is given in Deo et al. (2006) and in Hurvich and Soulier (2009).

Stochastic volatility time series $X_j$, $j \in \mathbb{N}$, are typically defined via

$$X_j = Z_j \varepsilon_j \quad \text{with} \quad Z_j = \exp(Y_j), \tag{7}$$

where $\varepsilon_j$, $j \in \mathbb{N}$, is a sequence of independent, identically distributed random variables with mean 0, and $Y_j$, $j \in \mathbb{N}$, is a Gaussian process, independent of $\varepsilon_j$, $j \in \mathbb{N}$. While these models are often restricted to modeling a relatively fast decay of dependence in $Y_j$, $j \in \mathbb{N}$, the so-called Long Memory Stochastic Volatility model allows for long-range dependence. In what follows, we will specify a corresponding dependence structure for $Y_j$, $j \in \mathbb{N}$.

Subordinated Gaussian processes
The rate of decay of the autocovariance function is crucial to the definition of long-range dependence in time series.
Definition 1.1.
A (second-order) stationary, real-valued time series $Y_j$, $j \in \mathbb{Z}$, is called long-range dependent if its autocovariance function $\gamma_Y$ satisfies

$$\gamma_Y(k) := \mathrm{Cov}(Y_1, Y_{k+1}) \sim k^{-D} L_\gamma(k), \quad \text{as } k \to \infty,$$

with $D \in (0, 1)$ for some slowly varying function $L_\gamma$. We refer to $D$ as long-range dependence (LRD) parameter; see Pipiras and Taqqu (2017), p. 17.

We will focus our considerations on long-range dependent subordinated Gaussian time series.
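A standard example, and the one used for the simulations later in the paper, is fractional Gaussian noise: for Hurst parameter $H \in (1/2, 1)$ its autocovariance satisfies

```latex
\gamma_Y(k) = \tfrac{1}{2}\left(|k+1|^{2H} - 2|k|^{2H} + |k-1|^{2H}\right)
            \sim H(2H-1)\,k^{2H-2}, \quad k \to \infty,
```

so that Definition 1.1 holds with $D = 2 - 2H \in (0, 1)$ and $L_\gamma$ asymptotically constant; a larger $H$ thus corresponds to a smaller $D$, i.e., stronger long-range dependence.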
Definition 1.2.
Let $Y_j$, $j \in \mathbb{N}$, be a Gaussian process. A process $Z_j$, $j \in \mathbb{N}$, satisfying $Z_j = G(Y_j)$ for some measurable function $G: \mathbb{R} \to \mathbb{R}$ is called subordinated Gaussian process.

Remark 1.3.
For any particular distribution function $F$, an appropriate choice of the transformation $G$ in Definition 1.2 yields a subordinated Gaussian process with marginal distribution $F$. Moreover, there exist algorithms for generating Gaussian processes that, after suitable transformation, yield subordinated Gaussian processes with marginal distribution $F$ and a predefined covariance structure; see Pipiras and Taqqu (2017), Section 5.8.4. To that effect, subordinated Gaussian processes constitute a comprehensive model for long-range dependent time series.

The subordinated random variables $Z_j = G(Y_j)$, $j \in \mathbb{N}$, can be considered as elements of the Hilbert space $L^2 := L^2(\mathbb{R}, \varphi(x)dx)$, i.e., the space of all measurable, real-valued functions which are square-integrable with respect to the measure $\varphi(x)dx$ associated with the standard normal density function $\varphi$, equipped with the inner product

$$\langle G_1, G_2 \rangle_{L^2} := \int_{-\infty}^{\infty} G_1(x) G_2(x) \varphi(x) \, dx = \mathrm{E}[G_1(Y) G_2(Y)],$$

where $G_1, G_2 \in L^2(\mathbb{R}, \varphi(x)dx)$ and $Y$ denotes a standard normally distributed random variable. In order to characterize the dependence structure of subordinated Gaussian processes, we consider their expansion in Hermite polynomials.

Definition 1.4.
For $n \geq 0$, the Hermite polynomial of order $n$ is defined by

$$H_n(x) = (-1)^n e^{\frac{x^2}{2}} \frac{d^n}{dx^n} e^{-\frac{x^2}{2}}, \quad x \in \mathbb{R}.$$

The Hermite polynomials form an orthogonal basis of $L^2$. In particular, it holds that

$$\langle H_n, H_m \rangle_{L^2} = \begin{cases} n! & \text{if } n = m, \\ 0 & \text{if } n \neq m. \end{cases}$$

As a result, every $G \in L^2(\mathbb{R}, \varphi(x)dx)$ has an expansion in Hermite polynomials, i.e., for $G \in L^2(\mathbb{R}, \varphi(x)dx)$ and $Y$ standard normally distributed, we have

$$G(Y) \overset{L^2}{=} \sum_{r=0}^{\infty} \frac{J_r(G)}{r!} H_r(Y), \quad \text{i.e.,} \quad \lim_{n \to \infty} \left\| G(Y) - \sum_{r=0}^{n} \frac{J_r(G)}{r!} H_r(Y) \right\|_{L^2} = 0, \tag{8}$$

where $\| \cdot \|_{L^2}$ denotes the norm induced by the inner product $\langle \cdot, \cdot \rangle_{L^2}$, and the so-called Hermite coefficients $J_r(G)$, $r \geq 1$, are given by

$$J_r(G) := \langle G, H_r \rangle_{L^2} = \mathrm{E}[G(Y) H_r(Y)], \quad r \geq 1.$$

Given the Hermite expansion (8), it is possible to characterize the dependence structure of subordinated Gaussian time series $G(Y_j)$, $j \in \mathbb{N}$. Under the assumption that the Gaussian sequence $Y_j$, $j \in \mathbb{N}$, is stationary and that $G$ is a one-to-one function, the behavior of the autocorrelations of the transformed process is completely determined by the dependence structure of the underlying process. However, this is not the case in general. In fact, it holds that

$$\mathrm{Cov}(G(Y_1), G(Y_{k+1})) = \sum_{r=1}^{\infty} \frac{J_r^2(G)}{r!} (\gamma_Y(k))^r, \tag{9}$$

where $\gamma_Y(k)$ denotes the autocovariance function of $Y_n$, $n \in \mathbb{N}$; see Pipiras and Taqqu (2017).

Under the assumption that, as $k$ tends to $\infty$, $\gamma_Y(k)$ converges to 0 with a certain rate, the asymptotically dominating term in the series (9) is the summand corresponding to the smallest integer $r$ for which the Hermite coefficient $J_r(G)$ is non-zero. This index, which decisively depends on $G$, is called Hermite rank.

Definition 1.5.
Let $G \in L^2(\mathbb{R}, \varphi(x)dx)$, $\mathrm{E}[G(Y)] = 0$ for standard normally distributed $Y$, and let $J_r(G)$, $r \geq 1$, be the Hermite coefficients in the Hermite expansion of $G$. The smallest index $k \geq 1$ for which $J_k(G) \neq 0$ is called the Hermite rank of $G$, i.e.,

$$r := \min\{k \geq 1 : J_k(G) \neq 0\}.$$
It follows from (9) that subordination of long-range dependent Gaussian time series potentially generates time series whose autocovariances decay faster than the autocovariances of the underlying Gaussian process. In some cases, the subordinated time series is long-range dependent as well; in other cases, subordination may even yield short-range dependence. Given that $\mathrm{Cov}(Y_1, Y_{k+1}) \sim k^{-D} L_\gamma(k)$, as $k \to \infty$, and given that $G \in L^2(\mathbb{R}, \varphi(x)dx)$ is a function with Hermite rank $r$, we have

$$\mathrm{Cov}(G(Y_1), G(Y_{k+1})) \sim \frac{J_r^2(G)}{r!} k^{-Dr} L_\gamma^r(k), \quad \text{as } k \to \infty.$$

It immediately follows that subordinated Gaussian time series $G(Y_j)$, $j \in \mathbb{N}$, are long-range dependent with LRD parameter $D_G := Dr$ and slowly varying function $L_G(k) = \frac{J_r^2(G)}{r!} L_\gamma^r(k)$ whenever $Dr < 1$.
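As a quick numerical illustration (not part of the original development), the Hermite coefficients $J_r(G) = \mathrm{E}[G(Y)H_r(Y)]$ can be approximated by Gauss-Hermite quadrature. The sketch below, assuming only NumPy, recovers the Hermite rank 2 of the centered square $G(x) = x^2 - 1$:

```python
import numpy as np
from numpy.polynomial import hermite_e as He

def hermite_coeff(G, r, deg=60):
    """J_r(G) = E[G(Y) H_r(Y)] for Y ~ N(0,1), via Gauss-HermiteE quadrature."""
    x, w = He.hermegauss(deg)           # nodes/weights for the weight exp(-x^2/2)
    Hr = He.hermeval(x, [0] * r + [1])  # probabilists' Hermite polynomial H_r
    return np.sum(w * G(x) * Hr) / np.sqrt(2.0 * np.pi)

G = lambda x: x**2 - 1.0  # equals H_2(x), so the Hermite rank should be 2
coeffs = [hermite_coeff(G, r) for r in range(1, 4)]
print([round(c, 6) + 0.0 for c in coeffs])  # -> [0.0, 2.0, 0.0], i.e. rank r = 2
```

Note that $J_2(G) = \langle H_2, H_2 \rangle_{L^2} = 2! = 2$, in line with the orthogonality relation above.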
Definition 1.6.
Let the data-generating process $X_j$, $j \in \mathbb{N}$, satisfy

$$X_j = Z_j \varepsilon_j, \quad j \in \mathbb{N},$$

where $\varepsilon_j$, $j \in \mathbb{N}$, is a sequence of independent, identically distributed random variables with mean 0, and $Z_j$, $j \in \mathbb{N}$, is a long-range dependent subordinated Gaussian process with $Z_j = \sigma(Y_j)$, $j \in \mathbb{N}$, for some stationary, long-range dependent Gaussian process $Y_j$, $j \in \mathbb{N}$, with LRD parameter $D$ and a non-negative measurable function $\sigma$ (not equal to 0). More precisely, assume that $Y_j$, $j \in \mathbb{N}$, admits a linear representation with respect to an independent, standard normally distributed sequence $\eta_k$, $k \in \mathbb{Z}$, i.e.,

$$Y_j = \sum_{k=1}^{\infty} c_k \eta_{j-k}, \quad j \in \mathbb{N},$$

with $\sum_{k=1}^{\infty} c_k^2 = 1$. Furthermore, suppose that $(\varepsilon_j, \eta_j)$, $j \in \mathbb{Z}$, is a sequence of independent, identically distributed random vectors. A sequence of random variables $X_j$, $j \in \mathbb{N}$, which satisfies the previous assumptions is called a Long Memory Stochastic Volatility (LMSV) time series.
The model assumptions generalize the preceding concepts of stochastic volatility models with long-range dependence by allowing for general subordinated Gaussian sequences $Z_j$, $j \in \mathbb{N}$, and dependence between $Y_j$, $j \in \mathbb{N}$, and $\varepsilon_j$, $j \in \mathbb{N}$. Instead of claiming mutual independence of $Y_j$, $j \in \mathbb{N}$, and $\varepsilon_j$, $j \in \mathbb{N}$, the sequence of random vectors $(\eta_j, \varepsilon_j)$ is assumed to be independent. In particular, this implies that for a fixed index $j$, the random variables $Y_j$ and $\varepsilon_j$ are independent, while $Y_j$ may depend on $\varepsilon_i$, $i < j$. In many cases, an LMSV model incorporating this dependence structure is referred to as LMSV with leverage, as it allows for so-called leverage effects in financial time series. Not taking account of leverage, Definition 1.6 corresponds to the LMSV model considered in Kulik and Soulier (2011), while a similar model with leverage is considered in Bilayi-Biakana et al. (2019).

It can be shown that random variables $X_j$, $j \in \mathbb{N}$, satisfying Definition 1.6 are uncorrelated, while their squares inherit the dependence structure from the subordinated Gaussian sequence $Z_j$, $j \in \mathbb{N}$. Moreover, $X_j$, $j \in \mathbb{N}$, inherits the tail behavior from the sequence $\varepsilon_j$, $j \in \mathbb{N}$, if the marginal distribution of the random variables $\varepsilon_j$, $j \in \mathbb{N}$, has a regularly varying right tail, i.e., $\bar F_\varepsilon(x) := P(\varepsilon_1 > x) = x^{-\alpha} L(x)$ for some $\alpha > 0$ and some slowly varying function $L$, and if $\mathrm{E}[\sigma^{\alpha+\delta}(Y_1)] < \infty$ for some $\delta > 0$. More precisely, under these assumptions the following asymptotic equivalence holds:

$$P(X_1 > x) \sim \mathrm{E}[\sigma^\alpha(Y_1)] \, P(\varepsilon_1 > x), \quad \text{as } x \to \infty.$$

This result is known as Breiman's Lemma; see Breiman (1965). On this account, it follows that Definition 1.6 is suited for modeling the previously described characteristic features of financial time series. In all following sections, we will therefore assume that the data-generating process $X_j$, $j \in \mathbb{N}$, corresponds to an LMSV time series specified by Definition 1.6.

Equipped with the introductory remarks and definitions, we are in a position to discuss the structure of the paper. In Section 2 we state the technical assumptions that are needed for our theoretical results. These are followed by the main theorem on convergence of the two-parameter tail empirical process (Theorem 2.6). Convergence of estimators of the tail index (Corollaries 2.7 and 2.8) and of the test statistics (Corollaries 2.10 and 2.11) are immediate consequences. Simulation studies are presented in Section 3, while a real-data analysis can be found in Section 4. All proofs are included in Section 5. In order to establish convergence of the two-parameter tail empirical process, we decompose it into a martingale and a long-range dependent part. The latter is dealt with in Section 5.1.2. For the former, we establish finite-dimensional convergence (Section 5.1.3) using classical tools from martingale theory, while tightness of the two-parameter martingale part is handled by chaining. This is a theoretical novelty in the present context since the methods used in related papers are not suitable (the method used in Kulik and Soulier (2011) cannot be applied to models with leverage, while the approach in Bilayi-Biakana et al. (2019) is not well-suited for two-parameter processes).
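For intuition, a simplified LMSV sample in the spirit of Definition 1.6 can be simulated directly from the linear representation. In the sketch below (not the paper's own procedure), the truncation lag $K$, the coefficient decay $c_k \propto k^{-(1+D)/2}$ (which yields $\gamma_Y(k) \sim \mathrm{const} \cdot k^{-D}$), and the use of a positive Pareto noise in place of a mean-zero $\varepsilon_j$ are our simplifying assumptions:

```python
import numpy as np

def lmsv_sample(n, D, alpha, K=5_000, seed=0):
    """Simulate an approximate LMSV series X_j = exp(Y_j) * eps_j.

    Y is built from the truncated linear filter Y_j = sum_k c_k eta_{j-k} with
    c_k ~ k^(-(1+D)/2), so that Cov(Y_1, Y_{k+1}) decays roughly like k^(-D).
    """
    rng = np.random.default_rng(seed)
    c = np.arange(1, K + 1) ** (-(1.0 + D) / 2.0)
    c /= np.sqrt(np.sum(c**2))                 # enforce sum c_k^2 = 1, so Var(Y_j) = 1
    eta = rng.standard_normal(n + K)
    Y = np.convolve(eta, c, mode="valid")[:n]  # moving-average representation
    eps = rng.pareto(alpha, size=n) + 1.0      # regularly varying right tail, index alpha
    return np.exp(Y) * eps

x = lmsv_sample(10_000, D=0.4, alpha=2.0)
print(x.shape, np.isfinite(x).all())
```

A mean-zero noise, as required by Definition 1.6, could be obtained, e.g., by attaching independent random signs to the Pareto variables; for studying the right tail the positive version suffices.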
In this section, we establish the assumptions guaranteeing convergence of the two-parameter tail empirical process for LMSV time series. Initially, we specify the LMSV model, yielding the main assumptions for the theory:
Assumption 2.1 (Main Assumptions). Let $X_j = Z_j \varepsilon_j$, $j \in \mathbb{N}$, satisfy Definition 1.6 with $Z_j = \sigma(Y_j)$, $j \in \mathbb{N}$, for some stationary, long-range dependent Gaussian process $Y_j$, $j \in \mathbb{N}$, with autocovariance function

$$\gamma_Y(k) := \mathrm{Cov}(Y_1, Y_{k+1}) \sim k^{-D} L_\gamma(k), \quad \text{as } k \to \infty,$$

and some independent, identically distributed sequence $\varepsilon_j$, $j \in \mathbb{N}$, with regularly varying right tail, i.e., $\bar F_\varepsilon(x) := P(\varepsilon_1 > x) = x^{-\alpha} L(x)$ for some $\alpha > 0$ and a slowly varying function $L$. Moreover, let $r$ denote the Hermite rank of $\Psi(y) := \sigma^\alpha(y)$ and assume that $r < 1/D$.

In the following, we list some technical conditions that characterize the behavior of the slowly varying function $L$ and the moments of $\sigma(Y_1)$. For this, we introduce another condition on the distribution function $F_\varepsilon$.

Definition 2.2 (Second order regular variation). Let $\bar F_\varepsilon(x) = x^{-\alpha} L(x)$ for some $\alpha > 0$ and some slowly varying function $L$ that is represented by

$$L(x) = c \exp\left(\int_1^x \frac{\eta(u)}{u} \, du\right)$$

for some constant $c$ and a measurable function $\eta$. Furthermore, we assume that there exists a bounded, decreasing function $\eta^*$ on $[0, \infty)$, regularly varying at infinity with parameter $\rho \geq 0$, i.e., $\eta^*(x) = x^{-\rho} L_{\eta^*}(x)$, such that

$$|\eta(s)| \leq C \eta^*(s)$$

for some constant $C$ and for all $s \geq 0$. We say that $\bar F_\varepsilon$ is second order regularly varying with tail index $\alpha$ and rate function $\eta^*$, and we write $\bar F_\varepsilon \in 2RV(\alpha, \eta^*)$.

Second-order regular variation allows to control the difference between $\bar F_\varepsilon$ and the function $u \mapsto u^{-\alpha}$; see Lemmas 6.1 and 6.2 in the Appendix. Moreover, the specific form of $L$ guarantees continuity of $F_\varepsilon$.

Assumption 2.3 (Technical Assumptions). Suppose the main assumptions hold.
Additionally, we assume that

(TA.1) $\bar F_\varepsilon \in 2RV(\alpha, \eta^*)$ and $\eta$ is regularly varying with index $\rho$;

(TA.2) $\eta^*(u_n) = o\left(\frac{d_{n,r}}{n} + \frac{1}{\sqrt{n \bar F(u_n)}}\right)$, where $d_{n,r}$ is defined by

$$d_{n,r}^2 = \mathrm{Var}\left(\sum_{j=1}^{n} H_r(Y_j)\right) \sim c_r n^{2-rD} L_\gamma^r(n), \quad c_r = \frac{2\, r!}{(1 - Dr)(2 - Dr)}; \tag{10}$$

(TA.3) $\mathrm{E}\left[\sigma^{\alpha + \max\{\rho, \alpha\} + \epsilon}(Y_1)\right] < \infty$ for some $\epsilon > 0$;

(TA.4) $\mathrm{E}\left[(\sigma(Y_1))^{-1}\right] < \infty$.

Remark 2.4.
Assumption (TA.2) handles the bias which is created by centering the tail empirical process not by its mean, but rather by the limit of that mean.
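To make this explicit (a short calculation, not in the original text): since $\mathrm{E}\,\widetilde T_n(s,t) = \frac{\lfloor nt\rfloor}{n} \frac{\bar F(u_n s)}{\bar F(u_n)}$, the centering error decomposes as

```latex
\mathrm{E}\,\widetilde T_n(s,t) - t s^{-\alpha}
  = \frac{\lfloor nt\rfloor}{n}\left(\frac{\bar F(u_n s)}{\bar F(u_n)} - s^{-\alpha}\right)
    + \left(\frac{\lfloor nt\rfloor}{n} - t\right)s^{-\alpha}.
```

Second-order regular variation bounds the first term by a multiple of $\eta^*(u_n)$, so (TA.2) makes this deterministic bias negligible relative to both normalizations, $n/d_{n,r}$ and $\sqrt{n \bar F(u_n)}$, appearing below.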
Example 2.5.
The most commonly used second order assumption is that

$$L(x) = c \exp\left(\int_1^x \frac{\eta(u)}{u} \, du\right) \quad \text{with} \quad \eta(s) = s^{-\alpha\beta}$$

for some $\beta > 0$. It then holds that

$$\bar F_\varepsilon(s) = C\left(s^{-\alpha} + O\left(s^{-\alpha(\beta+1)}\right)\right), \quad \text{as } s \to \infty,$$

for some constant $C > 0$. Furthermore, we have

$$\sup_{s \geq 1} \left| \frac{\bar F_\varepsilon(u_n s)}{\bar F_\varepsilon(u_n)} - s^{-\alpha} \right| = O\left(u_n^{-\alpha\beta}\right).$$

In this case, (TA.2) can be replaced by the assumption

$$u_n^{-\alpha\beta} = o\left(\frac{d_{n,r}}{n} + \frac{1}{\sqrt{n \bar F(u_n)}}\right).$$

2.2 Convergence of the tail empirical process

Recall that the tail empirical process in two parameters is defined by

$$e_n(s, t) := \frac{1}{n \bar F(u_n)} \sum_{j=1}^{\lfloor nt \rfloor} \mathbb{1}\{X_j > u_n s\} - t s^{-\alpha}, \quad s \in [1, \infty], \ t \in [0, 1].$$

The following theorem establishes a characterization of its limit.
Theorem 2.6.
Let $X_j$, $j \in \mathbb{N}$, be a stationary time series with marginal tail distribution function $\bar F$. Moreover, assume that Assumptions 2.1 and 2.3 hold.

1. If $\frac{n}{d_{n,r}} = o\left(\sqrt{n \bar F(u_n)}\right)$, then

$$\frac{n}{d_{n,r}} e_n(s, t) \Rightarrow s^{-\alpha} \frac{1}{\mathrm{E}[\sigma^\alpha(Y_1)]} \frac{J_r(\Psi)}{r!} Z_{r,H}(t), \tag{11}$$

where $\Psi(y) = \sigma^\alpha(y)$, $r$ is the Hermite rank of $\Psi$, $Z_{r,H}$ is an $r$-th order Hermite process with $H = 1 - rD/2$, and $d_{n,r}$ is defined in (10).

2. If $\sqrt{n \bar F(u_n)} = o\left(\frac{n}{d_{n,r}}\right)$, then

$$\sqrt{n \bar F(u_n)} \, e_n(s, t) \Rightarrow B(s^{-\alpha}, t), \tag{12}$$

where $B$ denotes a standard Brownian sheet.

The convergence holds in a two-parameter Skorohod space, i.e., $\Rightarrow$ denotes weak convergence in $D([1, \infty] \times [0, 1])$.

The dichotomy of the limiting process is explained by the decomposition of the tail empirical process into the sum of a martingale and a partial sum of long-range dependent random variables, which can be viewed as a special case of Doob's decomposition; see Section 5.1.1. If $\frac{n}{d_{n,r}} = o\left(\sqrt{n \bar F(u_n)}\right)$, the martingale part in the decomposition becomes negligible, such that the limiting process arises from the convergence of the long-range dependent part. If $\sqrt{n \bar F(u_n)} = o\left(\frac{n}{d_{n,r}}\right)$, the long-range dependent part in the decomposition becomes negligible, such that the limiting process arises from the convergence of the martingale part. The same decomposition has already been employed in Kulik and Soulier (2011), Betken and Kulik (2019), and Bilayi-Biakana et al. (2019).

2.3 Convergence of the tail estimators

Recall that the considered tail index estimators are defined by

$$\widehat\gamma_{\lfloor nt \rfloor} := \frac{1}{\sum_{j=1}^{\lfloor nt \rfloor} \mathbb{1}\{X_j > u_n\}} \sum_{j=1}^{\lfloor nt \rfloor} \log\left(\frac{X_j}{u_n}\right) \mathbb{1}\{X_j > u_n\}$$

and

$$\widehat\gamma^{\mathrm{Hill}}(t) := \frac{1}{\lfloor k_n t \rfloor} \sum_{i=1}^{\lfloor k_n t \rfloor} \log\left(\frac{X_{\lfloor nt \rfloor : \lfloor nt \rfloor - i + 1}}{X_{\lfloor nt \rfloor : \lfloor nt \rfloor - k_{\lfloor nt \rfloor}}}\right).$$

Based on Theorem 2.6, the limiting distributions of $\widehat\gamma_{\lfloor nt \rfloor}$ and $\widehat\gamma^{\mathrm{Hill}}(t)$ can be established in $D[t_0, 1]$, $t_0 \in (0, 1)$.

Corollary 2.7.
Let $X_j$, $j \in \mathbb{N}$, be a stationary time series with marginal tail distribution function $\bar F$. Moreover, assume that Assumptions 2.1 and 2.3 hold.

1. If $\frac{n}{d_{n,r}} = o\left(\sqrt{n \bar F(u_n)}\right)$, then

$$\frac{n}{d_{n,r}} t \left(\widehat\gamma_{\lfloor nt \rfloor} - \gamma\right) \Rightarrow 0$$

in $D[t_0, 1]$ for all $t_0 \in (0, 1)$.

2. If $\sqrt{n \bar F(u_n)} = o\left(\frac{n}{d_{n,r}}\right)$, then

$$\sqrt{n \bar F(u_n)} \, t \left(\widehat\gamma_{\lfloor nt \rfloor} - \gamma\right) \Rightarrow \int_1^\infty s^{-1} B(s^{-\alpha}, t) \, ds - \alpha^{-1} B(1, t) \tag{13}$$

in $D[t_0, 1]$ for all $t_0 \in (0, 1)$.

Corollary 2.8.
Let $X_j$, $j \in \mathbb{N}$, be a stationary time series with marginal tail distribution function $\bar F$. Moreover, assume that Assumptions 2.1 and 2.3 hold.

1. If $\frac{n}{d_{n,r}} = o\left(\sqrt{n \bar F(u_n)}\right)$, then

$$\frac{n}{d_{n,r}} t \left(\widehat\gamma^{\mathrm{Hill}}(t) - \gamma\right) \Rightarrow 0$$

in $D[t_0, 1]$ for all $t_0 \in (0, 1)$.

2. If $\sqrt{n \bar F(u_n)} = o\left(\frac{n}{d_{n,r}}\right)$, then

$$\sqrt{n \bar F(u_n)} \, t \left(\widehat\gamma^{\mathrm{Hill}}(t) - \gamma\right) \Rightarrow \int_1^\infty s^{-1} B(s^{-\alpha}, t) \, ds - \alpha^{-1} B(1, t) \tag{14}$$

in $D[t_0, 1]$ for all $t_0 \in (0, 1)$.

Remark 2.9.
1. Following Kulik and Soulier (2011), we conjecture that the proper scaling in the first case is $a_n = \sqrt{n \bar F(u_n)}$ as well, yielding the same limit as in the second case. However, within the scope of this article, we will not consider the corresponding argument in detail.

2. The limit in (13) and (14) corresponds to $\gamma B(t)$, $t \in [0, 1]$, where $B$ is a standard Brownian motion.

Recall that the considered test statistics for the change-point problem $(H, A)$ are defined by

$$\Gamma_n := \sup_{t \in [t_0, 1]} t \left| \frac{\widehat\gamma_{\lfloor nt \rfloor}}{\widehat\gamma_n} - 1 \right| \quad \text{and} \quad \widetilde\Gamma_n := \sup_{t \in [t_0, 1]} t \left| \frac{\widehat\gamma^{\mathrm{Hill}}(t)}{\widehat\gamma^{\mathrm{Hill}}(1)} - 1 \right|.$$

Using the convergence obtained in Corollaries 2.7 and 2.8, we derive the asymptotic distribution of the test statistics.
Corollary 2.10.
Let $X_j$, $j \in \mathbb{N}$, be a stationary time series with marginal tail distribution function $\bar F$. Moreover, assume that Assumptions 2.1 and 2.3 hold. If $\sqrt{n \bar F(u_n)} = o\left(\frac{n}{d_{n,r}}\right)$, then, for all $t_0 \in (0, 1)$,

$$\sqrt{n \bar F(u_n)} \sup_{t \in [t_0, 1]} t \left| \frac{\widehat\gamma_{\lfloor nt \rfloor}}{\widehat\gamma_n} - 1 \right| \Rightarrow \sup_{t \in [t_0, 1]} |B(t) - t B(1)|,$$

where $B(t)$, $t \in [0, 1]$, denotes a standard Brownian motion.

Corollary 2.11.
Let $X_j$, $j \in \mathbb{N}$, be a stationary time series with marginal tail distribution function $\bar F$. Moreover, assume that Assumptions 2.1 and 2.3 hold. If $\sqrt{n \bar F(u_n)} = o\left(\frac{n}{d_{n,r}}\right)$, then, for all $t_0 \in (0, 1)$,

$$\sqrt{n \bar F(u_n)} \sup_{t \in [t_0, 1]} t \left| \frac{\widehat\gamma^{\mathrm{Hill}}(t)}{\widehat\gamma^{\mathrm{Hill}}(1)} - 1 \right| \Rightarrow \sup_{t \in [t_0, 1]} |B(t) - t B(1)|,$$

where $B(t)$, $t \in [0, 1]$, denotes a standard Brownian motion.

3 Simulations
For all simulations, the following specifications are made:

$$X_j = \sigma(Y_j) \varepsilon_j, \quad j \geq 1, \tag{15}$$

where

• $\varepsilon_j$, $j \geq 1$, is an independent, identically distributed sequence of Pareto distributed random variables generated by the function rgpd (fExtremes package in R);

• $Y_j$, $j \geq 1$, is a fractional Gaussian noise sequence generated by the function simFGN0 (longmemo package in R) with Hurst parameter $H$;

• $\sigma(y) = \exp(y)$.

Under the alternative, we insert a change of height $h$ at location $k = \lfloor n\tau \rfloor$ by simulating independent, identically Pareto distributed observations $\varepsilon_j$, $j \geq 1$, with $\varepsilon_j$, $j = 1, \ldots, k$, having tail index $\alpha_1 = \ldots = \alpha_k = \alpha$ and $\varepsilon_j$, $j = k+1, \ldots, n$, having tail index $\alpha_{k+1} = \ldots = \alpha_n = \alpha + h$.

We base test decisions on the statistic $\widetilde\Gamma_n := \max_{1 \leq k \leq n-1} \widetilde\Gamma_{k,n}$, where

$$\widetilde\Gamma_{k,n} = \frac{k}{n} \left| \frac{\widehat\gamma^{\mathrm{Hill}}\left(\frac{k}{n}\right)}{\widehat\gamma^{\mathrm{Hill}}(1)} - 1 \right| \quad \text{with} \quad \widehat\gamma^{\mathrm{Hill}}(t) = \frac{1}{\lfloor k_n t \rfloor} \sum_{i=1}^{\lfloor k_n t \rfloor} \log\left(\frac{X_{\lfloor nt \rfloor : \lfloor nt \rfloor - i + 1}}{X_{\lfloor nt \rfloor : \lfloor nt \rfloor - \lfloor k_n t \rfloor}}\right), \tag{16}$$

and we choose a significance level of 5%.

For the computation of the test statistic, the choice of $k_n$, i.e., the number of largest observations that contribute to the estimation of the tail index, is considered a delicate issue. In fact, it has been shown in Hall (1982) that the optimal choice of $k_n$ depends on the tail behavior of the data-generating process. Due to this circularity, DuMouchel (1983) suggests to choose $k_n$ proportional to the sample size. As noted in Quintos et al. (2001), a corresponding choice of $k_n$ has been shown to perform well in simulations and is widely used by practitioners. Hence, we choose $k_n = \lfloor np \rfloor$, i.e., $p$ defines the proportion of the data that the estimation of the tail index is based on.

The power of the testing procedures is analyzed by considering different choices for the height of the level shift, denoted by $h$, and the location of the change-point, denoted by $\tau$.
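The testing procedure can be sketched as follows (an illustration, not the paper's implementation): compute the maximized statistic and compare its normalized value with a Monte Carlo approximation of the quantiles of $\sup_{t \in [t_0,1]} |B(t) - tB(1)|$ from Corollaries 2.10 and 2.11. Here $\sqrt{k_n}$ stands in for the theoretical normalization $\sqrt{n \bar F(u_n)}$, and the grid size, replication count, and $t_0$ are our choices:

```python
import numpy as np

rng = np.random.default_rng(1)

def hill(x, k):
    """Hill estimator based on the k largest observations of x."""
    xs = np.sort(x)
    return np.mean(np.log(xs[-k:] / xs[-k - 1]))

def change_point_statistic(x, p=0.1, t0=0.2):
    """Max over k of (k/n)|hill(X_1..X_k)/hill(X_1..X_n) - 1|."""
    n = len(x)
    g_full = hill(x, max(int(n * p), 2))
    stats = [k / n * abs(hill(x[:k], max(int(k * p), 2)) / g_full - 1.0)
             for k in range(int(t0 * n), n + 1)]
    return max(stats)

def critical_value(level=0.05, t0=0.2, grid=500, reps=2000):
    """Monte Carlo (1-level)-quantile of sup_{t in [t0,1]} |B(t) - t B(1)|."""
    t = np.linspace(0.0, 1.0, grid + 1)
    sups = np.empty(reps)
    for i in range(reps):
        # standard Brownian motion on [0,1] from cumulative Gaussian increments
        B = np.concatenate(([0.0], np.cumsum(rng.standard_normal(grid)) / np.sqrt(grid)))
        bridge = B - t * B[-1]
        sups[i] = np.abs(bridge[t >= t0]).max()
    return np.quantile(sups, 1.0 - level)

# no-change Pareto(2) sample: the test should rarely reject
x = rng.pareto(2.0, size=2000) + 1.0
reject = np.sqrt(int(len(x) * 0.1)) * change_point_statistic(x) > critical_value()
print(reject)
```

Note that restricting the supremum to $[t_0, 1]$ mirrors the definition of $\widetilde\Gamma_n$; without a restriction, the limiting quantile would be that of the classical Kolmogorov distribution.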
In the tables, the columns that are superscribed by h = 0 correspond to the frequency of a type 1 error, i.e., the rejection rate under the hypothesis.

Considering all simulation results, the first thing to note is that these concur with the expected behavior of change-point tests: an increasing sample size goes along with an improvement of the finite sample performance, i.e., the empirical size approaches the level of significance and the empirical power increases; the empirical power of the testing procedures increases when the height of the level shift increases; and the empirical power is higher for breakpoints located in the middle of the sample than for change-point locations that lie close to the boundary of the testing region.

Both the Hurst parameter and the tail index seem to have a significant effect on the rejection rates of the change-point test. An increase in dependence, i.e., an increase of the Hurst parameter H, leads to an increase in the number of rejections. On the one hand, this leads to an increase of power; on the other hand, it results in a larger deviation of the empirical size from the significance level. An increase of tail thickness, i.e., a decrease of the tail parameter α, however, results in an improvement of the test's performance in that the empirical power increases while the empirical size draws closer to the level of significance. Moreover, the empirical power of the test is higher for changes to heavier tails, i.e., the test tends to detect changes with a negative change-point height h better. Technically speaking, the particular case of a change with height h = −1 from α = 1 to α = 0 does not fall under our model assumptions. Nonetheless, for a change after a proportion τ = 0.5 as well as after a proportion τ = 0.25 of the data, the empirical power is comparatively high.

Table 1:
Rejection rates of the change-point test based on the statistic $\widetilde{\Gamma}_n$, $k_n = \lfloor np\rfloor$, for LMSV time series (Pareto distributed $\varepsilon_j$, $j \ge 1$) of length $n$ with Hurst parameter $H$, tail index $\alpha$ and a shift in the tail index of height $h$ after a proportion $\tau = 0.5$. The calculations are based on 5,000 simulation runs.

Table 2:
Rejection rates of the change-point test based on the statistic $\widetilde{\Gamma}_n$, $k_n = \lfloor np\rfloor$, for LMSV time series (Pareto distributed $\varepsilon_j$, $j \ge 1$) of length $n$ with Hurst parameter $H$, tail index $\alpha$ and a shift in the tail index of height $h$ after a proportion $\tau = 0.25$. The calculations are based on 5,000 simulation runs.
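The simulation design described above can be sketched in Python as follows (the paper itself uses `rgpd` from fExtremes and `simFGN0` from longmemo in R; the circulant-embedding construction and all names below are ours). `fgn` generates fractional Gaussian noise, and `lmsv_sample` builds the LMSV series $X_j = \varepsilon_j\,\sigma(Y_j)$ with $\sigma(y) = \exp(y)$ and a tail-index change of height $h$ at $\lfloor n\tau\rfloor$.

```python
import numpy as np

def fgn(n, H, rng):
    """Fractional Gaussian noise (unit variance) via circulant embedding.

    A stand-in for simFGN0 from the R package longmemo; this Davies-Harte
    style construction is exact whenever the circulant eigenvalues are
    nonnegative, which holds for FGN covariances.
    """
    k = np.arange(n, dtype=float)
    gamma = 0.5 * ((k + 1.0) ** (2 * H) - 2.0 * k ** (2 * H)
                   + np.abs(k - 1.0) ** (2 * H))
    row = np.concatenate([gamma, gamma[-2:0:-1]])     # first row of circulant
    m = len(row)
    lam = np.clip(np.fft.fft(row).real, 0.0, None)    # eigenvalues of embedding
    z = rng.normal(size=m) + 1j * rng.normal(size=m)
    return np.real(np.fft.fft(np.sqrt(lam) * z))[:n] / np.sqrt(m)

def lmsv_sample(n, H, alpha, h=0.0, tau=0.5, seed=0):
    """LMSV series X_j = eps_j * exp(Y_j): tail index alpha before
    k = floor(n * tau) and alpha + h afterwards (alpha + h must be > 0)."""
    rng = np.random.default_rng(seed)
    y = fgn(n, H, rng)
    k = int(n * tau)
    eps = np.empty(n)
    eps[:k] = rng.pareto(alpha, k) + 1.0        # Pareto with tail index alpha
    eps[k:] = rng.pareto(alpha + h, n - k) + 1.0
    return eps * np.exp(y)
```

Note that NumPy's `pareto` draws from the Lomax distribution, so adding 1 yields a classical Pareto variable with the stated tail index.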
Data
The analysis of financial time series, such as stock market prices, usually focuses on log returns instead of the observed data itself. As an example, we consider the log returns of the daily closing indices of Standard & Poor's 500 (S&P 500, in short) defined by
$$L_t := \log R_t, \qquad R_t := \frac{P_t}{P_{t-1}},$$
where $P_t$ denotes the value of the index on day $t$, in the period from January 2008 to December 2008; see Figure 1.

Figure 1: Daily closing index of Standard & Poor's 500 and its log returns from January 2008 to December 2008. The data has been obtained from Google Finance.

Comparing the plots of the sample autocorrelation function of the log returns and the sample autocorrelation function of their absolute values in Figure 2, we observe a phenomenon that is often encountered in the context of financial data: the log returns of the index appear to be uncorrelated, whereas the absolute log returns tend to be highly correlated.

Moreover, the plot in Figure 1 shows that the considered time series exhibits volatility clustering, meaning that large price changes, i.e., log returns with relatively large absolute values, tend to cluster. This indicates that observations are not independent across time, although the absence of linear autocorrelation suggests that the dependence is nonlinear; see Cont (2005).

Figure 2: Sample autocorrelation of the log returns and the absolute log returns of Standard & Poor's 500 daily closing index from January 2005 to December 2010.
The two dashed horizontal lines mark the bounds for the 95% confidence interval of the autocovariances under the assumption of data generated by white noise.

Another characteristic of financial time series is the occurrence of heavy tails. In particular, probability distributions of log returns often exhibit tails which are heavier than those of a normal distribution. For the S&P 500 data, this property is highlighted by the Q-Q plot in Figure 3.

All of the previously described features of financial data can be covered by the LMSV model considered in our paper. In view of the fact that the LMSV model captures properties of the log returns of Standard & Poor's 500 daily closing index, we are interested in analyzing the data with respect to a change in the tail index.

As in our simulations, we base the test decision on the statistic defined in (16). We choose $k_n = \lfloor np\rfloor$, i.e., $p$ defines the proportion of the data that the estimation of the tail index is based on. Choosing p = 0.
1, the test statistic takes a value $\widetilde{\Gamma}_n$ that exceeds the 95% quantile of $\sup_{t\in[0,1]}|B(t) - tB(1)|$; $\widetilde{\Gamma}_n$ therefore indicates a change-point in the tail index at a level of significance of 5%.

A natural estimate for the change-point location is given by that point in time $k$ where $\widetilde{\Gamma}_{k,n}$ attains its maximum.

Figure 3: Q-Q plot for the log returns of Standard & Poor's 500 daily closing index from January 2005 to December 2010.

For the considered data, this point in time corresponds to September 16, 2008, i.e., one day after September 15, 2008, the day Lehman Brothers filed for bankruptcy protection; see Figure 4.

Figure 4: Log returns of the daily closing index of Standard & Poor's 500 from January 2008 to December 2008. The red dashed line indicates the estimated change-point location.
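The preprocessing used in this section, i.e., computing log returns and comparing the sample autocorrelations of the returns and their absolute values as in Figure 2, can be sketched in Python (the paper's own computations use R; the synthetic price path below is merely a hypothetical stand-in for the S&P 500 data):

```python
import numpy as np

def log_returns(prices):
    """Log returns L_t = log(P_t / P_{t-1})."""
    return np.diff(np.log(np.asarray(prices, dtype=float)))

def sample_acf(x, max_lag):
    """Sample autocorrelation function at lags 0, ..., max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n, denom = len(x), np.dot(x, x)
    return np.array([np.dot(x[:n - h], x[h:]) / denom
                     for h in range(max_lag + 1)])

rng = np.random.default_rng(0)
prices = 100.0 * np.exp(np.cumsum(rng.normal(0.0, 0.01, size=500)))
r = log_returns(prices)
acf_returns = sample_acf(r, 25)           # ACF of the log returns
acf_absolute = sample_acf(np.abs(r), 25)  # ACF of the absolute log returns
# +/- 1.96 / sqrt(n): white-noise bounds, the dashed lines in Figure 2
bounds = 1.96 / np.sqrt(len(r))
```

For financial data such as the S&P 500 log returns, `acf_returns` typically stays within the white-noise bounds at positive lags while `acf_absolute` does not, which is the nonlinear-dependence phenomenon discussed above.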
Recall that
$$e_n(s,t) = \widetilde{T}_n(s,t) - T(s,t), \qquad \widetilde{T}_n(s,t) := \frac{1}{n\bar F(u_n)}\sum_{j=1}^{\lfloor nt\rfloor} 1\{X_j > u_n s\} \quad\text{and}\quad T(s,t) = t\,s^{-\alpha}.$$
To prove Theorem 2.6, we consider the following decomposition:
$$e_n(s,t) = \big\{\widetilde{T}_n(s,t) - T_n(s,t)\big\} + \big\{T_n(s,t) - T(s,t)\big\},$$
where
$$T_n(s,t) := E\big[\widetilde{T}_n(s,t)\big] = \frac{\lfloor nt\rfloor}{n}\,\frac{\bar F(u_n s)}{\bar F(u_n)}.$$
Obviously, it holds that $\lim_{n\to\infty} T_n(s,t) = T(s,t)$ for $(s,t)\in[s_0,\infty)\times[0,1]$. Moreover, for $s\ge s_0$, $t\in[0,1]$,
$$|T_n(s,t) - T(s,t)| \le \sup_{s\ge s_0}\frac{\bar F(u_n s)}{\bar F(u_n)}\,\sup_{t\in[0,1]}\left|\frac{\lfloor nt\rfloor}{n} - t\right| + \sup_{s\ge s_0}\left|\frac{\bar F(u_n s)}{\bar F(u_n)} - s^{-\alpha}\right|.$$
Note that
$$\sup_{t\in[0,1]}\left|\frac{\lfloor nt\rfloor}{n} - t\right| = o\!\left(\frac{d_{n,r}}{n} + \frac{1}{\sqrt{n\bar F(u_n)}}\right).$$
Due to Proposition 2.8 in Kulik and Soulier (2011) and (TA.2) we have
$$\sup_{s\ge s_0}\left|\frac{\bar F(u_n s)}{\bar F(u_n)} - s^{-\alpha}\right| = o\!\left(\frac{d_{n,r}}{n} + \frac{1}{\sqrt{n\bar F(u_n)}}\right),$$
which implies
$$\sup_{s\ge s_0,\,t\in[0,1]} |T_n(s,t) - T(s,t)| = o\!\left(\frac{d_{n,r}}{n} + \frac{1}{\sqrt{n\bar F(u_n)}}\right).$$
Since
$$\lim_{n\to\infty}\frac{\bar F_\varepsilon(u_n)}{\bar F(u_n)} = \frac{1}{E[\sigma^\alpha(Y)]}$$
by (TA.3) and Breiman's Lemma, it therefore suffices to study weak convergence of the process
$$\widetilde{e}_n(s,t) = \frac{1}{n\bar F_\varepsilon(u_n)}\sum_{j=1}^{\lfloor nt\rfloor}\big(1\{X_j > u_n s\} - \bar F(u_n s)\big).$$
We consider the decomposition
$$\widetilde{e}_n(s,t) =: M_n(s,t) + R_n(s,t), \qquad (17)$$
where
$$M_n(s,t) := \frac{1}{n\bar F_\varepsilon(u_n)}\sum_{j=1}^{\lfloor nt\rfloor}\big(1\{X_j > u_n s\} - E[1\{X_j > u_n s\}\mid\mathcal F_{j-1}]\big),$$
$$R_n(s,t) := \frac{1}{n\bar F_\varepsilon(u_n)}\sum_{j=1}^{\lfloor nt\rfloor}\big(E[1\{X_j > u_n s\}\mid\mathcal F_{j-1}] - \bar F(u_n s)\big),$$
and
$$\mathcal F_j := \sigma\big(\varepsilon_k,\ k\le j;\ \eta_k,\ k\in\mathbb Z\big). \qquad (18)$$
We call $M_n$ the martingale part, while we refer to $R_n$ as the long-range dependent part.

Proposition 5.1 (Weak convergence of $R_n(s,t)$). Under the assumptions of Theorem 2.6, the following holds:
$$\frac{n}{d_{n,r}}\,R_n(s,t) \Rightarrow s^{-\alpha}\,\frac{J_r(\Psi)}{r!}\,Z_{r,H}(t),$$
where $\Rightarrow$ denotes weak convergence in $D([1,\infty]\times[0,1])$.

Proof. Note that
$$E[1\{X_j > u_n s\}\mid\mathcal F_{j-1}] = \bar F_\varepsilon\!\left(\frac{u_n s}{\sigma(Y_j)}\right) \qquad (19)$$
and
$$E\!\left[\bar F_\varepsilon\!\left(\frac{u_n s}{\sigma(Y_j)}\right)\right] = E\big[E[1\{X_j > u_n s\}\mid\mathcal F_{j-1}]\big] = E[1\{X_j > u_n s\}] = \bar F(u_n s).$$
As a result, we can rewrite $R_n(s,t)$ as follows:
$$R_n(s,t) = \frac{1}{n\bar F_\varepsilon(u_n)}\sum_{j=1}^{\lfloor nt\rfloor}\left(\bar F_\varepsilon\!\left(\frac{u_n s}{\sigma(Y_j)}\right) - E\!\left[\bar F_\varepsilon\!\left(\frac{u_n s}{\sigma(Y_j)}\right)\right]\right) = \frac{1}{n}\sum_{j=1}^{\lfloor nt\rfloor}\big(\Psi_n(Y_j,s) - E[\Psi_n(Y_j,s)]\big),$$
where
$$\Psi_n(y,s) = \frac{\bar F_\varepsilon\!\left(\frac{u_n s}{\sigma(y)}\right)}{\bar F_\varepsilon(u_n)}. \qquad (20)$$
Due to regular variation of $\bar F_\varepsilon$, we have
$$\Psi_n(y,s) = \frac{\bar F_\varepsilon\!\left(\frac{u_n s}{\sigma(y)}\right)}{\bar F_\varepsilon(u_n)} \sim \left(\frac{s}{\sigma(y)}\right)^{-\alpha} = s^{-\alpha}\,\Psi(y).$$
(21)

Furthermore, it holds that
$$R_n(s,t) = \frac{1}{n}\sum_{j=1}^{\lfloor nt\rfloor}\big(\Psi_n(Y_j,s) - E[\Psi_n(Y_j,s)]\big) = \frac{1}{n}\sum_{j=1}^{\lfloor nt\rfloor}\big(\Psi_s(Y_j) - E[\Psi_s(Y_j)]\big) + \frac{1}{n}\sum_{j=1}^{\lfloor nt\rfloor}\big(\Psi_n(Y_j,s) - \Psi_s(Y_j)\big) + \frac{1}{n}\sum_{j=1}^{\lfloor nt\rfloor}\big(E[\Psi_s(Y_j)] - E[\Psi_n(Y_j,s)]\big), \qquad (22)$$
where $\Psi_s(y) := s^{-\alpha}\Psi(y)$.

As $E[\sigma^\alpha(Y)] < \infty$ by (TA.3), the functional non-central limit theorem of Taqqu (1979) yields
$$\frac{n}{d_{n,r}}\,\frac{1}{n}\sum_{j=1}^{\lfloor nt\rfloor}\big(\Psi_s(Y_j) - E[\Psi_s(Y_j)]\big) = s^{-\alpha}\,\frac{n}{d_{n,r}}\,\frac{1}{n}\sum_{j=1}^{\lfloor nt\rfloor}\big(\Psi(Y_j) - E[\Psi(Y_j)]\big) \Rightarrow s^{-\alpha}\,\frac{J_r(\Psi)}{r!}\,Z_{r,H}(t).$$
In the following, we will see that the second and the third summand in (22) are negligible. For this, it suffices to show that
$$\lim_{n\to\infty}\frac{\lfloor nt\rfloor}{d_{n,r}}\,E\big[\big|\Psi_n(Y_0,s) - s^{-\alpha}\Psi(Y_0)\big|\big] = 0.$$
Note that
$$E\big[\big|\Psi_n(Y_0,s) - s^{-\alpha}\Psi(Y_0)\big|\big] = \int_{\mathbb R}\left|\frac{\bar F_\varepsilon\!\left(\frac{u_n s}{\sigma(y)}\right)}{\bar F_\varepsilon(u_n)} - \left(\frac{s}{\sigma(y)}\right)^{-\alpha}\right|\varphi(y)\,dy.$$
Due to second order regular variation of $\bar F_\varepsilon$, Lemma 6.1 implies that for any $\varepsilon > 0$,
$$\left|\frac{\bar F_\varepsilon\!\left(\frac{u_n s}{\sigma(y)}\right)}{\bar F_\varepsilon(u_n)} - \left(\frac{s}{\sigma(y)}\right)^{-\alpha}\right| \le C\,\eta^*(u_n)\left(\frac{s}{\sigma(y)}\right)^{-\rho-\alpha}\left(\max\left\{\frac{s}{\sigma(y)},\frac{\sigma(y)}{s}\right\}\right)^{\varepsilon}.$$
Thus, it follows that
$$\sup_{s\ge s_0} E\big[\big|\Psi_n(Y_0,s) - s^{-\alpha}\Psi(Y_0)\big|\big] \le C\,\eta^*(u_n)\,s_0^{-\alpha-\rho-\varepsilon}\int_{\mathbb R}\sigma^{\alpha+\rho+\varepsilon}(y)\,\varphi(y)\,dy.$$
By (TA.2) and (TA.3), i.e., since $\eta^*(u_n) = o(d_{n,r}/n)$ and $E[\sigma^{\alpha+\rho+\varepsilon}(Y)] < \infty$, it holds that
$$\sup_{s\ge s_0} E\big[\big|\Psi_n(Y_0,s) - s^{-\alpha}\Psi(Y_0)\big|\big] = o\!\left(\frac{d_{n,r}}{n}\right).$$
This completes the proof of Proposition 5.1.

5.1.3 The martingale part
The goal of this section is to prove the following proposition:
Proposition 5.2.
Under the assumptions of Theorem 2.6, for any
$R > 1$, the sequence $\sqrt{n\bar F_\varepsilon(u_n)}\,M_n(s,t)$, $n\ge 1$, converges in distribution to $\sqrt{E[\sigma^\alpha(Y)]}\,B(s^{-\alpha},t)$ in $D([1,R]\times[0,1])$, where $B$ denotes a standard Brownian sheet.

First, we establish convergence of the finite dimensional distributions. Then, we proceed with a proof of tightness. For the latter, we use chaining arguments; see Section 5.1.3.
The martingale part: Convergence of the finite dimensional distributions
We have to prove that for all positive integers $d_1$ and $d_2$, all $1\le s_{d_1} < \cdots < s_1 \le R$ and all $0\le t_1 < \cdots < t_{d_2}\le 1$, the vector with entries $\sqrt{n\bar F_\varepsilon(u_n)}\,M_n(s_i,t_j)$, $1\le i\le d_1$, $1\le j\le d_2$, converges in distribution to $\sqrt{E[\sigma^\alpha(Y)]}\,B(s_i^{-\alpha},t_j)$, $1\le i\le d_1$, $1\le j\le d_2$. For this, it suffices to consider the case $d_1 = d_2$. Indeed, if $d_1 < d_2$, we include $s_i$, $d_1+1\le i\le d_2$, in a decreasing order between $s_{d_1}$ and $s_{d_1-1}$, i.e., such that $1\le s_{d_1} < s_{d_2} < \cdots < s_{d_1+1} < s_{d_1-1} < \cdots < s_1$, and if $d_2 < d_1$, we include $t_j$, $d_2+1\le j\le d_1$, between $t_{d_2-1}$ and $t_{d_2}$ in an increasing order, i.e., $t_{d_2-1} < t_{d_2+1} < \cdots < t_{d_1} < t_{d_2}\le 1$. Letting $d = \max\{d_1,d_2\}$, the convergence in distribution of the vector with entries $\sqrt{n\bar F_\varepsilon(u_n)}\,M_n(s_i,t_j)$, $1\le i,j\le d$, to $\sqrt{E[\sigma^\alpha(Y)]}\,B(s_i^{-\alpha},t_j)$, $1\le i,j\le d$, implies convergence of the subvector with entries $\sqrt{n\bar F_\varepsilon(u_n)}\,M_n(s_i,t_j)$, $1\le i\le d_1$, $1\le j\le d_2$, to $\sqrt{E[\sigma^\alpha(Y)]}\,B(s_i^{-\alpha},t_j)$, $1\le i\le d_1$, $1\le j\le d_2$.

Let $d\ge 1$ and let $s_1,\ldots,s_d$ and $t_1,\ldots,t_d$ be real numbers such that $1\le s_d < \cdots < s_1\le R$ and $0 =: t_0\le t_1 < \cdots < t_d\le 1$. Define the intervals $I_{n,1} := (s_1 u_n,\infty)$, and for $2\le i\le d$, let $I_{n,i} := (s_i u_n, s_{i-1} u_n]$. Moreover, define random variables
$$Z_{i,j}^n := \frac{1}{\sqrt{n\bar F_\varepsilon(u_n)}}\sum_{k=\lfloor nt_{j-1}\rfloor+1}^{\lfloor nt_j\rfloor}\big(1\{X_k\in I_{n,i}\} - E[1\{X_k\in I_{n,i}\}\mid\mathcal F_{k-1}]\big)$$
for $1\le i,j\le d$.

Lemma 5.3. The sequence of random vectors $(Z_{i,j}^n)_{1\le i,j\le d}$, $n\ge 1$, converges in distribution to $(Z_{i,j})_{1\le i,j\le d}$, where the random variables $Z_{i,j}$, $1\le i,j\le d$, are independent, and for all $i,j\in\{1,\ldots,d\}$, the random variable $Z_{i,j}$ is Gaussian, centered, and has variance
$$E[\sigma^\alpha(Y)]\,(t_j - t_{j-1})\big(s_i^{-\alpha} - s_{i-1}^{-\alpha}\big),$$
where $s_0^{-\alpha} := 0$. The convergence of the finite dimensional distributions then follows by applying the continuous mapping
$$(x_{i,j})_{i,j=1}^d \mapsto \left(\sum_{i'=1}^i\sum_{j'=1}^j x_{i',j'}\right)_{i,j=1}^d.$$

Proof of Lemma 5.3. By the Cramér–Wold device, we have to prove that for each collection of real numbers $a_{i,j}$, $1\le i,j\le d$, the sequence $\sum_{i,j=1}^d a_{i,j} Z_{i,j}^n$, $n\ge 1$, converges to a normal distribution with mean zero and variance
$$\sigma^2 := E[\sigma^\alpha(Y)]\sum_{i,j=1}^d (t_j - t_{j-1})\big(s_i^{-\alpha} - s_{i-1}^{-\alpha}\big)\,a_{i,j}^2. \qquad (23)$$
We will prove this by applying the following central limit theorem for martingale difference arrays. For this, recall that $\Delta_{n,k}$, $n\ge 1$, $1\le k\le n$, is a martingale difference array with respect to the filtration $\mathcal F_{n,k}$, $n\ge 1$, $0\le k\le n$, if for all $n$, $\mathcal F_{n,k}\subset\mathcal F_{n,k+1}$, $1\le k\le n-1$, $\Delta_{n,k}$ is $\mathcal F_{n,k}$-measurable and $E[\Delta_{n,k}\mid\mathcal F_{n,k-1}] = 0$.

Theorem 5.4 (Theorem VIII.1 in Pollard (1984)). Let $(\Delta_{n,k})_{n\ge 1,\,1\le k\le n}$ be a martingale difference array with respect to the filtration $(\mathcal F_{n,k})_{n\ge 1,\,0\le k\le n}$, such that $\Delta_{n,k}$ is square integrable for all $n$ and $k$. Moreover, assume that

1. for each positive $\epsilon$,
$$\sum_{k=1}^n E\big[\Delta_{n,k}^2\,1\{|\Delta_{n,k}| > \epsilon\}\mid\mathcal F_{n,k-1}\big] \to 0 \quad\text{in probability}; \qquad (24)$$

2.
$$\sum_{k=1}^n E\big[\Delta_{n,k}^2\mid\mathcal F_{n,k-1}\big] \to \sigma^2 \quad\text{in probability}. \qquad (25)$$

Then, the sequence $\sum_{k=1}^n\Delta_{n,k}$, $n\ge 1$, converges in distribution to a normal law with mean zero and variance $\sigma^2$ defined in (23).

We express $\widetilde Z_n := \sum_{i,j=1}^d a_{i,j} Z_{i,j}^n$ as a sum of martingale differences. For $1\le i\le d$, define
$$D_{n,i,k} := \frac{1}{\sqrt{n\bar F_\varepsilon(u_n)}}\big(1\{X_k\in I_{n,i}\} - E[1\{X_k\in I_{n,i}\}\mid\mathcal F_{k-1}]\big).$$
If for some $j\in\{1,\ldots,d\}$ the integer $k$ satisfies $\lfloor nt_{j-1}\rfloor + 1\le k\le\lfloor nt_j\rfloor$, then we define
$$\Delta_{n,k} := \sum_{i=1}^d a_{i,j} D_{n,i,k},$$
and for $\lfloor nt_d\rfloor + 1\le k\le n$, we define $\Delta_{n,k} := 0$. In this way, $\widetilde Z_n = \sum_{k=1}^n\Delta_{n,k}$ and, defining $\mathcal F_{n,j}$ as the $\sigma$-algebra $\mathcal F_j$ given by (18), the array $\Delta_{n,k}$, $n\ge 1$, $1\le k\le n$, is a square integrable martingale difference array with respect to the filtration $\mathcal F_{n,k}$, $n\ge 1$, $0\le k\le n$.

Let us check (24). Observe that for all $1\le k\le n$ and $1\le i\le d$, $|D_{n,i,k}| \le 2\,(n\bar F_\varepsilon(u_n))^{-1/2}$. Hence, for all $1\le k\le n$,
$$|\Delta_{n,k}| \le 2\sum_{i,j=1}^d |a_{i,j}|\,(n\bar F_\varepsilon(u_n))^{-1/2}.$$
Consequently, for a fixed $\epsilon$, when $n$ is such that $2\sum_{i,j=1}^d |a_{i,j}|\,(n\bar F_\varepsilon(u_n))^{-1/2} < \epsilon$, the indicator $1\{|\Delta_{n,k}| > \epsilon\}$ vanishes and hence (24) holds.

Let us check (25). It suffices to prove that for all $j\in\{1,\ldots,d\}$,
$$E\left|\sum_{k=\lfloor nt_{j-1}\rfloor+1}^{\lfloor nt_j\rfloor} E\big[\Delta_{n,k}^2\mid\mathcal F_{k-1}\big] - E[\sigma^\alpha(Y)]\,(t_j - t_{j-1})\sum_{i=1}^d a_{i,j}^2\big(s_i^{-\alpha} - s_{i-1}^{-\alpha}\big)\right| \to 0. \qquad (26)$$
To this aim, we decompose $E[\Delta_{n,k}^2\mid\mathcal F_{k-1}]$:
$$E\big[\Delta_{n,k}^2\mid\mathcal F_{k-1}\big] = \sum_{i=1}^d a_{i,j}^2\,E\big[D_{n,i,k}^2\mid\mathcal F_{k-1}\big] + 2\sum_{1\le i<i'\le d} a_{i,j}\,a_{i',j}\,E\big[D_{n,i,k} D_{n,i',k}\mid\mathcal F_{k-1}\big].$$
For the squared terms, we first show that $E[|R_{n,1,i}|]$ converges to 0, where
$$R_{n,1,i} := \frac{1}{n\bar F_\varepsilon(u_n)}\sum_{k=1}^{\lfloor nt_j\rfloor-\lfloor nt_{j-1}\rfloor}\left(\bar F_\varepsilon\!\left(\frac{u_n s_i}{\sigma(Y_k)}\right) - \bar F_\varepsilon\!\left(\frac{u_n s_{i-1}}{\sigma(Y_k)}\right)\right) - E[\sigma^\alpha(Y)]\,(t_j - t_{j-1})\big(s_i^{-\alpha} - s_{i-1}^{-\alpha}\big).$$
Applying Lemma 6.2 with $t = u_n$, $a = s_i/\sigma(Y_k)$, $b = s_{i-1}/\sigma(Y_k)$ and $\epsilon = 1$, we get
$$|R_{n,1,i}| \le \left|\frac{1}{n}\sum_{k=1}^{\lfloor nt_j\rfloor-\lfloor nt_{j-1}\rfloor}\left(\frac{\sigma^\alpha(Y_k)}{s_i^\alpha} - \frac{\sigma^\alpha(Y_k)}{s_{i-1}^\alpha}\right) - (t_j - t_{j-1})\big(s_i^{-\alpha} - s_{i-1}^{-\alpha}\big)\,E[\sigma^\alpha(Y)]\right| + C\,\eta^*(u_n)\,\frac{1}{n}\sum_{k=1}^{\lfloor nt_j\rfloor-\lfloor nt_{j-1}\rfloor}\left(\max\left\{\frac{\sigma(Y_k)}{s_i},1\right\}\right)^{\alpha+\rho+1}\frac{s_{i-1}-s_i}{\sigma(Y_k)}.$$
By (TA.3) and (TA.4) we have
$$E\left[\frac{\big(\max\{\sigma(Y_0),1\}\big)^{\alpha+\rho+1}}{\sigma(Y_0)}\right] < \infty.$$
Using the ergodic theorem for the first term and the fact that $\eta^*(u_n)\to 0$ for the second, we conclude that $E[|R_{n,1,i}|]\to 0$ as $n$ goes to infinity.

The treatment of $R_{n,2}$ and $R_{n,3}$ is the same: we take expectations and bound one of the factors $\bar F_\varepsilon\big(\frac{u_n s_i}{\sigma(Y_k)}\big) - \bar F_\varepsilon\big(\frac{u_n s_{i-1}}{\sigma(Y_k)}\big)$ using (43) in Lemma 6.2 when $i = 1$ and (45) in Lemma 6.2 when $2\le i\le d$. We then conclude by the dominated convergence theorem. This finishes the proof of Lemma 5.3.

The martingale part: Tightness

Lemma 5.5. Under the assumptions of Theorem 2.6, for any $R > 1$, the sequence $\sqrt{n\bar F_\varepsilon(u_n)}\,M_n(s,t)$, $n\ge 1$, is tight in $D([1,R]\times[0,1])$.

Proof. Define
$$m_n(s,t) := \sqrt{n\bar F_\varepsilon(u_n)}\,M_n(s,t).$$
In order to prove tightness of $m_n(s,t)$, we validate the following tightness criterion: for all $\epsilon > 0$,
$$\lim_{\delta\to 0}\limsup_{n\to\infty} P\left(\sup_{\substack{|s_1-s_2|<\delta\\ 1\le s_1,s_2\le R}}\ \sup_{\substack{|t_1-t_2|<\delta\\ 0\le t_1,t_2\le 1}}|m_n(s_1,t_1) - m_n(s_2,t_2)| > \epsilon\right) = 0.$$
Writing
$$m_n(s_1,t_1) - m_n(s_2,t_2) = m_n(s_1,t_1) - m_n(s_2,t_1) + m_n(s_2,t_1) - m_n(s_2,t_2),$$
it suffices to show
$$\lim_{\delta\to 0}\limsup_{n\to\infty} P\left(\sup_{\substack{|s_1-s_2|<\delta\\ 1\le s_1,s_2\le R}}\ \sup_{t\in[0,1]}|m_n(s_1,t) - m_n(s_2,t)| > \epsilon\right) = 0 \qquad (27)$$
and
$$\lim_{\delta\to 0}\limsup_{n\to\infty} P\left(\sup_{1\le s\le R}\ \sup_{\substack{|t_1-t_2|<\delta\\ 0\le t_1,t_2\le 1}}|m_n(s,t_1) - m_n(s,t_2)| > \epsilon\right) = 0. \qquad (28)$$

Proof of (27). In order to prove (27), we apply a chaining technique. For this, we define the intervals
$$I_{1,k} := [1 + 2k\delta,\ 1 + 2(k+1)\delta] \quad\text{and}\quad I_{2,k} := [1 + (2k+1)\delta,\ 1 + (2(k+1)+1)\delta]$$
for $k = 0,\ldots,L_\delta := \lfloor (R-1)/(2\delta)\rfloor$. Then, the expression inside $P$ in (27) is bounded by
$$\max_{0\le k\le L_\delta}\ \sup_{s_1,s_2\in I_{1,k}}\ \sup_{t\in[0,1]}|m_n(s_1,t) - m_n(s_2,t)| + \max_{0\le k\le L_\delta}\ \sup_{s_1,s_2\in I_{2,k}}\ \sup_{t\in[0,1]}|m_n(s_1,t) - m_n(s_2,t)|. \qquad (29)$$
In the following, we consider the first summand only, since analogous considerations hold for the second summand, i.e., it remains to show that
$$\lim_{\delta\to 0}\limsup_{n\to\infty} P\left(\max_{0\le k\le L_\delta}\ \sup_{s_1,s_2\in I_{1,k}}\ \sup_{t\in[0,1]}|m_n(s_1,t) - m_n(s_2,t)| > \epsilon\right) = 0.$$
For this, it suffices to show that
$$\lim_{\delta\to 0}\limsup_{n\to\infty}\frac{1}{\delta}\max_{0\le k\le L_\delta} P\left(\sup_{s_1,s_2\in I_{1,k}}\ \sup_{t\in[0,1]}|m_n(s_1,t) - m_n(s_2,t)| > \epsilon\right) = 0.$$
We write $I_{1,k} = [a_k, a_{k+1}]$, i.e., $a_k := 1 + 2k\delta$ and $a_{k+1} := 1 + 2(k+1)\delta$. Note that
$$\sup_{s_1,s_2\in I_{1,k}}\ \sup_{t\in[0,1]}|m_n(s_1,t) - m_n(s_2,t)| \le 2\sup_{x\in[0,2\delta]}\ \sup_{t\in[0,1]}|m_n(a_k,t) - m_n(a_k+x,t)|.$$
Define refining partitions $x_i(k)$ for $k = 0,\ldots,K_n$ with $K_n\to\infty$, for $n\to\infty$, by
$$x_i(k) := \sup\left\{x\in[1,R]\ \Big|\ \frac{\bar F_\varepsilon(u_n x)}{\bar F_\varepsilon(u_n)} \ge i\,2^{-k}\delta\right\}, \quad i = 0,\ldots,\lfloor 2^k\delta^{-1}\rfloor,$$
and choose $i_k(x)$ such that $a_k + x\in\big(x_{i_k(x)+1}(k),\ x_{i_k(x)}(k)\big]$.
From the definition of $m_n$ we obtain, for $x\le y$,
$$|m_n(y,t) - m_n(x,t)| = \left|\frac{1}{\sqrt{n\bar F_\varepsilon(u_n)}}\sum_{j=1}^{\lfloor nt\rfloor}\big(1\{u_n x < X_j\le u_n y\} - E[1\{u_n x < X_j\le u_n y\}\mid\mathcal F_{j-1}]\big)\right|.$$
Then, with $m_n(x) := \sup_{t\in[0,1]}|m_n(x,t)|$ and $m_n(x,y) := \sup_{t\in[0,1]}|m_n(y,t) - m_n(x,t)|$, it follows that
$$\sup_{t\in[0,1]}|m_n(a_k,t) - m_n(a_k+x,t)| \le m_n\big(a_k,\ x_{i_{K_n}(x)+1}(K_n)\big) + \sum_{k'=1}^{K_n} m_n\big(x_{i_{k'}(x)+1}(k'),\ x_{i_{k'-1}(x)+1}(k'-1)\big) + m_n\big(x_{i_0(x)+1}(0),\ a_k+x\big).$$
As a result, we have
$$P\left(\sup_{x\in[0,2\delta]}\ \sup_{t\in[0,1]}|m_n(a_k,t) - m_n(a_k+x,t)| > \epsilon\right) \le P\left(\sup_{x\in[0,2\delta]} m_n\big(a_k,\ x_{i_{K_n}(x)+1}(K_n)\big) > \frac{\epsilon}{4}\right) + \sum_{k=1}^{K_n} P\left(\sup_{x\in[0,2\delta]} m_n\big(x_{i_k(x)+1}(k),\ x_{i_{k-1}(x)+1}(k-1)\big) > \frac{\epsilon}{(k+3)^2}\right) + P\left(\sup_{x\in[0,2\delta]} m_n\big(x_{i_0(x)+1}(0),\ a_k+x\big) > \epsilon - \sum_{k=0}^\infty\frac{\epsilon}{(k+3)^2} - \frac{\epsilon}{4}\right). \qquad (30)$$
Since
$$\sum_{k=0}^\infty\frac{\epsilon}{(k+3)^2} \le \frac{\epsilon}{2},$$
it follows that
$$P\left(\sup_{x\in[0,2\delta]} m_n\big(x_{i_0(x)+1}(0),\ a_k+x\big) > \epsilon - \sum_{k=0}^\infty\frac{\epsilon}{(k+3)^2} - \frac{\epsilon}{4}\right) \le P\left(\sup_{x\in[0,2\delta]} m_n\big(x_{i_0(x)+1}(0),\ a_k+x\big) > \frac{\epsilon}{4}\right).$$
Additionally, we conclude that
$$\sum_{k=1}^{K_n} P\left(\sup_{x\in[0,2\delta]} m_n\big(x_{i_k(x)+1}(k),\ x_{i_{k-1}(x)+1}(k-1)\big) > \frac{\epsilon}{(k+3)^2}\right) \le \sum_{k=1}^{K_n}\sum_{i=0}^{\lfloor 2^k\delta^{-1}\rfloor} P\left(m_n\big(x_{i+1}(k),\ x_i(k)\big) > \frac{\epsilon}{(k+3)^2}\right).$$
For further estimation, we use Freedman's inequality: if $(d_j,\mathcal F_j)$, $j\ge 1$, is a martingale difference sequence such that $\sup_j |d_j|\le c$, where $c$ is a positive constant, then for all $x, y > 0$,
$$P\left(\left\{\max_{1\le\ell\le n}\left|\sum_{j=1}^\ell d_j\right| > x\right\}\cap\left\{\sum_{j=1}^n E\big[d_j^2\mid\mathcal F_{j-1}\big]\le y\right\}\right) \le 2\exp\left(-\frac{x^2}{2(y+cx)}\right) \le 2\max\left\{\exp\left(-\frac{x^2}{4y}\right),\ \exp\left(-\frac{x}{4c}\right)\right\};$$
see Theorem 1.6 in Freedman (1975).

For this purpose, we define
$$d_{j,i,k,n} := 1\{u_n x_{i+1}(k) < X_j\le u_n x_i(k)\} - E\big[1\{u_n x_{i+1}(k) < X_j\le u_n x_i(k)\}\mid\mathcal F_{j-1}\big].$$
Then, it follows that
$$m_n\big(x_{i+1}(k),\ x_i(k)\big) = \max_{1\le\ell\le n}\left|\frac{1}{\sqrt{n\bar F_\varepsilon(u_n)}}\sum_{j=1}^\ell d_{j,i,k,n}\right|.$$
Furthermore, for $B > 0$, which is chosen later, define
$$y_{n,k,i} := n\bar F_\varepsilon(u_n)\,B\left(\frac{\bar F_\varepsilon(u_n x_{i+1}(k))}{\bar F_\varepsilon(u_n)} - \frac{\bar F_\varepsilon(u_n x_i(k))}{\bar F_\varepsilon(u_n)}\right) \quad\text{and}\quad z_{n,k} := \frac{\epsilon\sqrt{n\bar F_\varepsilon(u_n)}}{(k+3)^2}.$$
Since $\bar F_\varepsilon$ is continuous and since we can assume without loss of generality that $\bar F_\varepsilon$ is ultimately strictly decreasing, it holds that
$$\frac{\bar F_\varepsilon(u_n x_{i+1}(k))}{\bar F_\varepsilon(u_n)} - \frac{\bar F_\varepsilon(u_n x_i(k))}{\bar F_\varepsilon(u_n)} = 2^{-k}\delta, \quad\text{so that}\quad y_{n,k,i} = n\bar F_\varepsilon(u_n)\,B\,2^{-k}\delta.$$
For an estimation by Freedman's inequality, we have to specify $K_n$. Choosing
$$K_n := \lfloor \log_2(\delta\,a_n\,C)\rfloor$$
for some constant $C$, where $a_n$, $n\ge 1$, is a sequence with $a_n\to\infty$ and
$$a_n = o\!\left(\frac{n}{d_{n,r}} + \sqrt{n\bar F_\varepsilon(u_n)}\right),$$
it follows (with $c = 2$), using $2^k\le 2^{K_n}\le \delta\,a_n\,C$, that
$$\frac{z_{n,k}^2}{y_{n,k,i}} = \frac{\epsilon^2\,2^k}{B(k+3)^4\,\delta} \le \frac{\epsilon^2\,C\,a_n}{B(k+3)^4} \le \frac{\epsilon\sqrt{n\bar F_\varepsilon(u_n)}}{2(k+3)^2} = \frac{z_{n,k}}{c}$$
for a corresponding choice of $C$.
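As a numerical sanity check (our illustration, not part of the proof), Freedman's bound can be compared with Monte Carlo estimates for the simplest bounded martingale, i.i.d. ±1 coin flips, where the predictable quadratic variation $\sum_j E[d_j^2\mid\mathcal F_{j-1}]$ equals $n$ deterministically and $c = 1$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, x, trials = 200, 30.0, 4000
c, y = 1.0, float(n)            # |d_j| <= 1; predictable variation is exactly n

steps = rng.choice([-1.0, 1.0], size=(trials, n))
max_abs_partial = np.abs(steps.cumsum(axis=1)).max(axis=1)
empirical = np.mean(max_abs_partial > x)
# two-sided Freedman bound: P(max |S_l| > x, variation <= y) <= 2 exp(-x^2 / (2(y + c x)))
freedman_bound = 2.0 * np.exp(-x**2 / (2.0 * (y + c * x)))
```

Here the quadratic-variation event holds surely, so the inequality bounds the plain exceedance probability; with these parameters `freedman_bound` is about 0.28, while the Monte Carlo frequency is considerably smaller, reflecting the slack in the bound.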
Therefore, we have
$$\max\left\{\exp\left(-\frac{z_{n,k}^2}{4\,y_{n,k,i}}\right),\ \exp\left(-\frac{z_{n,k}}{4c}\right)\right\} = \exp\left(-\frac{z_{n,k}^2}{4\,y_{n,k,i}}\right).$$
As a result, an application of Freedman's inequality yields
$$\sum_{k=1}^{K_n}\sum_{i=0}^{\lfloor 2^k\delta^{-1}\rfloor} P\left(\max_{1\le\ell\le n}\left|\sum_{j=1}^\ell d_{j,i,k,n}\right| > \frac{\epsilon\sqrt{n\bar F_\varepsilon(u_n)}}{(k+3)^2},\ \sum_{j=1}^n E\big[d_{j,i,k,n}^2\mid\mathcal F_{j-1}\big]\le y_{n,k,i}\right) \le \sum_{k=1}^{K_n}\big(\lfloor 2^k\delta^{-1}\rfloor + 1\big)\,2\exp\left(-\frac{\epsilon^2\,2^k}{4(k+3)^4\,B\delta}\right).$$
Noting that there exists a constant $D > 0$ such that
$$\big(\lfloor 2^k\delta^{-1}\rfloor + 1\big)\exp\left(-\frac{\epsilon^2\,2^k}{4(k+3)^4\,B\delta}\right) \le \lfloor 2^k\delta^{-1}\rfloor\exp\left(-\frac{D\epsilon^2\,2^{k/2}}{B\delta}\right),$$
elementary calculations yield
$$\sum_{k=1}^{K_n}\big(\lfloor 2^k\delta^{-1}\rfloor + 1\big)\exp\left(-\frac{\epsilon^2\,2^k}{4(k+3)^4\,B\delta}\right) \le \frac{B}{\epsilon^2\log(2)}\left(\exp\left(-\frac{D\epsilon^2}{B\delta}\right) - \exp\left(-\frac{D\epsilon^2\,2^{K_n}}{B\delta}\right)\right).$$
It follows that
$$\lim_{\delta\to 0}\lim_{n\to\infty}\frac{1}{\delta}\sum_{k=1}^{K_n}\sum_{i=0}^{\lfloor 2^k\delta^{-1}\rfloor} P\left(\max_{1\le\ell\le n}\left|\sum_{j=1}^\ell d_{j,i,k,n}\right| > z_{n,k},\ \sum_{j=1}^n E\big[d_{j,i,k,n}^2\mid\mathcal F_{j-1}\big]\le y_{n,k,i}\right) = 0.$$
Therefore, in order to finish the proof of (27), it remains to show that
$$\lim_{\delta\to 0}\lim_{n\to\infty}\frac{1}{\delta}\sum_{k=1}^{K_n}\sum_{i=0}^{\lfloor 2^k\delta^{-1}\rfloor} P\left(\sum_{j=1}^n E\big[d_{j,i,k,n}^2\mid\mathcal F_{j-1}\big] > y_{n,k,i}\right) = 0.$$
Since, for any event $A$ and any $\sigma$-algebra $\mathcal F$,
$$E\big[(1(A) - E[1(A)\mid\mathcal F])^2\mid\mathcal F\big] = E[1(A)\mid\mathcal F] - \big(E[1(A)\mid\mathcal F]\big)^2 \le E[1(A)\mid\mathcal F],$$
it holds that
$$E\big[d_{j,i,k,n}^2\mid\mathcal F_{j-1}\big] \le \bar F_\varepsilon\!\left(\frac{u_n x_{i+1}(k)}{\sigma(Y_j)}\right) - \bar F_\varepsilon\!\left(\frac{u_n x_i(k)}{\sigma(Y_j)}\right).$$
Therefore, we arrive at
$$P\left(\sum_{j=1}^n E\big[d_{j,i,k,n}^2\mid\mathcal F_{j-1}\big] > y_{n,k,i}\right) \le P\left(\frac{1}{y_{n,k,i}}\sum_{j=1}^n\left(\bar F_\varepsilon\!\left(\frac{u_n x_{i+1}(k)}{\sigma(Y_j)}\right) - \bar F_\varepsilon\!\left(\frac{u_n x_i(k)}{\sigma(Y_j)}\right)\right) > 1\right).$$
Note that
$$\frac{1}{y_{n,k,i}}\sum_{j=1}^n\left(\bar F_\varepsilon\!\left(\frac{u_n x_{i+1}(k)}{\sigma(Y_j)}\right) - \bar F_\varepsilon\!\left(\frac{u_n x_i(k)}{\sigma(Y_j)}\right)\right) = A_1(n,k,i) + A_2(n,k,i) + A_3(n,k,i), \qquad (31)$$
where
$$A_1(n,k,i) := \frac{\bar F_\varepsilon(u_n)}{y_{n,k,i}}\sum_{j=1}^n\left(\frac{\bar F_\varepsilon\big(\frac{u_n x_{i+1}(k)}{\sigma(Y_j)}\big)}{\bar F_\varepsilon(u_n)} - \frac{\bar F_\varepsilon\big(\frac{u_n x_i(k)}{\sigma(Y_j)}\big)}{\bar F_\varepsilon(u_n)} - \sigma^\alpha(Y_j)\big(x_{i+1}(k)^{-\alpha} - x_i(k)^{-\alpha}\big)\right),$$
$$A_2(n,k,i) := \frac{\bar F_\varepsilon(u_n)}{y_{n,k,i}}\sum_{j=1}^n\sigma^\alpha(Y_j)\left\{\big(x_{i+1}(k)^{-\alpha} - x_i(k)^{-\alpha}\big) - \left(\frac{\bar F_\varepsilon(u_n x_{i+1}(k))}{\bar F_\varepsilon(u_n)} - \frac{\bar F_\varepsilon(u_n x_i(k))}{\bar F_\varepsilon(u_n)}\right)\right\},$$
$$A_3(n,k,i) := \frac{\bar F_\varepsilon(u_n)}{y_{n,k,i}}\sum_{j=1}^n\sigma^\alpha(Y_j)\left(\frac{\bar F_\varepsilon(u_n x_{i+1}(k))}{\bar F_\varepsilon(u_n)} - \frac{\bar F_\varepsilon(u_n x_i(k))}{\bar F_\varepsilon(u_n)}\right),$$
so that
$$P\left(\frac{1}{y_{n,k,i}}\sum_{j=1}^n\left[\bar F_\varepsilon\!\left(\frac{u_n x_{i+1}(k)}{\sigma(Y_j)}\right) - \bar F_\varepsilon\!\left(\frac{u_n x_i(k)}{\sigma(Y_j)}\right)\right] > 1\right) \le P\left(|A_1(n,k,i)| > \frac{1}{3}\right) + P\left(|A_2(n,k,i)| > \frac{1}{3}\right) + P\left(|A_3(n,k,i)| > \frac{1}{3}\right). \qquad (32)$$
According to Lemma 6.2 in the appendix it holds that
$$|A_1(n,k,i)| \le \frac{2^k}{n\,B\delta}\,O(\eta^\star(u_n))\,\big(x_i(k) - x_{i+1}(k)\big)\sum_{j=1}^n\sigma^\alpha(Y_j).$$
Therefore, given Assumption (TA.2), it follows that
$$\lim_{n\to\infty}\sum_{k=1}^{K_n}\sum_{i=0}^{\lfloor 2^k\delta^{-1}\rfloor}\frac{1}{\delta}\,P\left(|A_1(n,k,i)| > \frac{1}{3}\right) \le \lim_{n\to\infty} O(\eta^\star(u_n))\,\frac{1}{\delta}\sum_{k=1}^{K_n}\sum_{i=0}^{\lfloor 2^k\delta^{-1}\rfloor}\frac{2^k}{B\delta}\,\big(x_i(k) - x_{i+1}(k)\big)\,E[\sigma^\alpha(Y)] = \lim_{n\to\infty} O(\eta^\star(u_n))\,O\left(\frac{1}{\delta}\sum_{k=1}^{K_n}\frac{2^k}{B\delta}\right) = \lim_{n\to\infty} O(\eta^\star(u_n))\,O\left(\frac{2^{K_n}}{B\delta^2}\right) = 0.$$
Similarly,
$$|A_2(n,k,i)| \le \frac{\bar F_\varepsilon(u_n)}{y_{n,k,i}}\sum_{j=1}^n C\eta^\star(u_n)\left(\frac{x_i(k)}{\sigma(Y_j)} - \frac{x_{i+1}(k)}{\sigma(Y_j)}\right)\left(\min\left\{\frac{x_{i+1}(k)}{\sigma(Y_j)},1\right\}\right)^{-\alpha-\rho-\epsilon} \le \frac{2^k}{B\delta}\,C\eta^\star(u_n)\,\big(x_i(k) - x_{i+1}(k)\big)\,\frac{1}{n}\sum_{j=1}^n\sigma^{-1}(Y_j)\big(\max\{\sigma(Y_j),1\}\big)^{\alpha+\rho+\epsilon}.$$
According to Assumptions (TA.3) and (TA.4) the expectation of the summands on the right-hand side is finite and hence, for each $\delta > 0$,
$$\lim_{n\to\infty}\sum_{k=1}^{K_n}\sum_{i=0}^{\lfloor 2^k\delta^{-1}\rfloor}\frac{1}{\delta}\,P\left(|A_2(n,k,i)| > \frac{1}{3}\right) \le \lim_{n\to\infty}\frac{1}{\delta}\sum_{k=1}^{K_n}\sum_{i=0}^{\lfloor 2^k\delta^{-1}\rfloor}\frac{2^k}{B\delta}\,O(\eta^\star(u_n))\,\big(x_i(k) - x_{i+1}(k)\big)\,E\big[\sigma^{-1}(Y_0)\big(\max\{\sigma(Y_0),1\}\big)^{\alpha+\rho+\epsilon}\big] = \lim_{n\to\infty} O(\eta^\star(u_n))\,O\left(\frac{2^{K_n}}{B\delta^2}\right) = \lim_{n\to\infty} O(\eta^\star(u_n))\,O\left(\frac{a_n}{B\delta}\right) = 0.$$
Finally, we consider the last summand in (31):
$$A_3(n,k,i) = \frac{1}{nB}\sum_{j=1}^n\sigma^\alpha(Y_j).$$
Obviously, $A_3(n,k,i)$ depends neither on $k$ nor on $i$. Due to the non-central limit theorem in Taqqu (1979), it holds that
$$\frac{1}{nB}\sum_{j=1}^n\big(\sigma^\alpha(Y_j) - E[\sigma^\alpha(Y)]\big) = O_P\left(\frac{d_{n,r}}{n}\right).$$
Choosing $B > 3\,E[\sigma^\alpha(Y)]$, it follows that
$$P\left(\frac{1}{nB}\sum_{j=1}^n\sigma^\alpha(Y_j) > \frac{1}{3}\right) \le P\left(\frac{1}{nB}\sum_{j=1}^n\big(\sigma^\alpha(Y_j) - E[\sigma^\alpha(Y)]\big) > \frac{1}{3} - \frac{E[\sigma^\alpha(Y)]}{B}\right) = O\left(\frac{d_{n,r}}{n}\right).$$
Hence, we have
$$\lim_{n\to\infty}\sum_{k=1}^{K_n}\sum_{i=0}^{\lfloor 2^k\delta^{-1}\rfloor}\frac{1}{\delta}\,P\left(|A_3(n,k,i)| > \frac{1}{3}\right) = \lim_{n\to\infty} P\left(\frac{1}{nB}\sum_{j=1}^n\sigma^\alpha(Y_j) > \frac{1}{3}\right)\sum_{k=1}^{K_n}\sum_{i=0}^{\lfloor 2^k\delta^{-1}\rfloor}\frac{1}{\delta} = \lim_{n\to\infty} O\left(\frac{d_{n,r}}{n}\right)\sum_{k=1}^{K_n}\frac{2^k}{\delta^2} = \lim_{n\to\infty} O\left(\frac{d_{n,r}}{n}\,\frac{2^{K_n}}{\delta^2}\right) = 0.$$

Proof of (28). Initially, note that
$$\sup_{\substack{|t_1-t_2|<\delta\\ 0\le t_1,t_2\le 1}}|m_n(s,t_1) - m_n(s,t_2)| = \sup_{\substack{|t_1-t_2|<\delta\\ 0\le t_1\le t_2\le 1}}\left|\frac{1}{\sqrt{n\bar F_\varepsilon(u_n)}}\sum_{j=\lfloor nt_1\rfloor+1}^{\lfloor nt_2\rfloor}\big(1\{X_j > u_n s\} - E[1\{X_j > u_n s\}\mid\mathcal F_{j-1}]\big)\right|.$$
As before, we apply a chaining technique in order to prove (28). For this, we define the intervals
$$I_{1,k} := [2k\delta,\ 2(k+1)\delta] \quad\text{and}\quad I_{2,k} := [(2k+1)\delta,\ (2(k+1)+1)\delta]$$
for $k = 0,\ldots,L_\delta := \lfloor 1/(2\delta)\rfloor$. Then, similarly to (29), it holds that
$$\sup_{1\le s\le R}\ \sup_{\substack{|t_1-t_2|<\delta\\ 0\le t_1,t_2\le 1}}|m_n(s,t_1) - m_n(s,t_2)| \le \sup_{1\le s\le R}\max_{0\le k\le L_\delta}\sup_{t_1,t_2\in I_{1,k}}|m_n(s,t_1) - m_n(s,t_2)| + \sup_{1\le s\le R}\max_{0\le k\le L_\delta}\sup_{t_1,t_2\in I_{2,k}}|m_n(s,t_1) - m_n(s,t_2)|.$$
Again, we restrict our considerations to the first summand and we note that it suffices to show that
$$\lim_{\delta\to 0}\limsup_{n\to\infty}\frac{1}{\delta}\,P\left(\sup_{1\le s\le R}\sup_{t_1,t_2\in I_{1,k}}|m_n(s,t_1) - m_n(s,t_2)| > \epsilon\right) = 0.$$
Since, due to stationarity of the data-generating process,
$$\sup_{1\le s\le R}\sup_{t_1,t_2\in I_{1,k}}|m_n(s,t_1) - m_n(s,t_2)| \stackrel{\mathcal D}{=} \sup_{1\le s\le R}\sup_{t_1,t_2\in I_{1,0}}|m_n(s,t_1) - m_n(s,t_2)|,$$
verification of (28) follows by the same arguments as the verification of (27).
Proof of Corollary 2.7. An argument from Kulik and Soulier (2011) is repeated and hence many technicalities are omitted. Note that the arguments below are model-free and only use the conclusion of Theorem 2.6.
It holds that $\widehat\gamma_{\lfloor nt\rfloor} = A_n(t)/B_n(t)$, where
\[
A_n(t) := \frac{1}{n\bar F(u_n)} \sum_{j=1}^{\lfloor nt\rfloor} \log\Big( \frac{X_j}{u_n} \Big) \mathbb{1}\{X_j > u_n\}
\quad\text{and}\quad
B_n(t) := \frac{1}{n\bar F(u_n)} \sum_{j=1}^{\lfloor nt\rfloor} \mathbb{1}\{X_j > u_n\}.
\]
By subtracting and adding $t\alpha^{-1}/B_n(t)$, the following equality holds:
\[
t a_n \big( \widehat\gamma_{\lfloor nt\rfloor} - \alpha^{-1} \big)
= \frac{a_n t}{B_n(t)} \big( A_n(t) - t\alpha^{-1} \big) + \frac{a_n t}{\alpha} \Big( \frac{t}{B_n(t)} - 1 \Big),
\]
where $a_n = n/d_{n,r}$ if $n/d_{n,r} = o\big( \sqrt{n\bar F(u_n)} \big)$ and $a_n = \sqrt{n\bar F(u_n)}$ if $\sqrt{n\bar F(u_n)} = o(n/d_{n,r})$. We note that, compared to Kulik and Soulier (2011), the last term appears additionally due to the fact that we consider two-parameter processes.
Rewriting $A_n(t) - t\alpha^{-1}$ as an integral and replacing $\widetilde T_n - T$ by $e_n$, we have
\[
t a_n \big( \widehat\gamma_{\lfloor nt\rfloor} - \alpha^{-1} \big)
= \frac{a_n t}{B_n(t)} \int_1^{\infty} s^{-1} e_n(s,t)\, ds - \frac{a_n t}{\alpha B_n(t)}\, e_n(1,t). \tag{33}
\]
We will show weak convergence of the sequence
\[
Y_n^{(R)}(t) := \frac{a_n t}{B_n(t)} \int_1^{R} s^{-1} e_n(s,t)\, ds - \frac{a_n t}{\alpha B_n(t)}\, e_n(1,t), \quad n \geq 1, \tag{34}
\]
in $D[t_0, 1]$ by an application of the continuous mapping theorem. For this, we have to initially show that terms of the form $t/B_n(t)$ are negligible. Noting that $B_n(t) = \widetilde T_n(1,t)$, it follows by Theorem 2.6 that
\[
\sup_{t_0 \leq t \leq 1} \Big| \frac{t B_n(1)}{B_n(t)} - 1 \Big| \xrightarrow{P} 0.
\]
(35)
Indeed, Theorem 2.6 implies that $\sup_{t_0 \leq t \leq 1} |B_n(t) - t| \xrightarrow{P} 0$, so that
\[
\sup_{t_0 \leq t \leq 1} \Big| \frac{t B_n(1)}{B_n(t)} - 1 \Big|
\leq \frac{1}{\inf_{t_0 \leq t \leq 1} B_n(t)} \sup_{t_0 \leq t \leq 1} \big| t(B_n(1) - 1) + t - B_n(t) \big|
\leq \frac{2}{\inf_{t_0 \leq t \leq 1} B_n(t)} \sup_{t_0 \leq t \leq 1} |B_n(t) - t| \xrightarrow{P} 0.
\]
As a consequence, rewriting $Y_n^{(R)}$ as
\[
Y_n^{(R)}(t) = a_n \int_1^{R} s^{-1} e_n(s,t)\, ds - \alpha^{-1} a_n e_n(1,t)
+ \Big( \frac{t}{B_n(t)} - 1 \Big) \int_1^{R} a_n s^{-1} e_n(s,t)\, ds - \Big( \frac{t}{B_n(t)} - 1 \Big) \alpha^{-1} a_n e_n(1,t) \tag{36}
\]
and combining Theorem 2.6 with (35) shows that the limit of the sequence $Y_n^{(R)}$, $n \geq 1$, coincides with the limit of
\[
Z_n^{(R)} := a_n \int_1^{R} s^{-1} e_n(s,t)\, ds - \alpha^{-1} a_n e_n(1,t), \quad n \geq 1. \tag{37}
\]
By Theorem 2.6 and the continuous mapping theorem, we conclude that $Z_n^{(R)}$, $n \geq 1$, converges weakly to $\int_1^{R} s^{-1} \xi(s,t)\, ds - \alpha^{-1} \xi(1,t)$, where $\xi(s,t)$ is the limiting process in (11) or (12), respectively. Using the same arguments as in Kulik and Soulier (2011), the convergence of $Y_n^{(R)}$, $n \geq 1$, can easily be extended to convergence of
\[
\frac{a_n t}{B_n(t)} \int_1^{\infty} s^{-1} e_n(s,t)\, ds - \frac{a_n t}{\alpha B_n(t)}\, e_n(1,t), \quad n \geq 1. \tag{38}
\]
Hence, we conclude that
\[
t a_n \big( \widehat\gamma_{\lfloor nt\rfloor} - \alpha^{-1} \big) \Rightarrow \int_1^{\infty} s^{-1} \xi(s,t)\, ds - \alpha^{-1} \xi(1,t)
\]
in $D[t_0, 1]$. If $n/d_{n,r} = o\big( \sqrt{n\bar F(u_n)} \big)$, separation of the variables $s$ and $t$ shows that the limit vanishes. This finishes the proof of Corollary 2.7.

Proof of Corollary 2.8. Define
\[
\widehat T_n(s,t) := \frac{1}{\lfloor k_n t\rfloor} \sum_{j=1}^{\lfloor nt\rfloor} \mathbb{1}\big\{ X_j > s X_{\lfloor nt\rfloor - \lfloor k_n t\rfloor : \lfloor nt\rfloor} \big\}.
\]
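The defining identity of this statistic can be verified numerically: integrating $s^{-1}\widehat T_n(s,1)$ over $[1,\infty)$ reproduces the classical Hill estimator. A minimal sketch for $t = 1$ under exact Pareto data (sample size, tail index, and the quadrature grid are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(42)
n, k = 2000, 200
# Pareto sample with tail index alpha = 2, i.e. P(X > x) = x^{-2} for x >= 1
X = rng.pareto(2.0, size=n) + 1.0

Xs = np.sort(X)
threshold = Xs[n - k - 1]  # order statistic X_{(n-k)}, the (k+1)-th largest value

# classical Hill estimator: mean log-excess over the top k observations
hill = np.mean(np.log(Xs[n - k:] / threshold))

# tail empirical function T_n(s) = k^{-1} * #{j : X_j > s * threshold}
s_grid = np.linspace(1.0, X.max() / threshold, 20001)
T_vals = (X[None, :] > s_grid[:, None] * threshold).mean(axis=1) * n / k

# trapezoidal quadrature for  int_1^infty s^{-1} T_n(s) ds
# (T_n vanishes beyond the last grid point, so the grid captures the full integral)
integrand = T_vals / s_grid
integral = np.sum((integrand[1:] + integrand[:-1]) / 2 * np.diff(s_grid))

print(hill, integral)  # equal up to quadrature error, both near 1/alpha = 0.5
```

For an exact Pareto sample the Hill estimator is essentially unbiased for $\gamma = 1/\alpha$, so both numbers should be close to $0.5$ here; the identity itself holds for any sample, with the gap between the two values attributable solely to the quadrature error.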
Then, it holds that
\[
\int_1^{\infty} s^{-1} \widehat T_n(s,t)\, ds
= \frac{1}{\lfloor k_n t\rfloor} \sum_{j=1}^{\lfloor nt\rfloor} \int_1^{\infty} s^{-1} \mathbb{1}\big\{ X_j > s X_{\lfloor nt\rfloor - \lfloor k_n t\rfloor : \lfloor nt\rfloor} \big\}\, ds
= \widehat\gamma_{\mathrm{Hill}}(t).
\]
According to Skorokhod's representation theorem (Theorem 2.3.4 in Shorack and Wellner (1986)) and Theorem 2.6, we may assume without loss of generality that
\[
\sup_{s \in [1,\infty],\, t \in [0,1]} \Big| a_n \Big( \frac{1}{n\bar F(u_n)} \sum_{j=1}^{\lfloor nt\rfloor} \mathbb{1}\{X_j > u_n s\} - s^{-\alpha} t \Big) - \xi(s,t) \Big| \longrightarrow 0 \quad \text{a.s.}, \tag{39}
\]
where $a_n = n/d_{n,r}$ if $n/d_{n,r} = o\big( \sqrt{n\bar F(u_n)} \big)$, $a_n = \sqrt{n\bar F(u_n)}$ if $\sqrt{n\bar F(u_n)} = o(n/d_{n,r})$, and $\xi$ denotes the corresponding limiting process in Theorem 2.6. In order to apply Vervaat's Lemma as stated by Lemma 5 in Einmahl et al. (2010), we rephrase the above convergence as
\[
\sup_{s \in [0,1],\, t \in [t_0,1]} \Big| a_n \big( \Gamma_{n,t}(s) - s \big) - \frac{1}{t}\, \xi\big( s^{-1/\alpha}, t \big) \Big| \longrightarrow 0 \quad \text{a.s.},
\]
where, for $t > 0$,
\[
\Gamma_{n,t}(s) := \frac{1}{\bar F(u_n)} \frac{1}{\lfloor nt\rfloor} \sum_{j=1}^{\lfloor nt\rfloor} \mathbb{1}\big\{ X_j > u_n s^{-1/\alpha} \big\}.
\]
Choosing $k_n = n\bar F(u_n)$, it follows that
\[
\Gamma_{n,t}^{-}(s) = \big( u_n^{-1} X_{\lfloor nt\rfloor - \lfloor s k_n t\rfloor : \lfloor nt\rfloor} \big)^{-\alpha}.
\]
As a result, Vervaat's Lemma yields
\[
\sup_{s \in [0,1],\, t \in [t_0,1]} \Big| a_n \Big( \big( u_n^{-1} X_{\lfloor nt\rfloor - \lfloor s k_n t\rfloor : \lfloor nt\rfloor} \big)^{-\alpha} - s \Big) + \frac{1}{t}\, \xi\big( s^{-1/\alpha}, t \big) \Big| \longrightarrow 0 \quad \text{a.s.}
\]
(40)
Setting $s_n = s\, u_n^{-1} X_{\lfloor nt\rfloor - \lfloor k_n t\rfloor : \lfloor nt\rfloor}$, we arrive at
\[
\sup_{s \in [1,\infty],\, t \in [t_0,1]} \Big| a_n \Big( \frac{1}{n\bar F(u_n)\, t} \sum_{j=1}^{\lfloor nt\rfloor} \mathbb{1}\big\{ X_j > s X_{\lfloor nt\rfloor - \lfloor k_n t\rfloor : \lfloor nt\rfloor} \big\} - s^{-\alpha} \Big) - \frac{1}{t} \big( \xi(s,t) - s^{-\alpha} \xi(1,t) \big) \Big|
\]
\[
\leq \sup_{s \in [1,\infty],\, t \in [t_0,1]} \Big| a_n \Big( \frac{1}{n\bar F(u_n)\, t} \sum_{j=1}^{\lfloor nt\rfloor} \mathbb{1}\{X_j > u_n s\} - s^{-\alpha} \Big) - \frac{1}{t}\, \xi(s,t) \Big|
+ \sup_{s \in [1,\infty],\, t \in [t_0,1]} \Big| a_n \big( s_n^{-\alpha} - s^{-\alpha} \big) + s^{-\alpha} \frac{1}{t}\, \xi(1,t) \Big|
+ \sup_{s \in [1,\infty],\, t \in [t_0,1]} \Big| \frac{1}{t} \big( \xi(s_n,t) - \xi(s,t) \big) \Big|.
\]
The first two summands on the right-hand side converge to 0 almost surely according to (39) and (40). Moreover, we have $s_n^{-\alpha} = s^{-\alpha}(1 + o(1))$ a.s. uniformly in $t$ and $s$, such that it follows by a continuity argument that the third summand converges to 0 almost surely as well.
Since $k_n = n\bar F(u_n)$, we have
\[
a_n \big( \widehat T_n(s,t) - T(s,1) \big)
= a_n \Big( \frac{1}{\lfloor k_n t\rfloor} \sum_{j=1}^{\lfloor nt\rfloor} \mathbb{1}\big\{ X_j > s X_{\lfloor nt\rfloor - \lfloor k_n t\rfloor : \lfloor nt\rfloor} \big\} - s^{-\alpha} \Big)
= a_n \Big( \frac{1}{n\bar F(u_n)\, t} \sum_{j=1}^{\lfloor nt\rfloor} \mathbb{1}\big\{ X_j > s X_{\lfloor nt\rfloor - \lfloor k_n t\rfloor : \lfloor nt\rfloor} \big\} - s^{-\alpha} \Big) + o_P(1)
\Rightarrow \frac{1}{t} \big( \xi(s,t) - s^{-\alpha} \xi(1,t) \big).
\]
Consequently,
\[
a_n t \big( \widehat\gamma_{\mathrm{Hill}}(t) - \gamma \big)
= t \int_1^{\infty} s^{-1} a_n \big( \widehat T_n(s,t) - T(s,1) \big)\, ds
\Rightarrow \int_1^{\infty} s^{-1} \xi(s,t)\, ds - \alpha^{-1} \xi(1,t)
\]
in $D[t_0, 1]$.

Proof of Corollaries 2.10 and 2.11.
Since, due to Corollaries 2.7 and 2.8, $\widehat\gamma_n$ and $\widehat\gamma_{\mathrm{Hill}}(1)$ converge to $\gamma$ in probability, Corollaries 2.10 and 2.11 follow from Slutsky's theorem and an application of the continuous mapping theorem to the processes
\[
a_n t \big( \widehat\gamma_{\lfloor nt\rfloor} - \gamma \big) \Rightarrow \int_1^{\infty} s^{-1} \xi(s,t)\, ds - \alpha^{-1} \xi(1,t), \quad t \in [t_0, 1],
\]
\[
a_n t \big( \widehat\gamma_{\mathrm{Hill}}(t) - \gamma \big) \Rightarrow \int_1^{\infty} s^{-1} \xi(s,t)\, ds - \alpha^{-1} \xi(1,t), \quad t \in [t_0, 1],
\]
and the function $f \mapsto \sup_{t \in [t_0,1]} |f(t) - t f(1)|$.

The following lemmata originate in Kulik and Soulier (2011) and are essential to the proof of Theorem 2.6.

Lemma 6.1 (Lemma 4.1 in Kulik and Soulier (2011)). Assume that $\bar F_\varepsilon$ satisfies a second-order regular variation condition with tail index $\alpha$, second-order parameter $\rho$, and rate function $\eta^*$. For positive $\epsilon$ there exists a constant $C$ such that for all $t \geq 1$ and all $z > 0$,
\[
\Big| \frac{\bar F_\varepsilon(zt)}{\bar F_\varepsilon(t)} - z^{-\alpha} \Big| \leq C \eta^*(t)\, z^{-\alpha-\rho} \big( \max\{z, z^{-1}\} \big)^{\epsilon}. \tag{41}
\]
This implies that
\[
\sup_{s \geq s_0} \Big| \frac{\bar F_\varepsilon(u_n s)}{\bar F_\varepsilon(u_n)} - s^{-\alpha} \Big| = O\big( \eta^*(u_n) \big). \tag{42}
\]
Using boundedness of $\eta^*$, we get the following simplified version of inequality (41): for all $t \geq 1$ and all $z > 0$,
\[
\frac{\bar F_\varepsilon(zt)}{\bar F_\varepsilon(t)} \leq z^{-\alpha} + C_\epsilon\, z^{-\alpha-\rho} \big( \max\{z, z^{-1}\} \big)^{\epsilon}. \tag{43}
\]
We also need the following bound on the increments of $\bar F_\varepsilon$.

Lemma 6.2 (Lemma 4.2 in Kulik and Soulier (2011)). Assume that $\bar F_\varepsilon$ satisfies a second-order regular variation condition with tail index $\alpha$, second-order parameter $\rho$, and rate function $\eta^*$. For positive $\epsilon$ there exists a constant $C$ such that for all $t \geq 1$ and $b > a > 0$,
\[
\Big| \frac{\bar F_\varepsilon(at) - \bar F_\varepsilon(bt)}{\bar F_\varepsilon(t)} - \big( a^{-\alpha} - b^{-\alpha} \big) \Big| \leq C \eta^*(t) \big( \min\{a, 1\} \big)^{-\alpha-\rho-\epsilon} (b - a). \tag{44}
\]
Using again boundedness of $\eta^*$, we get the following simplified version of inequality (44):
\[
\frac{\bar F_\varepsilon(at) - \bar F_\varepsilon(bt)}{\bar F_\varepsilon(t)} \leq a^{-\alpha} - b^{-\alpha} + C_\epsilon \big( \min\{a, 1\} \big)^{-\alpha-\rho-\epsilon} (b - a). \tag{45}
\]

References

Betken, A.
and Kulik, R. (2019). Testing for change in long-memory stochastic volatility time series. Journal of Time Series Analysis, 40(5):707–738.
Bilayi-Biakana, C., Ivanoff, G., and Kulik, R. (2019). The tail empirical process for long memory stochastic volatility models with leverage. Electronic Journal of Statistics, 13(2):3453–3484.
Breidt, F. J., Crato, N., and de Lima, P. (1998). The detection and estimation of long memory in stochastic volatility. Journal of Econometrics, 83(1–2):325–348.
Breiman, L. (1965). On some limit theorems similar to the arc-sin law. Teoriya Veroyatnostei i ee Primeneniya, 10:351–360.
Cont, R. (2005). Long range dependence in financial markets. In Fractals in Engineering, pages 159–179. Springer.
Deo, R., Hsieh, M., Hurvich, C. M., and Soulier, P. (2006). Long memory in nonlinear processes. In Dependence in Probability and Statistics, volume 187 of Lecture Notes in Statistics, pages 221–244. Springer, New York.
Drees, H. (1998a). A general class of estimators of the extreme value index. Journal of Statistical Planning and Inference, 66(1):95–112.
Drees, H. (1998b). On smooth statistical tail functionals. Scandinavian Journal of Statistics, 25:187–210.
Drees, H. (2000). Weighted approximations of tail processes for β-mixing random variables. Annals of Applied Probability, 10(4):1274–1301.
DuMouchel, W. H. (1983). Estimating the stable index α in order to measure tail thickness: a critique. Annals of Statistics, 11(4):1019–1031.
Einmahl, J. H. J. (1990). The empirical distribution function as a tail estimator. Statistica Neerlandica, 44(2):79–82.
Einmahl, J. H. J. (1992). Limit theorems for tail processes with application to intermediate quantile estimation. Journal of Statistical Planning and Inference, 32(1):137–145.
Einmahl, J. H. J., Gantner, M., and Sawitzki, G. (2010). Asymptotics of the shorth plot. Journal of Statistical Planning and Inference, 140(11):3003–3012.
Freedman, D. A. (1975).
On tail probabilities for martingales. The Annals of Probability, 3:100–118.
Galbraith, J. W. and Zernov, S. (2004). Circuit breakers and the tail index of equity returns. Journal of Financial Econometrics, 2(1):109–129.
Hall, P. (1982). On some simple estimates of an exponent of regular variation. Journal of the Royal Statistical Society: Series B (Methodological), 44(1):37–42.
Harvey, A. C. (2002). Long memory in stochastic volatility. In Forecasting Volatility in the Financial Markets, pages 307–320. Butterworth-Heinemann Finance.
Hill, B. M. (1975). A simple general approach to inference about the tail of a distribution. Annals of Statistics, 3(5):1163–1174.
Hoga, Y. (2017). Change point tests for the tail index of β-mixing random variables. Econometric Theory, 33(4):915–954.
Hurvich, C. M. and Soulier, P. (2009). Stochastic volatility models with long memory. In Handbook of Financial Time Series, pages 345–354. Springer.
Kim, M. and Lee, S. (2011). Change point test for tail index for dependent data. Metrika, 74(3):297–311.
Kim, M. and Lee, S. (2012). Change point test of tail index for autoregressive processes. Journal of the Korean Statistical Society, 41(3):305–312.
Koedijk, K. G., Schafgans, M. M., and De Vries, C. G. (1990). The tail index of exchange rate returns. Journal of International Economics, 29(1–2):93–108.
Kulik, R. and Soulier, P. (2011). The tail empirical process for long memory stochastic volatility sequences. Stochastic Processes and their Applications, 121(1):109–134.
Mandelbrot, B. B. (1963). The variation of certain speculative prices. The Journal of Business, 36:394–419.
Mason, D. M. (1988). A strong invariance theorem for the tail empirical process. Annales de l'IHP Probabilités et Statistiques, 24:491–506.
Phillips, P. C., Loretan, M., et al. (1990). Testing covariance stationarity under moment condition failure with an application to common stock returns.
Technical report, Cowles Foundation for Research in Economics, Yale University.
Pipiras, V. and Taqqu, M. S. (2017). Long-Range Dependence and Self-Similarity, volume 45. Cambridge University Press.
Pollard, D. (1984). Convergence of Stochastic Processes. Springer Series in Statistics. Springer-Verlag, New York.
Quintos, C., Fan, Z., and Phillips, P. (2001). Structural change tests in tail behaviour and the Asian crisis. The Review of Economic Studies, 68(3):633–663.
Resnick, S. I. and Stărică, C. (1997). Asymptotic behavior of Hill's estimator for autoregressive data. Communications in Statistics. Stochastic Models, 13(4):703–721. Heavy tails and highly volatile phenomena.
Rootzén, H. (2009). Weak convergence of the tail empirical process for dependent sequences. Stochastic Processes and their Applications, 119(2):468–490.
Shorack, G. R. and Wellner, J. A. (1986). Empirical Processes with Applications to Statistics. Wiley Series in Probability and Mathematical Statistics. John Wiley & Sons, Inc., New York.
Taqqu, M. S. (1979). Convergence of integrated processes of arbitrary Hermite rank. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 50(1):53–83.
Taylor, S. J. (1986). Modelling Financial Time Series. Wiley, New York.
Werner, T. and Upper, C. (2004). Time variation in the tail behavior of Bund future returns.