Inference on the tail process with application to financial time series modelling
Richard A. Davis
Columbia University, Department of Statistics, 1255 Amsterdam Ave., New York, NY 10027, USA. [email protected]
Holger Drees
University of Hamburg, Department of Mathematics, Bundesstraße 55, 20146 Hamburg, Germany. [email protected]
Johan Segers, Michał Warchoł
Université catholique de Louvain, Institut de Statistique, Biostatistique et Sciences Actuarielles, Voie du Roman Pays 20, B-1348 Louvain-la-Neuve, Belgium. [email protected], [email protected]

August 7, 2018
Abstract
To draw inference on serial extremal dependence within heavy-tailed Markov chains, Drees, Segers and Warchoł [Extremes (2015) 18, 369–402] proposed nonparametric estimators of the spectral tail process. The methodology can be extended to the more general setting of a stationary, regularly varying time series. The large-sample distribution of the estimators is derived via empirical process theory for cluster functionals. The finite-sample performance of these estimators is evaluated via Monte Carlo simulations. Moreover, two different bootstrap schemes are employed which yield confidence intervals for the pre-asymptotic spectral tail process: the stationary bootstrap and the multiplier block bootstrap. The estimators are applied to stock price data to study the persistence of positive and negative shocks.
Keywords:
Financial time series; Heavy tails; Multiplier block bootstrap; Regular variation; Shock persistence; Stationary time series; Tail process.
The typical modelling paradigm for a time series often starts by choosing a flexible class of models that captures salient features present in the data. Of course, the relevant features depend on the type of characteristics one is looking for. For a financial time series consisting of, say, log-returns of some asset, the key features, often referred to as stylized facts, include heavy-tailed marginal distributions and serially uncorrelated but dependent data. These characteristics are readily detected using standard diagnostics such as qq-plots of the marginal distribution and plots of the sample autocorrelation function (ACF) of the data and the squares of the data. The GARCH process (and its variants) as well as the stochastic volatility (SV) process driven by heavy-tailed noise exhibit these attributes and often serve as a starting point for building a model. More recently, considerable attention has been directed towards studying the extremal behavior of both financial and environmental time series, especially as it relates to estimating risk factors. Extremes for such time series can occur in clusters, and getting a handle on the nature of clusters, both in terms of size and frequency of occurrence, is important for evaluating various risk measures. Ultimately, one wants to choose models that adequately describe various extremal dependence features observed in the data. The theme of this paper is to provide additional tools that not only give measures of extremal dependence, but can be used as a basis for assessing the quality of a model's fit to extremal properties present in the data.

The extremal index θ ∈ (0,
1] (Leadbetter, 1983) is one such measure of extremal dependence for a stationary time series. It is a measure of extremal clustering (1/θ is the mean cluster size of extremes), with θ < 1 signifying clustering and θ = 1 signifying no clustering in the limit. Unfortunately, θ is a rather crude measure and does not provide fine detail about extremal dependence. The extremogram, developed in Davis and Mikosch (2009), is an attempt to provide a measure of serial dependence among the extremes in a stationary time series. It was conceived to be used in much the same way as an ACF in traditional time series modelling, but only applied to extreme values.

In this paper, we will use the spectral tail process, as formulated by Basrak and Segers (2009) for heavy-tailed time series, to assess and measure extremal dependence. The spectral tail process provides a more in-depth description of the structure of extremal dependence than the extremogram. The first objective of this paper will be to establish limit theory for nonparametric estimates of the distribution of the spectral tail process for a class of heavy-tailed stationary time series. This builds on earlier work of Drees et al. (2015) for heavy-tailed Markov chains. The nonparametric estimates provide quantitative information about extremal dependence within a time series and as such can be used in both exploratory and confirmatory phases of modelling. As an example, they provide estimates of the probability that an extreme observation will occur at time t, given one has occurred at time 0, and that its absolute value will be even larger. These estimates can also be used for model confirmation, in much the same way that the ACF is used for assessing quality of fit for second-order models of time series. For example, one can compute a pre-asymptotic version (to be defined later) of the distribution of the spectral tail process from a GARCH process, which in most cases can be easily calculated via simulation.
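Such a pre-asymptotic version can be approximated by plain Monte Carlo. A minimal sketch in Python, with purely hypothetical GARCH(1,1) coefficients chosen for illustration only (the paper's own simulation settings appear in Section 4):

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_garch(n, omega=0.1, a1=0.14, b1=0.84, df=5.0, burn=500):
    # Hypothetical GARCH(1,1): X_t = sigma_t * Z_t with
    # sigma_t^2 = omega + a1*X_{t-1}^2 + b1*sigma_{t-1}^2 and
    # standardized Student-t innovations (unit variance).
    z = rng.standard_t(df, size=n + burn) / np.sqrt(df / (df - 2.0))
    x = np.empty(n + burn)
    s2 = omega / (1.0 - a1 - b1)            # start at the stationary variance
    for i in range(n + burn):
        x[i] = np.sqrt(s2) * z[i]
        s2 = omega + a1 * x[i] ** 2 + b1 * s2
    return x[burn:]

def preasymptotic_cdf(x, t, arg, q=0.95):
    # Monte Carlo version of P[X_t/|X_0| <= arg given |X_0| > u],
    # with u the empirical q-quantile of |X|.
    u = np.quantile(np.abs(x), q)
    x0, xt = x[:len(x) - t], x[t:]
    exc = np.abs(x0) > u
    return np.mean(xt[exc] / np.abs(x0[exc]) <= arg)

x = simulate_garch(20000)
p1 = preasymptotic_cdf(x, t=1, arg=0.0)
p5 = preasymptotic_cdf(x, t=1, arg=5.0)
```

Repeating this for a grid of arguments traces out the pre-asymptotic distribution function against which a fitted model can be compared.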
Then the estimated distribution of the spectral tail process can be compared with the pre-asymptotic version corresponding to a model for compatibility. A good fit would indicate the plausibility of using a GARCH model for capturing serial extremal dependence. The second main objective is then to provide a useful way of measuring compatibility, for which we propose resampling methods.

Recently, there has been increasing interest in the econometric literature in estimating quantities related to extremal dependence. For stochastic processes in continuous time, Bollerslev et al. (2013) define a χ-coefficient, derived from the extremogram, for assessing tail dependencies applied to financial time series. In a follow-up paper that explores tail risk premia, Bollerslev et al. (2015) make a connection between their estimates of the time-varying tail shape parameters and the extremogram. Linton and Whang (2007) (see also Han et al. (2016)) introduced the quantilogram, a diagnostic tool for measuring directional predictability in a time series. In some respects, our development can be viewed as the quantilogram for extreme quantiles. The theory, however, is different in that our quantiles are going to infinity. Nevertheless, our work does focus on a type of directional predictability, but concentrated in the extremes. Tjøstheim and Hufthammer (2013) consider local Gaussian correlation and relate it to tail index dependence and the extremogram in a time series context. Their methodology is applied to financial time series.

The key object of study in this paper is the tail process and, in particular, its normalized version, the spectral tail process. A strictly stationary univariate time series (X_t)_{t∈Z} is said to have a tail process (Y_t)_{t∈Z} if, for all integers s ≤ t, we have

L(u^{-1}X_s, ..., u^{-1}X_t | |X_0| > u) →d L(Y_s, ..., Y_t), u → ∞, (1.1)

with the implicit understanding that the law of |Y_0| is non-degenerate.
The law of |Y_0| is then necessarily Pareto(α) for some α > 0, and the function u ↦ P[|X_0| > u] is regularly varying at infinity with index −α:

lim_{u→∞} P[|X_0| > uy] / P[|X_0| > u] = P[|Y_0| > y] = y^{−α}, y ∈ [1, ∞). (1.2)

The existence of a tail process is equivalent to multivariate regular variation of the finite-dimensional distributions of (X_t)_{t∈Z} (Basrak and Segers, 2009, Theorem 2.1). In many respects, this condition can be viewed as the heavy-tailed analogue of the condition that a process is Gaussian in the sense that all the finite-dimensional distributions are specified to be of a certain type.

The spectral tail process is defined by Θ_t = Y_t/|Y_0|, for t ∈ Z. By (1.1), it follows that for all integers s ≤ t, we have

L(X_0/u, X_s/|X_0|, ..., X_t/|X_0| | |X_0| > u) →d L(Y_0, Θ_s, ..., Θ_t), u → ∞. (1.3)

The difference between (1.1) and (1.3) is that in the latter equation, the variables X_t have been normalized by |X_0| rather than by the threshold u. Such auto-normalization allows the tail process to be decomposed into two stochastically independent components, i.e., Y_t = |Y_0| Θ_t, t ∈ Z. Independence of |Y_0| and (Θ_t)_{t∈Z} is stated in Basrak and Segers (2009, Theorem 3.1). The random variable |Y_0| characterizes the magnitudes of extremes, whereas (Θ_t)_{t∈Z} captures serial dependence. The spectral tail process at time t = 0 yields information on the relative weights of the upper and lower tails of |X_0|: since Θ_0 = Y_0/|Y_0| = sign(Y_0), we have

p = P[Θ_0 = +1] = lim_{u→∞} P[X_0 > u] / P[|X_0| > u], 1 − p = P[Θ_0 = −1]. (1.4)

The distributions of the forward tail process (Y_t)_{t≥0} and the backward tail process (Y_t)_{t≤0} mutually determine each other (Basrak and Segers, 2009, Theorem 3.1). For all i, s, t ∈ Z with s ≤ 0 ≤ t and for all measurable functions f: R^{t−s+1} → R satisfying f(y_s, ..., y_t) = 0 whenever y_0 = 0, we have, provided the expectations exist,

E[f(Θ_{s−i}, ...
, Θ_{t−i})] = E[ f(Θ_s/|Θ_i|, ..., Θ_t/|Θ_i|) |Θ_i|^α 1{Θ_i ≠ 0} ]. (1.5)

The indicator variable 1{Θ_i ≠ 0} can be omitted because of the presence of |Θ_i|^α, but sometimes it is useful to mention it explicitly in order to avoid errors arising from division by zero. By exploiting the 'time-change formula' (1.5), we will be able to improve upon the efficiency of estimators of the spectral tail process.

Main interest in this paper is in the cumulative distribution function (cdf), F^{(Θ_t)}, of Θ_t. If F^{(Θ_t)} is continuous at a point x, then

lim_{u→∞} P[X_t/|X_0| ≤ x | |X_0| > u] = P[Θ_t ≤ x] = F^{(Θ_t)}(x). (1.6)

We consider two estimates of F^{(Θ_t)}(x) based on forward and backward representations of the tail process. While these estimates are asymptotically normal, the expressions for the asymptotic variances are too complicated to be useful for constructing confidence regions. To overcome this limitation, inference procedures can be carried out using resampling methods. Two resampling methods for constructing confidence intervals, based on the stationary bootstrap as used in Davis et al. (2012) and the multiplier block bootstrap as described in Drees (2015), are applied to our estimates of F^{(Θ_t)}(y). In terms of coverage probabilities, the multiplier block bootstrap performed better than the stationary bootstrap procedure in all the cases we considered. However, both procedures require care when applied for very high thresholds.

We apply the methodology to study serial extremal dependence of daily log-returns on the S&P 500 index and the P&G stock price. We distinguish between two sources of such dependence, positive and negative shocks, pointing out an asymmetric behavior.
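A minimal sketch of how such sign-conditional shock-persistence probabilities can be read off data; the arrays below use simulated iid Student-t returns as a purely hypothetical stand-in for an actual return series:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical stand-in data: iid Student-t "returns" (no serial dependence).
x = rng.standard_t(3, size=20000)
u = np.quantile(np.abs(x), 0.95)            # high threshold

def forward_cdf_signed(x, t, u, arg, sign=1):
    # Empirical P[X_{i+t}/(sign*X_i) <= arg given sign*X_i > u]:
    # a pre-asymptotic version of P[Theta_t <= arg given Theta_0 = sign].
    x0, xt = x[:len(x) - t], x[t:]
    cond = sign * x0 > u
    return np.mean(xt[cond] / (sign * x0[cond]) <= arg)

# Pre-asymptotic P[Theta_1 > 0 | Theta_0 = +1] vs. P[Theta_1 > 0 | Theta_0 = -1]:
after_pos = 1 - forward_cdf_signed(x, 1, u, 0.0, sign=+1)
after_neg = 1 - forward_cdf_signed(x, 1, u, 0.0, sign=-1)
```

For iid symmetric data both proportions are close to 1/2; a persistent gap between them is the kind of asymmetry between positive and negative shocks studied for the stock-return data.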
Specifically, we consider cases when extreme values (positive or negative) follow positive/negative shocks t time lags later. In terms of the spectral tail process, this corresponds to the probabilities P[±Θ_t > 0 | Θ_0 = ±1].

The data consist of a stretch X_{−t̃}, ..., X_{n+t̃}, where t̃ is fixed and corresponds to the maximal lag of interest, drawn from a regularly varying, stationary univariate time series with spectral tail process (Θ_t)_{t∈Z} and index α > 0. To estimate p = P[Θ_0 = 1], we simply take the empirical version of (1.4), yielding

p̂_n = Σ_{i=1}^n 1(X_i > u_n) / Σ_{i=1}^n 1(|X_i| > u_n).

For p̂_n to be consistent and asymptotically normal, the threshold sequence u_n should tend to infinity at a certain rate described in the next section.

To estimate the cdf, F^{(Θ_t)}, of Θ_t, we propose the forward estimator

F̂_n^{(f,Θ_t)}(x) := Σ_{i=1}^n 1(X_{i+t}/|X_i| ≤ x, |X_i| > u_n) / Σ_{i=1}^n 1(|X_i| > u_n). (2.1)

This is just the empirical version of the left-hand side of (1.6). In equations (1.1) and (1.3), the conditioning event is {|X_0| > u}, making no distinction between positive extremes, X_0 > u, and negative extremes, X_0 < −u. However, these two cases can be distinguished by conditioning on the sign of Θ_0. In particular, we define

F̂_n^{(f,Θ_t|Θ_0=±1)}(x) := Σ_{i=1}^n 1(X_{i+t}/(±X_i) ≤ x, ±X_i > u_n) / Σ_{i=1}^n 1(±X_i > u_n). (2.2)

The numerator in the estimator is a sum of indicator functions, most of which are zero. This often leads to a large variance. The time-change formula (1.5) yields a different representation of the law of Θ_t, motivating a different estimator than the one above. Depending on the value of x, the new estimator will involve more non-zero indicators, which receive weights instead. The simulation study reported in Section 4.1 will show that the resulting estimator may have a smaller variance than the one in (2.2), in particular if |x| is large.

Lemma 2.1.
Let (X_t)_{t∈Z} be a stationary univariate time series, regularly varying with index α and spectral tail process (Θ_t)_{t∈Z}. Then, for all integers t ≠ 0,

P[Θ_t ≤ x] = 1 − E[|Θ_{−t}|^α 1(Θ_0/|Θ_{−t}| > x)] if x ≥ 0,
P[Θ_t ≤ x] = E[|Θ_{−t}|^α 1(Θ_0/|Θ_{−t}| ≤ x)] if x < 0. (2.3)

Moreover,
P[Θ_t ≤ x | Θ_0 = 1] = 1 − p^{−1} E[Θ_{−t}^α 1(1/Θ_{−t} > x, Θ_0 = 1)] if x ≥ 0,
P[Θ_t ≤ x | Θ_0 = 1] = p^{−1} E[Θ_{−t}^α 1(−1/Θ_{−t} ≤ x, Θ_0 = −1)] if x < 0, (2.4)

and

P[Θ_t ≤ x | Θ_0 = −
1] = 1 − (1−p)^{−1} E[(−Θ_{−t})^α 1(−1/Θ_{−t} > x, Θ_0 = 1)] if x ≥ 0, and
(1−p)^{−1} E[(−Θ_{−t})^α 1(1/Θ_{−t} ≤ x, Θ_0 = −1)] if x < 0. (2.5)

If population quantities are replaced by their sample counterparts, Lemma 2.1 suggests the following backward estimator of the cdf of Θ_t:

F̂_n^{(b,Θ_t)}(x) := 1 − Σ_{i=1}^n |X_{i−t}/X_i|^{α̂_n} 1(X_i/|X_{i−t}| > x, |X_i| > u_n) / Σ_{i=1}^n 1(|X_i| > u_n) if x ≥ 0, and
Σ_{i=1}^n |X_{i−t}/X_i|^{α̂_n} 1(X_i/|X_{i−t}| ≤ x, |X_i| > u_n) / Σ_{i=1}^n 1(|X_i| > u_n) if x <
0. (2.6)

Here, α̂_n is an estimator of the tail index, for which we will take the Hill-type estimator

α̂_n = Σ_{i=1}^n 1(|X_i| > u_n) / Σ_{i=1}^n log(|X_i|/u_n) 1(|X_i| > u_n). (2.7)

Conditioning on an extreme value of a specific sign, we get

F̂_n^{(b,Θ_t|Θ_0=±1)}(x) := 1 − Σ_{i=1}^n (±X_{i−t}/X_i)^{α̂_n} 1(±X_i/X_{i−t} > x, ±X_i > u_n) / Σ_{i=1}^n 1(±X_i > u_n) if x ≥ 0, and
Σ_{i=1}^n (∓X_{i−t}/X_i)^{α̂_n} 1(±X_i/X_{i−t} ≤ x, ±X_i > u_n) / Σ_{i=1}^n 1(±X_i > u_n) if x < 0.

We explore two different bootstrap schemes that yield confidence intervals for F^{(Θ_t)}(x), or rather, for the pre-asymptotic version P[X_t/|X_0| ≤ x | |X_0| > u]: the stationary bootstrap and the multiplier block bootstrap. We apply each of the two resampling schemes to both the forward and backward estimators at various levels x and at different lags t.

The stationary bootstrap goes back to Politis and Romano (1994) and is an adaptation of the block bootstrap allowing for random block sizes. The resampling scheme was applied to the extremogram in Davis et al. (2012). It consists of generating pseudo-samples X*_1, ..., X*_n, drawn from the sample X_1, ..., X_n by taking the first n values in the sequence

X_{K_1}, ..., X_{K_1+L_1−1}, X_{K_2}, ..., X_{K_2+L_2−1}, ...,

where K_1, K_2, ... is an iid sequence of random variables uniformly distributed on {1, ..., n} and L_1, L_2, ... is an iid sequence of geometrically distributed random variables (independent of (K_j)_{j∈N}) with distribution P[L_1 = l] = p(1−p)^{l−1}, l = 1, 2, ..., for some p = p_n ∈ (0,
1) such that p_n → 0 and np_n → ∞. If an index t thus obtained exceeds the sample size n, we replace t by ((t − 1) mod n) + 1, i.e., we continue from the beginning of the sample. The estimators are then applied to X*_{−t̃}, ..., X*_{n+t̃}.

The multiplier block bootstrap method was applied to cluster functionals in Drees (2015). It consists of splitting the data set into m_n = ⌊n/r_n⌋ blocks of length r_n and multiplying the cluster functionals of each block by a random factor. (Here ⌊x⌋ denotes the integer part of x.) Specifically, for iid random variables ξ_j, independent of (X_t)_{t∈Z}, with E[ξ_j] = 0 and var[ξ_j] = 1, the bootstrapped forward estimator can be written as

F̂*_n^{(f,Θ_t)}(x) := Σ_{j=1}^{m_n} (1 + ξ_j) Σ_{i∈I_j} 1(X_{i+t}/|X_i| ≤ x, |X_i| > u_n) / Σ_{j=1}^{m_n} (1 + ξ_j) Σ_{i∈I_j} 1(|X_i| > u_n),

where I_j = {(j−1)r_n + 1, ..., jr_n} denotes the set of indices belonging to the j-th block. Similarly, the bootstrapped backward estimator for x > 0 is

F̂*_n^{(b,Θ_t)}(x) := 1 − Σ_{j=1}^{m_n} (1 + ξ_j) Σ_{i∈I_j} |X_{i−t}/X_i|^{α̂*_n} 1(X_i/|X_{i−t}| > x, |X_i| > u_n) / Σ_{j=1}^{m_n} (1 + ξ_j) Σ_{i∈I_j} 1(|X_i| > u_n) (2.8)

with

α̂*_n := Σ_{j=1}^{m_n} (1 + ξ_j) Σ_{i∈I_j} 1(|X_i| > u_n) / Σ_{j=1}^{m_n} (1 + ξ_j) Σ_{i∈I_j} log(|X_i|/u_n) 1(|X_i| > u_n). (2.9)

If the threshold u_n is high, it may be advisable to construct bootstrap confidence intervals based on lower thresholds and then scale accordingly; see the explanation after Theorem 3.3.

For iid random variables, the spectral tail process simplifies to Θ_t ≡ 0 for t ≠ 0. If this occurs for a stationary, regularly varying time series, then we say that the series exhibits serial extremal independence. The opposite case is referred to as serial extremal dependence, i.e., at least one of the variables Θ_t for t ≠ 0 is not degenerate at 0.
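The estimators of Section 2 are straightforward to implement. A minimal sketch, assuming a continuous sample with no exact zeros, and again using simulated iid Student-t data as a hypothetical stand-in:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_t(3, size=20000)           # hypothetical stand-in data
u = np.quantile(np.abs(x), 0.95)

def hill_alpha(x, u):
    # Hill-type estimator (2.7) of the tail index alpha.
    exc = np.abs(x) > u
    return exc.sum() / np.log(np.abs(x[exc]) / u).sum()

def forward_estimator(x, t, u, arg):
    # Forward estimator (2.1) of P[Theta_t <= arg].
    x0, xt = x[:len(x) - t], x[t:]
    exc = np.abs(x0) > u
    return np.mean(xt[exc] / np.abs(x0[exc]) <= arg)

def backward_estimator(x, t, u, arg):
    # Backward estimator (2.6): reweights lagged pairs via the
    # time-change formula (1.5), with the plug-in alpha from (2.7).
    a = hill_alpha(x, u)
    xm, x0 = x[:len(x) - t], x[t:]          # (X_{i-t}, X_i) pairs
    exc = np.abs(x0) > u
    w = np.abs(xm[exc] / x0[exc]) ** a      # weights |X_{i-t}/X_i|^alpha
    r = x0[exc] / np.abs(xm[exc])           # ratios X_i/|X_{i-t}|
    if arg >= 0:
        return 1.0 - np.mean(w * (r > arg))
    return np.mean(w * (r <= arg))

fwd = forward_estimator(x, 1, u, 1.0)
bwd = backward_estimator(x, 1, u, 1.0)
a_hat = hill_alpha(x, u)
```

For iid data, Θ_1 is degenerate at 0, so both estimates should be close to 1 when evaluated at arg = 1.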
Since the convergence of the pre-asymptotic distribution can be arbitrarily slow, one cannot formally test for extremal dependence within the present framework. However, if one wants to test whether the exceedances over a given high threshold u are independent, then one may check whether the lower bound of a confidence interval for, say, P[|X_t| ≥ |X_0| | |X_0| > u], constructed by one of the bootstrap methodologies, is larger than this probability under the assumption of exact independence of |X_0| 1(|X_0| > u) and |X_t| 1(|X_t| > u), which is easily shown to equal P[|X_0| > u]/2.

Under certain conditions, the standardized estimation errors of the forward and the backward estimators converge jointly to a centered Gaussian process (Section 3.1). Convergence of the multiplier block bootstrap follows under the same conditions (Section 3.2). In order not to overload the presentation, we focus on nonnegative time series. We briefly indicate how the conditions and results must be modified in the real-valued case.

3.1 Asymptotic normality of the estimators
All estimators under consideration can then be expressed in terms of generalized tail array sums. These are statistics of the form Σ_{i=1}^n φ(X_{n,i}), with

X_{n,i} := u_n^{−1} (X_{i−t̃}, ..., X_i, ..., X_{i+t̃}) 1(X_i > u_n). (3.1)

Drees and Rootzén (2010) give conditions under which, after standardization, such statistics converge to a centered Gaussian process, uniformly over appropriate families of functions φ. From these results we will deduce a functional central limit theorem for the processes of forward and backward estimators defined in (2.1) and (2.6), with α̂_n according to (2.7).

To ensure consistency, the threshold u_n must tend to infinity in such a way that v_n := P[X_0 > u_n] tends to 0, but the expected number, nv_n, of exceedances tends to infinity. Moreover, we have to ensure that observations which are sufficiently separated in time are almost independent. The strength of dependence will be assessed by the β-mixing coefficients

β_{n,k} := sup_{1 ≤ l ≤ n−k−1} E[ sup_{B ∈ B^n_{n,l+k+1}} |P[B | B^l_{n,1}] − P[B]| ].

Here B^j_{n,i} is the σ-field generated by (X_{n,l})_{i≤l≤j}. We assume that there exist sequences l_n, r_n → ∞ and some x_0 ≥ 0 such that the following conditions hold:

(A(x_0)) The cdf of Θ_t, F^{(Θ_t)}, is continuous on [x_0, ∞), for t ∈ {1, ..., t̃}.

(B) As n → ∞, we have l_n → ∞, l_n = o(r_n), r_n = o((nv_n)^{1/2}), r_n v_n →
0, and β_{n,l_n} n/r_n → 0.

(C) For all k ∈ {1, ..., r_n}, there exists

s_n(k) ≥ E[ log(X_0/u_n) max{log(X_k/u_n), 1(X_k > u_n)} | X_0 > u_n ] (3.2)

such that s_∞(k) = lim_{n→∞} s_n(k) exists, lim_{n→∞} Σ_{k=1}^{r_n} s_n(k) = Σ_{k=1}^∞ s_∞(k) holds and the last sum is finite. Moreover, there exists δ > 0 such that

Σ_{k=1}^{r_n} ( E[ (log^+(X_0/u_n) log^+(X_k/u_n))^{1+δ} | X_0 > u_n ] )^{1/(1+δ)} = O(1), n → ∞. (3.3)

Without Condition (A(x_0)) one cannot expect uniform convergence of the estimated cdf of Θ_t to the true cdf on [x_0, ∞). Indeed, in this case even P[X_t/X_0 ≤ x | |X_0| > u] need not converge to F^{(Θ_t)}(x) for a point of discontinuity x. Condition (B) imposes restrictions on the rate at which v_n tends to 0 and thus on the rate at which u_n tends to ∞. Often, the β-mixing coefficients decay geometrically, i.e., β_{n,k} = O(η^k) for some η ∈ (0, 1). Then one can choose l_n = O(log n), and Condition (B) is fulfilled for a suitably chosen r_n if (log n)^2/n = o(v_n) and v_n = o(1/log n).

The technical Condition (C) rules out too large a cluster of extreme observations. Using integration by parts, the right-hand side of (3.2) can be bounded by

v_n^{−1} ∫_1^∞ ( P[X_0 > u_n s, X_k > u_n] + ∫_1^∞ P[X_0 > u_n s, X_k > u_n t] t^{−1} dt ) s^{−1} ds.

Now one can use techniques employed in Drees (2000) and Drees (2003) to verify (3.2) for specific time series models like solutions to stochastic recurrence equations or suitable heavy-tailed linear time series. (Typically, the upper bounds s_n(k) are of the form ρ_k + ξ_n for a summable sequence ρ_k and ξ_n = o(1/r_n).)
The left-hand side of (3.3) can be rewritten in the form

Σ_{k=1}^{r_n} ( (1+δ)^2 v_n^{−1} ∫_1^∞ ∫_1^∞ P[X_0 > u_n s, X_k > u_n t] (log s log t)^δ (st)^{−1} ds dt )^{1/(1+δ)},

which can then be bounded by similar techniques. Under these conditions, one can prove the asymptotic normality of relevant generalized tail array sums (see Proposition 6.1 below) and thus the joint uniform asymptotic normality of the appropriately centered forward and backward estimators of F^{(Θ_t)}.

Theorem 3.1.
Let (X_t)_{t∈Z} be a stationary, regularly varying process. If (A(x_0)), (B) and (C) are fulfilled for some x_0 ≥ 0 and y_0 ∈ [x_0, ∞) ∩ (0, ∞), then

(nv_n)^{1/2} ( (F̂_n^{(f,Θ_t)}(x_t) − P[X_t/X_0 ≤ x_t | X_0 > u_n])_{x_t ∈ [x_0,∞)}, (F̂_n^{(b,Θ_t)}(y_t) − (1 − E[(X_{−t}/X_0)^α 1(X_0/X_{−t} > y_t) | X_0 > u_n]))_{y_t ∈ [y_0,∞)} )_{|t| ∈ {1,...,t̃}}

→d ( (Z(φ^{(f)}_{t,x_t}) − F̄^{(Θ_t)}(x_t) Z(φ_0))_{x_t ∈ [x_0,∞)}, (Z(φ^{(b)}_{t,y_t}) − F̄^{(Θ_t)}(y_t) Z(φ_0) + (α Z(φ_0) − α^2 Z(φ_log)) E[log(Θ_t) 1(Θ_t > y_t)])_{y_t ∈ [y_0,∞)} )_{|t| ∈ {1,...,t̃}}, (3.4)

where Z is a centered Gaussian process, indexed by the functions defined in (6.2), whose covariance function is given in (6.3), and F̄^{(Θ_t)} := 1 − F^{(Θ_t)} denotes the survival function of Θ_t. (Assertion (3.4) means that for suitable versions of the processes the convergence holds uniformly for all x_t ≥ x_0, y_t ≥ y_0 and |t| ∈ {1, ..., t̃} almost surely.)

Additional conditions are needed to ensure that the biases of the forward and the backward estimators of F^{(Θ_t)} are asymptotically negligible:

sup_{x ∈ [x_0,∞)} | P[X_t/X_0 ≤ x | X_0 > u_n] − F^{(Θ_t)}(x) | = o((nv_n)^{−1/2}), (3.5)
sup_{y ∈ [y_0,∞)} | E[(X_{−t}/X_0)^α 1(X_0/X_{−t} > y) | X_0 > u_n] − F̄^{(Θ_t)}(y) | = o((nv_n)^{−1/2}), (3.6)
| E[log(X_0/u_n) | X_0 > u_n] − 1/α | = o((nv_n)^{−1/2}), (3.7)

for t ∈ {−t̃, ..., t̃} \ {0} as n → ∞. These conditions are fulfilled if nv_n tends to ∞ sufficiently slowly, because by the definition of the spectral tail process, the regular variation of X_0 and (2.3), the left-hand sides in (3.5)–(3.7) tend to 0 if F^{(Θ_t)} is continuous on [x_0, ∞).

Corollary 3.2.
Let (X_t)_{t∈Z} be a stationary, regularly varying process. If (A(x_0)), (B), (C), and (3.5)–(3.7) are fulfilled for some x_0 ≥ 0 and y_0 ∈ [x_0, ∞) ∩ (0, ∞), then

(nv_n)^{1/2} ( (F̂_n^{(f,Θ_t)}(x_t) − F^{(Θ_t)}(x_t))_{x_t ∈ [x_0,∞)}, (F̂_n^{(b,Θ_t)}(y_t) − F^{(Θ_t)}(y_t))_{y_t ∈ [y_0,∞)} )_{|t| ∈ {1,...,t̃}}

→d ( (Z(φ^{(f)}_{t,x_t}) − F̄^{(Θ_t)}(x_t) Z(φ_0))_{x_t ∈ [x_0,∞)}, (Z(φ^{(b)}_{t,y_t}) − F̄^{(Θ_t)}(y_t) Z(φ_0) + (α Z(φ_0) − α^2 Z(φ_log)) E[log(Θ_t) 1(Θ_t > y_t)])_{y_t ∈ [y_0,∞)} )_{|t| ∈ {1,...,t̃}},

where Z is the centered Gaussian process defined in Theorem 3.1.

In general, it is difficult to compare the asymptotic variances of the backward and the forward estimators.

3.2 Consistency of the multiplier block bootstrap
Here we discuss the asymptotic behavior of the multiplier block bootstrap versions of the forward and backward estimators. For the sake of brevity, we focus on estimators of F^{(Θ_t)}(x) for a fixed x. Drees (2015) has shown convergence of bootstrap versions of empirical processes of tail array sums under the same conditions needed for convergence of the original empirical processes. Let P_ξ denote the probability w.r.t. ξ = (ξ_j)_{j∈N}, i.e., the conditional probability given (X_{n,i})_{1≤i≤n}.

Theorem 3.3.
Let ξ_j, j ∈ N, be iid random variables independent of (X_t)_{t∈Z} with E[ξ_j] = 0 and var[ξ_j] = 1. Then, under the conditions of Theorem 3.1, for all x ≥ x_0, y ≥ y_0,

sup_{r,s ∈ R^{t̃}} | P_ξ[ (nv_n)^{1/2}(F̂*_n^{(f,Θ_t)}(x) − F̂_n^{(f,Θ_t)}(x)) ≤ r_t, (nv_n)^{1/2}(F̂*_n^{(b,Θ_t)}(y) − F̂_n^{(b,Θ_t)}(y)) ≤ s_t, ∀ |t| ∈ {1, ..., t̃} ] − P[ (nv_n)^{1/2}(F̂_n^{(f,Θ_t)}(x) − F^{(Θ_t)}(x)) ≤ r_t, (nv_n)^{1/2}(F̂_n^{(b,Θ_t)}(y) − F^{(Θ_t)}(y)) ≤ s_t, ∀ |t| ∈ {1, ..., t̃} ] | → 0

in probability.

In particular, if a and b are such that P_ξ[F̂*_n^{(b,Θ_t)}(y) ∈ [a, b]] = β, then [2F̂_n^{(b,Θ_t)}(y) − b, 2F̂_n^{(b,Θ_t)}(y) − a] is a confidence interval for F^{(Θ_t)}(y) with approximative coverage level β. However, if the number of exceedances over a given threshold is too small, one may prefer to construct confidence intervals based on bootstrap estimators corresponding to lower thresholds. Let ũ_n denote another threshold sequence, let ṽ_n = P[X_0 > ũ_n] denote the corresponding exceedance probabilities, and let F̃_n^{(b,Θ_t)}(y) and F̃*_n^{(b,Θ_t)}(y) denote the backward estimator and the bootstrap version thereof, respectively, based on the exceedances over ũ_n. The conditional distribution of

(nṽ_n)^{1/2} (F̃*_n^{(b,Θ_t)}(y) − F̃_n^{(b,Θ_t)}(y))

given the data is approximately the same as the unconditional distribution of

(nv_n)^{1/2} (F̂_n^{(b,Θ_t)}(y) − F^{(Θ_t)}(y)).
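In practice, the quantiles a and b are taken from bootstrap replicates. A minimal sketch of the multiplier scheme in the spirit of (2.8), applied to the forward estimator, together with a reflected ('basic') bootstrap interval [2F̂ − b, 2F̂ − a]; standard normal multipliers, and simulated iid Student-t data as a hypothetical stand-in:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.standard_t(3, size=20000)           # hypothetical stand-in data
u = np.quantile(np.abs(x), 0.95)

def multiplier_forward_ci(x, t, u, arg, block=100, n_boot=1000, level=0.95):
    # Forward estimate of P[Theta_t <= arg] plus a multiplier block
    # bootstrap confidence interval.
    x0, xt = x[:len(x) - t], x[t:]
    exc = np.abs(x0) > u
    hit = exc & (xt <= arg * np.abs(x0))    # numerator indicators of (2.1)
    m = len(x0) // block
    # per-block counts: the two generalized tail array sums, block by block
    num = hit[:m * block].reshape(m, block).sum(axis=1).astype(float)
    den = exc[:m * block].reshape(m, block).sum(axis=1).astype(float)
    fhat = num.sum() / den.sum()
    xi = rng.standard_normal((n_boot, m))   # multipliers: mean 0, variance 1
    boot = ((1 + xi) * num).sum(axis=1) / ((1 + xi) * den).sum(axis=1)
    a, b = np.quantile(boot, [(1 - level) / 2, (1 + level) / 2])
    return fhat, (2 * fhat - b, 2 * fhat - a)

fhat, ci = multiplier_forward_ci(x, t=1, u=u, arg=1.0)
```

The analogous interval for the backward estimator would additionally recompute the bootstrapped tail index as in (2.9), using the same multipliers.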
Hence, if a and b are such that P_ξ[F̃*_n^{(b,Θ_t)}(y) ∈ [a, b]] = β and if ṽ̂_n/v̂_n is a suitable estimator of ṽ_n/v_n, then

[ (ṽ̂_n/v̂_n)^{1/2} (F̃_n^{(b,Θ_t)}(y) − b) + F̂_n^{(b,Θ_t)}(y), (ṽ̂_n/v̂_n)^{1/2} (F̃_n^{(b,Θ_t)}(y) − a) + F̂_n^{(b,Θ_t)}(y) ] (3.8)

is a confidence interval for F^{(Θ_t)}(y) with approximative coverage probability β. In practice, one will often use large order statistics as thresholds, say the k_n-th and k̃_n-th largest observations, respectively. In that case, ṽ_n/v_n can be replaced by k̃_n/k_n. A similar approach, namely to use a variance estimator based on a lower threshold, has successfully been employed in Drees (2003, Section 5). Of course, confidence intervals based on the bootstrap version of the forward estimator can be constructed analogously.

Remark. It is possible to generalize Theorem 3.3 to cover the joint limit distribution of the bootstrap estimators for all x ≥ x_0 and y ≥ y_0. Technically, this requires endowing the space of probability measures on spaces of bounded functions from [x_0, ∞) (resp. [y_0, ∞)) to R^{t̃} with a metric that induces weak convergence, e.g., the bounded Lipschitz metric. This is the approach taken in Drees (2015) to establish the consistency of a bootstrap method for estimating the extremogram, a close cousin of the tail process. Based on such a result, one may construct uniform confidence bands for the function F^{(Θ_t)} on [x_0, ∞) or [y_0, ∞), respectively, which in general will be considerably wider than the pointwise confidence intervals discussed above and will thus often be rather uninformative. For brevity, we omit the details.

Remark. For time series which may take on negative values too, the forward and backward estimators of F^{(Θ_t)} can be represented in terms of generalized tail array sums constructed from

X^{t̃}_{n,i} = u_n^{−1} (X_{i−t̃}, ..., X_i, ...
, X_{i+t̃}) 1(|X_i| > u_n).

When x <
0, for example, the backward estimator F̃_n^{(b,Θ_t)}(x) is equal to the ratio of the generalized tail array sums pertaining to the functions

(y_{−t̃}, ..., y_0, ..., y_{t̃}) ↦ |y_{−t}/y_0|^α 1(y_0/|y_{−t}| ≤ x, |y_0| > 1),
(y_{−t̃}, ..., y_0, ..., y_{t̃}) ↦ 1(|y_0| > 1).

Limit theorems can be obtained by the same methods as in the case of non-negative observations under obvious analogues of the conditions (A(x_0)), (B) and (C), with v_n := P[|X_0| > u_n].

In Section 4.1, we show results from a numerical simulation study designed to test the performance of the forward (2.1) and the backward (2.6) estimators. We continue in Section 4.2 by evaluating the performance of two bootstrap schemes, the multiplier block bootstrap and the stationary bootstrap, described in Section 2.2. The simulations are based on pseudo-random samples from two widely used models for financial time series. Both models are of the form X_t = σ_t Z_t, where σ_t and Z_t are independent. First, we consider the GARCH(1,1) model with σ_t^2 = 0.⋯ + 0.⋯ X_{t−1}^2 + 0.⋯ σ_{t−1}^2, the innovations Z_t being independent t-distributed random variables, standardized to have unit variance. The second model is the stochastic volatility (SV) process with log σ_t = 0.⋯ σ_{t−1} + ε_t, with independent standard normal innovations ε_t and independent innovations Z_t with a common t distribution. The parameters have been chosen to ensure that both time series are regularly varying with index α = 2.

We estimate P[Θ_t ≤ x] for both the GARCH(1,1) and the SV model, for various arguments x and lags t, via the forward and the backward estimator, with estimated tail index α. The threshold is set at the empirical 95% quantile of the absolute values of a time series of length n = 2 000. We do 1 000 Monte Carlo repetitions and calculate bias, standard deviation, and root mean squared error (RMSE) with respect to the pre-asymptotic values P[X_t/|X_0| ≤ x | |X_0| > F^←_{|X|}(0.95)
] in the forward representation. The true quantile F^←_{|X|}(0.
95) of |X| and the true pre-asymptotic values were calculated numerically via 10 000 Monte Carlo simulations based on time series of length 10 000.

It was already reported in the context of Markovian time series that for t = 1 and |x| large, the backward estimators usually have a smaller variance than the forward estimators (Drees et al., 2015). Here, numerical simulations suggest that this is true for non-Markovian time series and for higher lags as well. The results are presented in Figure 1.

Figure 1:
Performance of the forward and the backward estimators: bias (left), standard deviation (middle), and ratio of root mean squared errors (right) with respect to the pre-asymptotic values in the forward representation, for the GARCH(1,1) model (top) and the SV model (bottom).
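As a sketch of how the two estimators compare in practice, the following Python code (the paper's own computations used R; the GARCH coefficients, degrees of freedom, and the plugged-in tail index below are illustrative placeholders, not the values of the simulation study) implements the forward estimator and the backward estimator of $P[\Theta_t \le x]$ on a simulated heavy-tailed series:

```python
import numpy as np

def forward_estimate(X, t, x, u):
    """Forward estimator: empirical P[X_t / |X_0| <= x given |X_0| > u]."""
    X = np.asarray(X)
    n = len(X)
    i = np.arange(max(0, -t), n - max(0, t))   # indices with both i and i+t valid
    exceed = np.abs(X[i]) > u
    hits = exceed & (X[i + t] / np.abs(X[i]) <= x)
    return hits.sum() / exceed.sum()

def backward_estimate(X, t, x, u, alpha):
    """Backward estimator for x >= 0: ratio of generalized tail array sums
    with weights |X_{i-t}/X_i|^alpha, as in the display above."""
    X = np.asarray(X)
    n = len(X)
    i = np.arange(max(0, t), n + min(0, t))    # indices with both i and i-t valid
    exceed = np.abs(X[i]) > u
    w = np.abs(X[i - t] / X[i]) ** alpha
    num = (w * (X[i] / np.abs(X[i - t]) <= x))[exceed].sum()
    return num / exceed.sum()

# Simulate a heavy-tailed GARCH(1,1) path (coefficients are illustrative only).
rng = np.random.default_rng(1)
n = 2000
omega, a1, b1 = 1e-6, 0.1, 0.85
Z = rng.standard_t(df=5, size=n)
X = np.empty(n)
sig2 = omega / (1.0 - a1 - b1)                 # start at the stationary variance
for k in range(n):
    X[k] = np.sqrt(sig2) * Z[k]
    sig2 = omega + a1 * X[k] ** 2 + b1 * sig2

u = np.quantile(np.abs(X), 0.95)               # empirical 95% quantile of |X|
p_fwd = forward_estimate(X, t=1, x=1.0, u=u)
p_bwd = backward_estimate(X, t=1, x=1.0, u=u, alpha=3.0)
```

In practice the weight exponent would be the estimated tail index $\hat\alpha$ rather than the fixed placeholder used here.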
The right column, which shows the RMSE of the backward estimator divided by the RMSE of the forward estimator (both with respect to the pre-asymptotic values of the spectral tail process in the forward representation), shows that the backward estimator outperforms the forward estimator if $x$ is sufficiently large in absolute value. This phenomenon was also observed at other lags (not shown). For some other models, however, such as certain stochastic recurrence equations or copula Markov models, the advantage of the backward estimator was observed only for smaller lags ($t = 1, \ldots$).

We assess the performance of the two bootstrap schemes, the stationary bootstrap and the multiplier block bootstrap. To do so, we estimate the coverage probability of the bootstrapped confidence intervals with respect to the true pre-asymptotic spectral tail process in the forward representation. We focus on probabilities of the form $P[|\Theta_t| > 1]$, that is, the probability of $|X_t|$ being larger than $|X_0|$ conditionally on $|X_0|$ exceeding some threshold already. The true pre-asymptotic values in the forward representation were calculated numerically via 10 000 Monte Carlo simulations with time series of length 10 000.

In Figure 2, we plot the results for the GARCH(1,
1) model and for the SV model, for the forward and the backward estimators. The expected block size for the stationary bootstrap (represented by gray lines) was chosen as 100. For the multiplier block bootstrap (black lines), the block size was fixed at 100 and the multiplier variables $\xi_j$ were drawn independently from the standard normal distribution. Estimates of the coverage probabilities are based on 1 000 simulations. In each such sample, we use 1 000 bootstrap samples for calculating the confidence intervals with nominal coverage probability 95%. We use two different thresholds, i.e., the 95% and 98% empirical quantiles of the absolute values of a time series of length $n = 2\,000$. For the higher
Figure 2:
Coverage probabilities of confidence intervals for $P[|X_t/X_0| > 1 \mid |X_0| > u_n]$ (left: forward estimator; right: backward estimator) based on the stationary bootstrap (gray) and the multiplier block bootstrap (black). The top and the middle plots correspond to the GARCH(1,1) model with thresholds set at the 95% and the 98% empirical quantiles, respectively. The bottom plots correspond to the SV simulation study with the threshold set at the 98% empirical quantile. In the latter two cases, the dashed lines correspond to the coverage probabilities of the rescaled confidence intervals (3.8). The horizontal black line is the 0.95 reference line.
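A minimal Python sketch of the stationary bootstrap resampling (Politis and Romano, 1994) compared above; the statistic and the sample below are illustrative stand-ins, not the study's exact setup:

```python
import numpy as np

def stationary_bootstrap_indices(n, mean_block, rng):
    """Draw one stationary-bootstrap resample of the indices 0..n-1:
    blocks start at uniform random positions and have geometrically
    distributed lengths with mean `mean_block`; indices wrap around."""
    p = 1.0 / mean_block
    idx = np.empty(n, dtype=np.int64)
    k = 0
    while k < n:
        start = int(rng.integers(n))
        length = int(rng.geometric(p))
        for j in range(min(length, n - k)):
            idx[k] = (start + j) % n
            k += 1
    return idx

def exceed_stat(x, u):
    """Example statistic: P-hat[|X_1| > |X_0| given |X_0| > u]."""
    base = np.abs(x[:-1]) > u
    return ((np.abs(x[1:]) > np.abs(x[:-1])) & base).sum() / max(base.sum(), 1)

rng = np.random.default_rng(0)
X = rng.standard_t(df=3, size=2000)          # illustrative heavy-tailed sample
u = np.quantile(np.abs(X), 0.95)
reps = np.array([exceed_stat(X[stationary_bootstrap_indices(len(X), 100, rng)], u)
                 for _ in range(200)])
lo, hi = np.quantile(reps, [0.025, 0.975])   # percentile confidence interval
```

Note that resampled blocks splice unrelated neighbours together at block boundaries, which is one reason this scheme reacts more strongly to the choice of the expected block length than the multiplier scheme.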
Figure 3:
Coverage probabilities (top) and median widths (bottom) of confidence intervals for $P[|X_t/X_0| > 1 \mid |X_0| > u_n]$ based on the stationary bootstrap (left column) and the multiplier block bootstrap (right column) for different block lengths. The dash-dotted, dashed, solid, and dotted lines represent (mean) block lengths 5, 10, 100, and 250, respectively. The plots correspond to the backward estimator and the GARCH(1,1) model.

threshold, the confidence intervals were calculated either directly (indicated by the solid lines) or using a rescaled bootstrap estimator that was based on the exceedances over the 95% empirical quantile as in (3.8) (dashed lines).

In all cases, the multiplier block bootstrap produces a better coverage probability than the stationary bootstrap. Moreover, the backward estimator is more stable than the forward one, at least for $x = 1$, and this translates into higher stability of the bootstrapped confidence intervals. The effect is especially visible for higher thresholds, e.g., at the 98% quantile, which leaves insufficiently many pairs of exceedances for accurate inference. Finally, rescaled confidence intervals (3.8) based on lower thresholds can have a much better coverage than confidence intervals based on higher thresholds.

In addition, in Figure 3 we show coverage probabilities and median confidence interval widths for different block sizes. The multiplier block bootstrap is more robust to the choice of block length than the stationary bootstrap. In contrast to the stationary bootstrap, the multiplier block bootstrap produces confidence intervals whose coverage probabilities are fairly stable across different lags for a given block length. It is important not to set the block length too low, since this can lead to poor coverage probabilities, especially for higher lags. On the other hand, too large a block length can result in confidence intervals that are too wide.

                     ω           α        β        δ        γ
 S&P500   GARCH    7 × 10^-    0.062    0.932      -        -
          APARCH     × 10^-    0.056    0.937    1.227     0.
 P&G      GARCH      × 10^-    0.04     0.957      -        -
          APARCH     × 10^-    0.056    0.951    0.938     0.

Table 1: Parameters of the models fitted to daily log-returns of the S&P500 index (top) and the P&G stock price (bottom). Standard errors in parentheses.
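The multiplier block bootstrap of Section 2.2 reweights disjoint block sums with $1 + \xi_j$, the $\xi_j$ i.i.d. standard normal. A hedged Python sketch for a statistic given as a ratio of tail array sums (the statistic shown, $\hat P[|X_t| > |X_0| \mid |X_0| > u]$, and all numerical settings are illustrative, not the paper's exact procedure):

```python
import numpy as np

def block_sums(terms, r):
    """Sums of the per-observation terms over disjoint blocks of length r."""
    m = len(terms) // r
    return terms[:m * r].reshape(m, r).sum(axis=1)

def multiplier_block_ci(num_terms, den_terms, r, level=0.95, B=1000, rng=None):
    """Basic bootstrap CI from the multiplier block bootstrap: block sums of
    numerator and denominator are reweighted by 1 + xi_j, xi_j ~ N(0, 1)."""
    rng = np.random.default_rng() if rng is None else rng
    s_num, s_den = block_sums(num_terms, r), block_sums(den_terms, r)
    theta = s_num.sum() / s_den.sum()
    w = 1.0 + rng.standard_normal((B, len(s_num)))   # bootstrap block weights
    reps = (w @ s_num) / (w @ s_den)
    lo_dev, hi_dev = np.quantile(reps - theta, [(1 - level) / 2, (1 + level) / 2])
    return theta - hi_dev, theta - lo_dev

rng = np.random.default_rng(3)
X = rng.standard_t(df=3, size=2000)                  # illustrative sample
u = np.quantile(np.abs(X), 0.95)
num = ((np.abs(X[1:]) > np.abs(X[:-1])) & (np.abs(X[:-1]) > u)).astype(float)
den = (np.abs(X[:-1]) > u).astype(float)
lo, hi = multiplier_block_ci(num, den, r=100, rng=rng)
```

Because only the block weights are redrawn, the pairs $(X_{i-1}, X_i)$ inside each block stay intact, which is consistent with the scheme's observed robustness to the block length.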
We first consider daily log-returns on the S&P500 stock market index between 1990-01-01 and 2010-01-01, taken from Yahoo Finance. In Figure 4, we plot the sample spectral tail process probabilities $P[|\Theta_t| > 1 \mid \Theta_0 = \pm 1]$ and $P[\pm\Theta_t > 1 \mid \Theta_0 = \pm 1]$ based on the backward estimator, with the 98% empirical quantile taken as a threshold, and the 80% pointwise confidence intervals from the multiplier bootstrap scheme, rescaled via the 95% quantile as threshold as in (3.8). The estimated index of regular variation is $\hat\alpha = 3$.

We consider two models of the form $X_t = \sigma_t Z_t$: first, the GARCH(1,1) process with
$$\sigma_t^2 = \omega + \alpha X_{t-1}^2 + \beta \sigma_{t-1}^2,$$
and second, the APARCH(1,1) process (Ding et al., 1993) with
$$\sigma_t^\delta = \omega + \alpha\big(|X_{t-1}| - \gamma X_{t-1}\big)^\delta + \beta \sigma_{t-1}^\delta.$$
Both models allow for volatility clustering in the limit. Additionally, the APARCH model captures asymmetry in the volatility of returns: if $\gamma > 0$, volatility tends to increase more when returns are negative, as compared to positive returns of the same magnitude. The asymmetric response of volatility to positive and negative shocks is well known in the finance literature as the leverage effect of stock market returns (Black, 1976).

We fit those two models to daily log-returns of the S&P500 index. We use the garchFit function from the fGarch library available in R, the function being based on maximum likelihood estimation (Wuertz et al., 2013). The innovations, $Z_t$, are assumed to be standard normally distributed. The fitted parameters are given in the top part of Table 1. In Figure 4 we plot the pre-asymptotic spectral tail process probabilities based on the forward estimator for the fitted GARCH and APARCH models, together with the sample spectral tail process probabilities.

http://finance.yahoo.com/
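To make the asymmetry concrete, here is a small Python sketch of one step of the APARCH(1,1) volatility recursion; the coefficient values are illustrative only, not the fitted values reported in Table 1:

```python
def aparch_next_sigma(x_prev, s_prev, omega, a, b, gamma, delta):
    """One step of the APARCH(1,1) recursion:
    sigma_t^delta = omega + a * (|x_{t-1}| - gamma * x_{t-1})**delta
                          + b * sigma_{t-1}**delta.
    With gamma > 0, a negative return inflates sigma_t more than a positive
    return of the same magnitude (the leverage effect)."""
    s_delta = omega + a * (abs(x_prev) - gamma * x_prev) ** delta + b * s_prev ** delta
    return s_delta ** (1.0 / delta)

# Illustrative parameter values only.
params = dict(omega=1e-6, a=0.06, b=0.93, gamma=0.5, delta=1.2)
s_neg = aparch_next_sigma(-0.02, 0.01, **params)   # after a -2% return
s_pos = aparch_next_sigma(+0.02, 0.01, **params)   # after a +2% return
# s_neg > s_pos: the negative shock raises next-period volatility more
```

Setting $\gamma = 0$ and $\delta = 2$ recovers the symmetric GARCH(1,1) response.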
Figure 4:
Sample spectral tail process probabilities (solid black bold line) for the S&P500 daily log-returns based on the backward estimator and the pre-asymptotic spectral tail process probabilities of the fitted GARCH(1,1) (dotted line) and APARCH(1,1) (dashed line) models. The gray area corresponds to the 80% pointwise confidence intervals for the pre-asymptotic spectral tail probabilities based on the multiplier bootstrap with 1 000 replications. The top, middle and bottom rows concern the conditional probabilities that $|\Theta_t| > 1$, $\Theta_t > 1$ and $\Theta_t < -1$, respectively, given that $\Theta_0 = 1$ (left column) and $\Theta_0 = -1$ (right column).

As a second example, we consider daily log-returns of the P&G stock price; the estimated index of regular variation is $\hat\alpha = 3.3$. We fit the GARCH(1,1) and APARCH(1,1) models to the time series and show the estimated parameters in the bottom part of Table 1. In Figure 5 we plot the sample spectral tail process probabilities based on the daily log-returns themselves and on the residuals of the fitted GARCH(1,1) and APARCH(1,1) models, obtained by the backward estimator. The top-right plot indicates that there is significant serial extremal dependence in the P&G daily log-returns, triggered by negative shocks. Due to the high asymmetry in volatility, this feature is still present in the residuals of the fitted GARCH model, whereas it is better removed by the APARCH filter.
Proof of Lemma 2.1.
To prove (2.3), apply the time-change formula (1.5) with $s = t = 0$, $i = -h$, and $f(y) = \mathbb{1}(y \le x) - \mathbb{1}(0 \le x)$ to see that
$$P[\Theta_h \le x] - \mathbb{1}(0 \le x) = E\big[|\Theta_{-h}|^{\alpha}\,\mathbb{1}(\Theta_0/|\Theta_{-h}| \le x)\big] - \mathbb{1}(0 \le x)\,E\big[|\Theta_{-h}|^{\alpha}\big].$$
For $x \ge 0$, apply the time-change formula with $s = -h$, $t = 0$, $i = -h$ and $f(y_{-h},\ldots,y_0) = \mathbb{1}(y_0 > x,\ y_{-h} = 1)$ to get
$$P[\Theta_h > x,\ \Theta_0 = 1] = E\big[|\Theta_{-h}|^{\alpha}\,\mathbb{1}(\Theta_0/|\Theta_{-h}| > x,\ \Theta_{-h} > 0)\big] = E\big[\Theta_{-h}^{\alpha}\,\mathbb{1}(1/\Theta_{-h} > x,\ \Theta_0 = 1)\big],$$
whereas for $x < 0$, take $f(y_{-h},\ldots,y_0) = \mathbb{1}(y_0 \le x,\ y_{-h} = 1)$ to obtain
$$P[\Theta_h \le x,\ \Theta_0 = 1] = E\big[\Theta_{-h}^{\alpha}\,\mathbb{1}(-1/\Theta_{-h} \le x,\ \Theta_{-h} > 0,\ \Theta_0 = -1)\big].$$
Similarly, in (2.5) choose $f(y_{-h},\ldots,y_0) = \mathbb{1}(y_0 > x,\ y_{-h} = -1)$ and $f(y_{-h},\ldots,y_0) = \mathbb{1}(y_0 \le x,\ y_{-h} = -1)$ for $x \ge 0$ and $x < 0$, respectively.

Next we turn to the asymptotic normality of the forward and backward estimators. Recall the definition of $X_{n,i}$ in (3.1). Consider the empirical process
$$\tilde Z_n(\psi) := (nv_n)^{-1/2} \sum_{i=1}^{n} \big(\psi(X_{n,i}) - E[\psi(X_{n,i})]\big), \qquad (6.1)$$
where $\psi$ is one of the following functions:
$$\phi_1\big(y_{-\tilde t},\ldots,y_0,\ldots,y_{\tilde t}\big) = \log_+(y_0),$$
$$\phi_2\big(y_{-\tilde t},\ldots,y_0,\ldots,y_{\tilde t}\big) = \mathbb{1}(y_0 > 1),$$
$$\phi^f_{t,x}\big(y_{-\tilde t},\ldots,y_0,\ldots,y_{\tilde t}\big) = \mathbb{1}(y_t/y_0 > x,\ y_0 > 1),$$
$$\phi^b_{t,x}\big(y_{-\tilde t},\ldots,y_0,\ldots,y_{\tilde t}\big) = (y_{-t}/y_0)^{\alpha}\,\mathbb{1}(y_0/y_{-t} > x,\ y_0 > 1) \qquad (6.2)$$
for $|t| \in \{1,\ldots,\tilde t\}$ and $x \ge 0$. The asymptotic behavior of $\tilde Z_n$ can be derived from more general results by Drees and Rootzén (2010).

Proposition 6.1.
Let $(X_t)_{t\in\mathbb Z}$ be a non-negative, stationary, regularly varying time series with tail process $(Y_t)_{t\in\mathbb Z}$. Assume that the conditions (A($x_0$)), (B) and (C) are fulfilled for some $x_0 \ge 0$. Then, for all $y_0 \in [x_0, \infty) \cap (0, \infty)$, the sequence of processes
$$\Big(\tilde Z_n(\phi_1),\ \tilde Z_n(\phi_2),\ \big[(\tilde Z_n(\phi^f_{t,x}))_{x\in[x_0,\infty)},\ (\tilde Z_n(\phi^b_{t,y}))_{y\in[y_0,\infty)}\big]_{|t|\in\{1,\ldots,\tilde t\}}\Big)$$
Figure 5:
Sample spectral tail process (black line) for P&G daily log-returns (top), GARCH(1,1) residuals (middle), and APARCH(1,1) residuals (bottom) based on the backward estimator. Plots in the first column represent conditioning on a positive shock, whereas in the second column one conditions on a negative shock. The horizontal gray lines correspond to the empirical 80% quantile of the backward estimator under independence, obtained from 10 000 simulations.

converges weakly to a centered Gaussian process $Z$ with covariance function given by
$$\operatorname{cov}\big(Z(\psi_1), Z(\psi_2)\big) = \sum_{j=-\infty}^{\infty} E\big[\psi_1(Y_{-\tilde t},\ldots,Y_{\tilde t})\,\psi_2(Y_{j-\tilde t},\ldots,Y_{j+\tilde t})\big] =: c(\psi_1, \psi_2) \qquad (6.3)$$
for all $\psi_1, \psi_2 \in \big\{\phi_1, \phi_2, \phi^f_{t,x}, \phi^b_{t,y} \mid x \ge x_0,\ y \ge y_0,\ |t| \in \{1,\ldots,\tilde t\}\big\}$.

The weak convergence statements in Proposition 6.1 hold in the space of bounded functions on $\big\{\phi_1, \phi_2, \phi^f_{t,x}, \phi^b_{t,y} \mid x \ge x_0,\ y \ge y_0,\ |t| \in \{1,\ldots,\tilde t\}\big\}$ equipped with the supremum norm; see van der Vaart and Wellner (1996, Section 1.5) for details.

Proof of Proposition 6.1.
One can argue similarly as in the proof of Proposition B.1 of Drees et al. (2015), because the asymptotic equicontinuity of the process can be established for each $t$ separately. Note that the discussion in Drees and Rootzén (2016) shows that part (ii) of condition (B) of Drees et al. (2015) is not needed.

By stationarity, the covariance of $Z(\psi_1)$ and $Z(\psi_2)$ is obtained as the limit of
$$\frac{1}{r_n v_n}\, E\Big[\sum_{i=1}^{r_n} \psi_1(X_{n,i}) \sum_{j=1}^{r_n} \psi_2(X_{n,j})\Big] = \frac{1}{v_n} \sum_{k=-r_n+1}^{r_n-1} \Big(1 - \frac{|k|}{r_n}\Big)\, E[\psi_1(X_{n,0})\,\psi_2(X_{n,k})].$$
This sum can be shown to converge to $c(\psi_1, \psi_2)$ using Pratt's lemma and Condition (C), as in Drees et al. (2015).

Remark 6.2. The covariances can be expressed in terms of the spectral tail process. For example,
$$c(\phi^b_{t,x}, \phi_1) = \sum_{j=-\infty}^{\infty} E\big[\Theta_{-t}^{\alpha}\,\mathbb{1}(1/\Theta_{-t} > x)\,\log_+(Y_0\Theta_j)\big] = \sum_{j=-\infty}^{\infty} E\big[\Theta_{-t}^{\alpha}\,\mathbb{1}(1/\Theta_{-t} > x)\,\big(\Theta_j^{\alpha} \wedge 1\big)\big(\log_+\Theta_j + \alpha^{-1}\big)\big].$$
Here we have used that $Y_0$ is independent of $(\Theta_s)_{s\in\mathbb Z}$ with distribution $P[Y_0 > y] = y^{-\alpha}$ for $y \ge 1$.

Next, we show that the multiplier block bootstrap process $Z_{n,\xi}$ has the same asymptotic behavior as $\tilde Z_n$:
$$Z_{n,\xi}(\psi) := (nv_n)^{-1/2} \sum_{j=1}^{m_n} \xi_j \sum_{i\in I_j} \big(\psi(X_{n,i}) - E[\psi(X_{n,i})]\big), \qquad (6.4)$$
with $I_j := \{(j-1)r_n + 1, \ldots, jr_n\}$ and $m_n := \lfloor n/r_n \rfloor$. In what follows, the symbol $E_\xi$ denotes the expectation w.r.t. $\xi = (\xi_j)_{j\in\mathbb N}$, i.e., the expectation conditionally on $(X_{n,i})_{1\le i\le n}$. Moreover, let $BL_1$ denote the set of all functions $g : \mathbb R^{4\tilde t+2} \to \mathbb R$ such that $\sup_{z\in\mathbb R^{4\tilde t+2}} |g(z)| \le 1$ and $|g(z_1) - g(z_2)| \le \|z_1 - z_2\|$ for all $z_1, z_2 \in \mathbb R^{4\tilde t+2}$.

Proposition 6.3.
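The evaluation of $E[\log_+(Y_0\Theta_j)]$ used in the remark above rests on the identity $E[\log_+(Y\theta)] = (\theta^{\alpha} \wedge 1)(\log_+\theta + \alpha^{-1})$ for $Y$ standard Pareto($\alpha$). A quick numerical check in Python (the parameter values are arbitrary):

```python
import math

def e_logplus_numeric(theta, alpha, steps=200000, s_max=40.0):
    """E[log_+(theta * Y)] for P[Y > y] = y**(-alpha), y >= 1, by midpoint
    quadrature: substituting y = exp(s) turns the integral into
    log_+(theta * e**s) * alpha * e**(-alpha * s) integrated over [0, s_max]."""
    h = s_max / steps
    total = 0.0
    for k in range(steps):
        s = (k + 0.5) * h
        total += max(math.log(theta) + s, 0.0) * alpha * math.exp(-alpha * s) * h
    return total

def e_logplus_closed(theta, alpha):
    """Closed form used above: (theta**alpha ∧ 1) * (log_+ theta + 1/alpha)."""
    return min(theta ** alpha, 1.0) * (max(math.log(theta), 0.0) + 1.0 / alpha)
```

For $\theta \ge 1$ the indicator is never active and one simply gets $\log\theta + \alpha^{-1}$; for $\theta < 1$ only the event $\{Y > 1/\theta\}$, of probability $\theta^{\alpha}$, contributes.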
Suppose that $(X_t)_{t\in\mathbb Z}$ is a non-negative, stationary, regularly varying time series and that the conditions (A($x_0$)), (B) and (C) are fulfilled for some $x_0 \ge 0$. Then, for all $x \ge x_0$ and all $y \in [x_0, \infty) \cap (0, \infty)$, one has
$$\Big(Z_{n,\xi}(\phi_1),\ Z_{n,\xi}(\phi_2),\ \big[Z_{n,\xi}(\phi^f_{t,x}),\ Z_{n,\xi}(\phi^b_{t,y})\big]_{|t|\in\{1,\ldots,\tilde t\}}\Big) \xrightarrow{d} \Big(Z(\phi_1),\ Z(\phi_2),\ \big[Z(\phi^f_{t,x}),\ Z(\phi^b_{t,y})\big]_{|t|\in\{1,\ldots,\tilde t\}}\Big)$$
with $Z$ as defined in Theorem 3.1. Moreover,
$$\sup_{g\in BL_1} \Big| E_\xi\, g\Big(Z_{n,\xi}(\phi_1), Z_{n,\xi}(\phi_2), \big[Z_{n,\xi}(\phi^f_{t,x}), Z_{n,\xi}(\phi^b_{t,y})\big]_{|t|\in\{1,\ldots,\tilde t\}}\Big) - E\, g\Big(Z(\phi_1), Z(\phi_2), \big[Z(\phi^f_{t,x}), Z(\phi^b_{t,y})\big]_{|t|\in\{1,\ldots,\tilde t\}}\Big)\Big| \to 0 \qquad (6.5)$$
in probability.

Proposition 6.3 follows immediately from Drees (2015, Theorem 2.1), because in the proof of Proposition 6.1 (cf. the proof of Proposition B.1 of Drees et al. (2015)) it is shown that the assumptions of Drees (2015, Theorem 2.1) follow from the conditions of Proposition 6.3. Now we are ready to prove the consistency of the multiplier block bootstrap procedure.
Proof of Theorem 3.3.
We only prove consistency of the bootstrap version of the backward estimator, as the proof for the forward estimator is considerably simpler. For simplicity, we assume that $n = m_n r_n$. Let
$$\alpha_n := \frac{1}{E[\log_+(X_1/u_n) \mid X_1 > u_n]} = \frac{v_n}{E[\phi_1(X_{n,1})]}.$$
Recall $\tilde Z_n$ and $Z_{n,\xi}$ in (6.1) and (6.4), respectively, recall $I_j = \{(j-1)r_n + 1, \ldots, jr_n\}$, and recall $\hat\alpha_n$ and $\hat\alpha^*_n$ in (2.7) and (2.9), respectively. Then
$$(nv_n)^{1/2}(\hat\alpha^*_n - \hat\alpha_n) = (nv_n)^{1/2}\, \frac{\sum_{j=1}^{m_n}\xi_j\sum_{i\in I_j}\mathbb{1}(X_i > u_n) - \hat\alpha_n\sum_{j=1}^{m_n}\xi_j\sum_{i\in I_j}\log_+(X_i/u_n)}{\sum_{j=1}^{m_n}(1+\xi_j)\sum_{i\in I_j}\log_+(X_i/u_n)} = \frac{Z_{n,\xi}(\phi_2) - \hat\alpha_n Z_{n,\xi}(\phi_1) + (r_nv_n)^{1/2}m_n^{-1/2}\sum_{j=1}^{m_n}\xi_j\,(1 - \hat\alpha_n/\alpha_n)}{\alpha_n^{-1}\big(1 + m_n^{-1}\sum_{j=1}^{m_n}\xi_j\big) + (nv_n)^{-1/2}\{\tilde Z_n(\phi_1) + Z_{n,\xi}(\phi_1)\}}.$$
Since $m_n^{-1/2}\sum_{j=1}^{m_n}\xi_j$ and $\tilde Z_n$ are stochastically bounded and $\hat\alpha_n \to \alpha$ in probability, the assumptions $nv_n \to \infty$, $r_nv_n \to 0$, and $\alpha_n \to \alpha$, together with Proposition 6.3, ensure that
$$(nv_n)^{1/2}(\hat\alpha^*_n - \hat\alpha_n) = \alpha Z_{n,\xi}(\phi_2) - \alpha^2 Z_{n,\xi}(\phi_1) + o_P(1), \qquad (6.6)$$
which converges weakly to $\alpha Z(\phi_2) - \alpha^2 Z(\phi_1)$. Moreover, conditionally on the data, it converges to the same limit weakly in probability in the sense of (6.5).

Next, recall $\hat F^{(b,\Theta_t)}_n(y)$ and $\hat F^{*(b,\Theta_t)}_n(y)$ in (2.6) and (2.8), respectively. For $y > 0$, we have
$$\big(1 - \hat F^{(b,\Theta_t)}_n(y)\big)\sum_{i=1}^{n}\mathbb{1}(X_i > u_n) = \sum_{i=1}^{n}\Big(\frac{X_{i-t}}{X_i}\Big)^{\hat\alpha_n}\mathbb{1}(X_i/X_{i-t} > y,\ X_i > u_n).$$
It follows that
$$\hat F^{(b,\Theta_t)}_n(y) - \hat F^{*(b,\Theta_t)}_n(y) = \Bigg[\sum_{i=1}^{n}\bigg(\Big(\frac{X_{i-t}}{X_i}\Big)^{\hat\alpha^*_n} - \Big(\frac{X_{i-t}}{X_i}\Big)^{\hat\alpha_n}\bigg)\mathbb{1}(X_i/X_{i-t} > y,\ X_i > u_n) + \sum_{j=1}^{m_n}\xi_j\sum_{i\in I_j}\Big(\frac{X_{i-t}}{X_i}\Big)^{\hat\alpha^*_n}\mathbb{1}(X_i/X_{i-t} > y,\ X_i > u_n) - \big\{1 - \hat F^{(b,\Theta_t)}_n(y)\big\}\sum_{j=1}^{m_n}\xi_j\sum_{i\in I_j}\mathbb{1}(X_i > u_n)\Bigg] \Bigg/ \sum_{j=1}^{m_n}(1+\xi_j)\sum_{i\in I_j}\mathbb{1}(X_i > u_n).$$
For all $0 < \underline\alpha < \alpha < \bar\alpha$, there exists a constant $0 < C < \infty$ such that for all $\tilde\alpha \in [\underline\alpha, \bar\alpha]$ and, for suitable constants $\lambda = \lambda(\tilde\alpha) \in (0,1)$, on the event $\{X_i/X_{i-t} > y\}$,
$$\bigg|\Big(\frac{X_{i-t}}{X_i}\Big)^{\tilde\alpha} - \Big(\frac{X_{i-t}}{X_i}\Big)^{\alpha} - \Big(\frac{X_{i-t}}{X_i}\Big)^{\alpha}\log\Big(\frac{X_{i-t}}{X_i}\Big)(\tilde\alpha - \alpha)\bigg| = \frac{1}{2}\Big(\frac{X_{i-t}}{X_i}\Big)^{\alpha + \lambda(\tilde\alpha-\alpha)}\log^2\Big(\frac{X_{i-t}}{X_i}\Big)(\tilde\alpha - \alpha)^2 \le C(\tilde\alpha - \alpha)^2.$$
Hence
$$\hat F^{(b,\Theta_t)}_n(y) - \hat F^{*(b,\Theta_t)}_n(y) = \Bigg[\sum_{i=1}^{n}\Big(\frac{X_{i-t}}{X_i}\Big)^{\alpha}\log\Big(\frac{X_{i-t}}{X_i}\Big)(\hat\alpha^*_n - \hat\alpha_n)\,\mathbb{1}(X_i/X_{i-t} > y,\ X_i > u_n) + \sum_{j=1}^{m_n}\xi_j\sum_{i\in I_j}\bigg\{\Big(\frac{X_{i-t}}{X_i}\Big)^{\alpha} + \Big(\frac{X_{i-t}}{X_i}\Big)^{\alpha}\log\Big(\frac{X_{i-t}}{X_i}\Big)(\hat\alpha^*_n - \alpha)\bigg\}\mathbb{1}(X_i/X_{i-t} > y,\ X_i > u_n) - \big(1 - \hat F^{(b,\Theta_t)}_n(y)\big)\sum_{j=1}^{m_n}\xi_j\sum_{i\in I_j}\mathbb{1}(X_i > u_n) + R_n(y)\Bigg] \Bigg/ \sum_{j=1}^{m_n}(1+\xi_j)\sum_{i\in I_j}\mathbb{1}(X_i > u_n) \qquad (6.7)$$
with
$$|R_n(y)| \le C(\hat\alpha^*_n - \hat\alpha_n)^2\sum_{i=1}^{n}\mathbb{1}(X_i/X_{i-t} > y,\ X_i > u_n) + C(\hat\alpha^*_n - \alpha)^2\sum_{j=1}^{m_n}|\xi_j|\sum_{i\in I_j}\mathbb{1}(X_i/X_{i-t} > y,\ X_i > u_n) = O_P\big((nv_n)^{-1}nv_n + (nv_n)^{-1}m_n r_n v_n\big) = O_P(1), \qquad n \to \infty.$$
Consider the function
$$\phi^{(3)}_{t,x}\big(y_{-\tilde t},\ldots,y_0,\ldots,y_{\tilde t}\big) = (y_{-t}/y_0)^{\alpha}\log(y_{-t}/y_0)\,\mathbb{1}(y_0/y_{-t} > x,\ y_{-t} > 0,\ y_0 > 1).$$
One may show as in the proof of Proposition 6.1 that $\tilde Z_n(\phi^{(3)}_{t,y})$ and $Z_{n,\xi}(\phi^{(3)}_{t,y})$ both converge weakly to $Z(\phi^{(3)}_{t,y})$. In particular, as $n \to \infty$,
$$(nv_n)^{-1}\sum_{i=1}^{n}\Big(\frac{X_{i-t}}{X_i}\Big)^{\alpha}\log\Big(\frac{X_{i-t}}{X_i}\Big)\mathbb{1}(X_i/X_{i-t} > y,\ X_i > u_n) = E\bigg[\Big(\frac{X_{1-t}}{X_1}\Big)^{\alpha}\log\Big(\frac{X_{1-t}}{X_1}\Big)\mathbb{1}(X_1/X_{1-t} > y)\ \Big|\ X_1 > u_n\bigg] + O_P\big((nv_n)^{-1/2}\big) \to E\big[\Theta_{-t}^{\alpha}\log(\Theta_{-t})\,\mathbb{1}(1/\Theta_{-t} > y)\big] = -E[\log(\Theta_t)\,\mathbb{1}(\Theta_t > y)],$$
where the last step follows from the time-change formula (1.5) applied with $f(\theta) = -\log(\theta)\,\mathbb{1}(\theta > y)$ and $(-t, 0, -t)$ instead of $(s, t, i)$. Therefore
$$\sum_{i=1}^{n}\Big(\frac{X_{i-t}}{X_i}\Big)^{\alpha}\log\Big(\frac{X_{i-t}}{X_i}\Big)(\hat\alpha^*_n - \hat\alpha_n)\,\mathbb{1}(X_i/X_{i-t} > y,\ X_i > u_n) = -(nv_n)^{1/2}\big(E[\log(\Theta_t)\mathbb{1}(\Theta_t > y)] + o_P(1)\big)\,(nv_n)^{1/2}(\hat\alpha^*_n - \hat\alpha_n). \qquad (6.8)$$
Likewise, one can conclude that
$$(nv_n)^{-1/2}\sum_{j=1}^{m_n}\xi_j\sum_{i\in I_j}\Big(\frac{X_{i-t}}{X_i}\Big)^{\alpha}\log\Big(\frac{X_{i-t}}{X_i}\Big)\mathbb{1}(X_i/X_{i-t} > y,\ X_i > u_n) = Z_{n,\xi}(\phi^{(3)}_{t,y}) + (r_nv_n)^{1/2}m_n^{-1/2}\sum_{j=1}^{m_n}\xi_j\, E\bigg[\Big(\frac{X_{1-t}}{X_1}\Big)^{\alpha}\log\Big(\frac{X_{1-t}}{X_1}\Big)\mathbb{1}(X_1/X_{1-t} > y)\ \Big|\ X_1 > u_n\bigg] = O_P(1).$$
As a consequence,
$$\sum_{j=1}^{m_n}\xi_j\sum_{i\in I_j}\bigg\{\Big(\frac{X_{i-t}}{X_i}\Big)^{\alpha} + \Big(\frac{X_{i-t}}{X_i}\Big)^{\alpha}\log\Big(\frac{X_{i-t}}{X_i}\Big)(\hat\alpha^*_n - \alpha)\bigg\}\mathbb{1}(X_i/X_{i-t} > y,\ X_i > u_n) = (nv_n)^{1/2}Z_{n,\xi}(\phi^b_{t,y}) + \sum_{j=1}^{m_n}\xi_j\, r_nv_n\big(E[\Theta_{-t}^{\alpha}\mathbb{1}(1/\Theta_{-t} > y)] + o(1)\big) + O_P(1) = (nv_n)^{1/2}\Big(Z_{n,\xi}(\phi^b_{t,y}) + O_P\big((r_nv_n)^{1/2}\big) + O_P\big((nv_n)^{-1/2}\big)\Big). \qquad (6.9)$$
Moreover, we find, as $r_nv_n \to 0$ and $\sum_{j=1}^{m_n}\xi_j = O_P(m_n^{1/2})$, that
$$\sum_{j=1}^{m_n}\xi_j\sum_{i\in I_j}\mathbb{1}(X_i > u_n) = (nv_n)^{1/2}Z_{n,\xi}(\phi_2) + r_nv_n\sum_{j=1}^{m_n}\xi_j = (nv_n)^{1/2}\big(Z_{n,\xi}(\phi_2) + o_P(1)\big), \qquad n \to \infty. \qquad (6.10)$$
The denominator of (6.7) equals $nv_n + O_P\big((nv_n)^{1/2}\big)$. Combining (6.7)–(6.10) and (6.6) yields
$$(nv_n)^{1/2}\big(\hat F^{(b,\Theta_t)}_n(y) - \hat F^{*(b,\Theta_t)}_n(y)\big) = -E[\log(\Theta_t)\mathbb{1}(\Theta_t > y)]\,(nv_n)^{1/2}(\hat\alpha^*_n - \hat\alpha_n) + Z_{n,\xi}(\phi^b_{t,y}) - \big(1 - \hat F^{(b,\Theta_t)}_n(y)\big)Z_{n,\xi}(\phi_2) + o_P(1) = Z_{n,\xi}(\phi^b_{t,y}) - \bar F^{(\Theta_t)}(y)\,Z_{n,\xi}(\phi_2) - E[\log(\Theta_t)\mathbb{1}(\Theta_t > y)]\big(\alpha Z_{n,\xi}(\phi_2) - \alpha^2 Z_{n,\xi}(\phi_1)\big) + o_P(1).$$
Now the assertion is a direct consequence of Proposition 6.3 and Theorem 3.1.
Acknowledgements
The authors wish to thank the editors and the referees for their careful reading and for various constructive comments and useful suggestions. J. Segers gratefully acknowledges funding by contract "Projet d'Actions de Recherche Concertées" No. 12/17-045 of the "Communauté française de Belgique" and by IAP research network Grant P7/06 of the Belgian government (Belgian Science Policy). The research of M. Warchoł was funded by a PhD grant of the "Fonds de la Recherche Scientifique" (F.R.S.-FNRS). H. Drees was partially supported by DFG research grant JA 2160/1. R. Davis was supported in part by ARO MURI grant W911NF-12-1-0385.
References
Basrak, B. and J. Segers (2009). Regularly varying multivariate time series. Stochastic Processes and Their Applications 119(4), 1055–1080.

Black, F. (1976). Studies of stock price volatility changes. In Proceedings of the 1976 Meetings of the American Statistical Association, Business and Economics Section, pp. 177–181.

Bollerslev, T., V. Todorov, and S. Z. Li (2013). Jump tails, extreme dependencies, and the distribution of stock returns. Journal of Econometrics 172(2), 307–324.

Bollerslev, T., V. Todorov, and L. Xu (2015). Tail risk premia and return predictability. Journal of Financial Economics 118(1), 113–134.

Davis, R. A. and T. Mikosch (2001). Point process convergence of stochastic volatility processes with application to sample autocorrelation. Journal of Applied Probability 38A, 93–104.

Davis, R. A. and T. Mikosch (2009). The extremogram: a correlogram for extreme events. Bernoulli 15(4), 977–1009.

Davis, R. A., T. Mikosch, and I. Cribben (2012). Towards estimating extremal serial dependence via the bootstrapped extremogram. Journal of Econometrics 170(1), 142–152.

Ding, Z., C. W. Granger, and R. F. Engle (1993). A long memory property of stock market returns and a new model. Journal of Empirical Finance 1(1), 83–106.

Drees, H. (2000). Weighted approximations of tail processes for β-mixing random variables. The Annals of Applied Probability 10(4), 1274–1301.

Drees, H. (2003). Extreme quantile estimation for dependent data with applications to finance. Bernoulli 9(4), 617–657.

Drees, H. (2015). Bootstrapping empirical processes of cluster functionals with application to extremograms. arXiv preprint arXiv:1511.00420.

Drees, H. and H. Rootzén (2010). Limit theorems for empirical processes of cluster functionals. The Annals of Statistics 38(4), 2145–2186.

Drees, H. and H. Rootzén (2016). Correction note to "Limit theorems for empirical processes of cluster functionals". The Annals of Statistics (to appear).

Drees, H., J. Segers, and M. Warchoł (2015). Statistics for tail processes of Markov chains. Extremes 18(3), 369–402.

Han, H., O. Linton, T. Oka, and Y.-J. Whang (2016). The cross-quantilogram: Measuring quantile dependence and testing directional predictability between time series. Journal of Econometrics 193(1), 251–270.

Leadbetter, M. (1983). Extremes and local dependence in stationary sequences. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 65(2), 291–306.

Linton, O. and Y.-J. Whang (2007). The quantilogram: With an application to evaluating directional predictability. Journal of Econometrics 141(1), 250–282.

Mikosch, T. and C. Stărică (2000). Limit theory for the sample autocorrelations and extremes of a GARCH(1,1) process. The Annals of Statistics 28(5), 1427–1451.

Politis, D. N. and J. P. Romano (1994). The stationary bootstrap. Journal of the American Statistical Association 89(428), 1303–1313.

Tjøstheim, D. and K. O. Hufthammer (2013). Local Gaussian correlation: a new measure of dependence. Journal of Econometrics 172(1), 33–48.

van der Vaart, A. W. and J. A. Wellner (1996). Weak Convergence of Empirical Processes. New York: Springer.

Wuertz, D. and Y. Chalabi, with contribution from M. Miklovic, C. Boudt, P. Chausse, and others (2013). fGarch: Rmetrics - Autoregressive Conditional Heteroskedastic Modelling. R package version 3010.82.