Nonparametric Expected Shortfall Forecasting Incorporating Weighted Quantiles

Abstract

A new semi-parametric Expected Shortfall (ES) estimation and forecasting framework is proposed. The proposed approach is based on a two-step estimation procedure. The first step involves the estimation of Value-at-Risk (VaR) at different quantile levels through a set of quantile time series regressions. Then, the ES is computed as a weighted average of the estimated quantiles. The quantile weighting structure is parsimoniously parameterized by means of a Beta weight function whose coefficients are optimized by minimizing a joint VaR and ES loss function of the Fissler-Ziegel class. The properties of the proposed approach are first evaluated with an extensive simulation study using two data generating processes. Two forecasting studies with different out-of-sample sizes are then conducted, one of which focuses on the 2008 Global Financial Crisis (GFC) period. The proposed models are applied to 7 stock market indices and their forecasting performances are compared to those of a range of parametric, non-parametric and semi-parametric models, including GARCH, Conditional AutoRegressive Expectile (CARE), joint VaR and ES quantile regression models and a simple average of quantiles. The results of the forecasting experiments provide clear evidence in support of the proposed models.


Giuseppe Storti (Department of Economics and Statistics, University of Salerno), Chao Wang (Discipline of Business Analytics, The University of Sydney)


Keywords: Value-at-Risk, Expected Shortfall, quantile regression, Beta function, joint loss.

arXiv [q-fin.RM], May

INTRODUCTION

Value-at-Risk (VaR) is employed by many financial institutions as an important risk management tool. Representing the market risk as one number, VaR has been used as a standard risk measurement metric for the past two decades. However, as recently recognized by the Basel Committee on Banking Supervision, VaR suffers from a number of weaknesses affecting its reliability as a reference metric for determining regulatory capital requirements (Basel Committee on Banking Supervision, 2013). First, VaR cannot measure the expected loss for extreme (violating) returns. In addition, it can be shown that VaR is not always a coherent risk measure, because it can fail to satisfy the subadditivity property. For these reasons, the Committee proposed in May 2012 to replace VaR with the Expected Shortfall (ES, Artzner 1997; Artzner et al. 1999). Thus, in recent years ES has been increasingly employed for tail risk measurement. However, there is still much less research on modelling ES than on VaR.

ES is the expected value of the return conditional on it being below the quantile (VaR) of its distribution. Unlike VaR, it is a coherent measure and it "measures the riskiness of a position by considering both the size and the likelihood of losses above a certain confidence level" (Basel Committee on Banking Supervision, 2013).

The Basel III Accord, which was implemented in 2019, places new emphasis on ES. Its recommendations for market risk management are illustrated in the 2019 document "Minimum capital requirements for market risk" that, on page 89, mentions: "ES must be computed on a daily basis for the bank-wide internal models to determine market risk capital requirements. ES must also be computed on a daily basis for each trading desk that uses the internal models approach (IMA)."; "In calculating ES, a bank must use a 97.5th percentile, one-tailed confidence level" (Basel Committee on Banking Supervision, 2019). Therefore, in the empirical application of our paper, we mainly focus on one-step-ahead tail risk forecasting at the 2.5% quantile level.

The literature on ES modelling and forecasting is closely related to previous research on VaR. Quantile regression type models, e.g. the Conditional Autoregressive Value-at-Risk (CAViaR) model of Engle and Manganelli (2004), are a popular semi-parametric approach to forecasting VaR. Gerlach et al. (2011) generalize the CAViaR models to a fully nonlinear family.

However, the CAViaR type models cannot directly estimate and forecast ES. A semi-parametric model that directly estimates quantiles and expectiles, and implicitly ES, called the Conditional Autoregressive Expectile (CARE) model, is proposed by Taylor (2008). To select the appropriate expectile level, CARE type models require a grid search, which is relatively computationally expensive (depending on the model complexity and the size of the grid).

Taylor (2019) proposes a joint ES and quantile regression framework (ES-CAViaR) which employs the Asymmetric Laplace (AL) density to build a likelihood function whose Maximum Likelihood Estimates (MLEs) coincide with those obtained by minimisation of a strictly consistent joint loss function for VaR and ES. The frameworks in Taylor (2019) assume that the difference or ratio between VaR and ES follows specific dynamics, also in order to guarantee that VaR and ES do not cross. Essentially, this implies additional assumptions on ES dynamics.

Fissler and Ziegel (2016) develop a family of joint loss functions (or "scoring rules") that are strictly consistent for the true VaR and ES, i.e. they are uniquely minimized by the true VaR and ES series.
Under specific choices of the functions involved in the joint loss function of Fissler and Ziegel (2016), the negative of the AL log-likelihood function presented in Taylor (2019) can be derived as a special case of this class of loss functions. Patton et al. (2019) propose new dynamic models for VaR and ES by adopting the generalized autoregressive score (GAS) framework (Creal et al. 2013 and Harvey 2013) and utilizing the loss functions in Fissler and Ziegel (2016).

In our paper, a new ES estimation and forecasting framework is proposed where the ES is modelled as an affine function of tail quantiles. Hence, we refer to our approach as the Weighted Quantile estimator. The quantiles are produced from the CAViaR model of Engle and Manganelli (2004), by grid search of a range of equally spaced quantile levels below the target VaR level, i.e. 2.5%. We will discuss the selection details of these quantile levels later. The weighting pattern of the selected quantiles is based on a two parameter Beta function, also called the "Beta lag", borrowed from the literature on Mixed Data Sampling (MIDAS, Ghysels et al. 2007) regression models. The Beta lag function is a parsimonious yet flexible choice and is able to reproduce a variety of different behaviours such as declining, increasing or hump shaped patterns.

We estimate the parameters of the Beta Lag, which determine the Beta weights assigned to the selected quantiles, by minimizing strictly consistent VaR and ES joint loss functions of the class defined in Fissler and Ziegel (2016). In particular, we focus on the negative AL loss.

Our framework has some important advantages. First, the proposed estimator does not require any additional assumption on the ES process: it only relies on the natural definition of ES as the tail conditional expectation of the conditional distribution of returns (under the assumption that this distribution is continuous, as later explained). Furthermore, the dynamics of the ES, which can be explicitly derived by standard algebraic manipulations, are those naturally implied by its definition in terms of the expectation of tail quantiles. This implies that the ES can be predicted without having to specify an additional dynamic equation, thus reducing model uncertainty and the risk of potential mis-specification on the ES side.

Our method has some interesting connections with the existing literature. First, there are some evident affinities between our method and CARE models. Namely, both our framework and CARE models involve a two-step estimation procedure and a grid search process. Later, we will show that our framework can produce more accurate ES forecasts than CARE while using a significantly lower number of grid search quantile levels. In the empirical section, we show that our framework using a grid size of 3 (quantile levels) can clearly outperform CARE with a grid size of 50 (expectile levels). Further, our method is closely related to the literature on forecast combination. Taylor (2020) has recently proposed to use a forecast combination of different VaR & ES models of the same order. However, our strategy is substantially different since we are [...]

FRAMEWORK AND MOTIVATION

To start, let $\mathcal{I}_t$ be the information available at time $t$ and
\[ F_t(r) = \Pr(r_t \le r \mid \mathcal{I}_{t-1}) \]
be the Cumulative Distribution Function (CDF) of $r_t$ conditional on $\mathcal{I}_{t-1}$. We assume that $F_t(\cdot)$ is strictly increasing and continuous on the real line $\Re$. Under this assumption, the one-step-ahead $\alpha$ level Value at Risk at time $t$ can be defined as
\[ Q_{t,\alpha} = F_t^{-1}(\alpha), \qquad 0 < \alpha < 1. \]
Within the same framework, the one-step-ahead $\alpha$ level Expected Shortfall can be shown (see Acerbi and Tasche, 2002, among others) to be equal to the tail conditional expectation of $r_t$:
\[ ES_{t,\alpha} = E(r_t \mid \mathcal{I}_{t-1},\, r_t \le Q_{t,\alpha}). \]
$ES_{t,\alpha}$ is related to $F_t(\cdot)$ by the following integral
\[ ES_{t,\alpha} = \frac{1}{F_t(Q_{t,\alpha})} \int_{-\infty}^{Q_{t,\alpha}} r \, dF_t(r) = \frac{1}{\alpha} \int_{-\infty}^{Q_{t,\alpha}} r \, dF_t(r), \tag{1} \]
that, after a simple change of variable, can be rewritten as
\[ ES_{t,\alpha} = \frac{1}{\alpha} \int_{0}^{\alpha} Q_{t,p} \, dp. \tag{2} \]

The integral in (2) can be approximated over a discrete grid by means of standard numerical integration techniques. Namely, given the target quantile level $\alpha$, assume that an equally spaced grid of quantile levels of size $M$ is selected,
\[ \boldsymbol{\alpha}_M = [\alpha_1, \alpha_2, \ldots, \alpha_M = \alpha], \]
where, setting $\alpha_0 = 0$, $\alpha_m = \alpha_{m-1} + \eta$, with $\eta = (\alpha_M - \alpha_1)/(M-1) = \alpha/M$, for $m = 1, \ldots, M$. A simple rectangular rule would then lead to the following approximation
\[ ES_{t,\alpha} \approx \frac{1}{\alpha}\, \eta \sum_{i=1}^{M} Q_{t,\alpha_i} = \frac{1}{M} \sum_{i=1}^{M} Q_{t,\alpha_i} = \sum_{i=1}^{M} w_i Q_{t,\alpha_i} \tag{3} \]
with $w_i = 1/M$, for $i = 1, \ldots, M$. It is easy to show that, in theory, as $M \to \infty$ the above approximation tends to the "true" $ES_{t,\alpha}$ value. In general, it can be shown (see Davis and Rabinowitz, 1984) that many higher order integration rules, such as the trapezoidal and Simpson's rule, can be represented as weighted averages of the form in (3) where, by modulating the choice of the weights $w_i$, one can obtain different integration rules as special cases. For example, the set of weights $(w_1 = 1/(2M),\, w_2 = 1/M,\, \ldots,\, w_{M-1} = 1/M,\, w_M = 1/(2M))$ would lead to a trapezoidal rule (for more details see Davis and Rabinowitz, 1984, page 57, Section 2.1.5). It is however worth noting that, in real data applications, data scarcity prevents accurate estimation of VaR for extreme quantile orders, posing constraints on the choice of the minimum grid value $\alpha_1$.

It follows that a correction for this left-tail truncation bias should be considered when designing an estimator for $ES_{t,\alpha}$ based on the representation in Equation (2). Furthermore, referring to an appropriately defined strictly consistent scoring rule, the weights could be estimated rather than fixed a priori. This approach would bring some important advantages. First, it would be possible to modulate, in a data-driven fashion, the weights assigned to each tail quantile in (3) in order to optimally match the tail properties of returns and, eventually, down-weight less accurately estimated extreme quantiles. Second, it would make it possible to control the left-tail truncation bias. Last, working with estimated weights would reduce, to some extent, the impact of the subjective choice of the lower bound $\alpha_1$.

In Section 4, starting from a set of consistent estimators of $Q_{t,p}$ ($0 < p \le \alpha$), these ideas will be elaborated to define a semi-parametric two-step estimation strategy for $ES_{t,\alpha}$. Before moving to the illustration of our proposal, in the next section, we present a review of the main approaches for joint semi-parametric estimation of conditional VaR and ES. In order to simplify notation, in the remainder, unless otherwise specified, the following notational conventions will be adopted: $ES_{t,\alpha} \equiv ES_t$ and $Q_{t,\alpha} \equiv Q_t$.

JOINT MODELLING OF VaR AND ES: ES-CAViaR AND CARE MODELS

Koenker and Machado (1999) show that the quantile regression estimator is equivalent to a maximum likelihood estimator when assuming that the data are conditionally distributed as an Asymmetric Laplace (AL) with a mode at the quantile of interest. If $r_t$ is the return on day $t$ and $\Pr(r_t < Q_t \mid \mathcal{I}_{t-1}) = \alpha$, then the parameters in the model for $Q_t$ can be estimated by maximizing a quasi-likelihood based on
\[ p(r_t \mid \mathcal{I}_{t-1}) = \frac{\alpha(1-\alpha)}{\sigma} \exp\left( -(r_t - Q_t)(\alpha - I(r_t < Q_t)) \right), \]
for $t = 1, \ldots, n$, where $\sigma$ is a nuisance parameter.

Taylor (2019) extends this result to incorporate the associated ES quantity into the likelihood expression, noting a link between $ES_t$ and a dynamic $\sigma_t$, resulting in the conditional density function
\[ p(r_t \mid \mathcal{I}_{t-1}) = \frac{\alpha(1-\alpha)}{ES_t} \exp\left( -\frac{(r_t - Q_t)(\alpha - I(r_t < Q_t))}{\alpha\, ES_t} \right), \tag{4} \]
allowing a likelihood function to be built and maximised, given model expressions for $(Q_t, ES_t)$. Taylor (2019) notes that the negative logarithm of the resulting likelihood function is strictly consistent for $(Q_t, ES_t)$ considered jointly, i.e. it fits into the class of jointly consistent scoring functions for VaR & ES developed by Fissler and Ziegel (2016).

Taylor (2019) incorporates two different ES components that describe the dynamics of VaR and ES and also prevent ES estimates from crossing the corresponding VaR estimates, as presented below in Model (5) (ES-CAViaR-Add: ES-CAViaR with an additive VaR to ES component) and Model (6) (ES-CAViaR-Mult: ES-CAViaR with a multiplicative VaR to ES component).

ES-CAViaR-Add:
\[ Q_t = \beta_0 + \beta_1 |r_{t-1}| + \beta_2 Q_{t-1}, \tag{5} \]
\[ ES_t = Q_t - w_t, \]
\[ w_t = \begin{cases} \gamma_0 + \gamma_1 (Q_{t-1} - r_{t-1}) + \gamma_2 w_{t-1} & \text{if } r_{t-1} \le Q_{t-1}, \\ w_{t-1} & \text{otherwise}, \end{cases} \]
where, to ensure that the VaR and ES series do not cross, Taylor (2019) imposes the following constraints: $\gamma_0 \ge 0$, $\gamma_1 \ge 0$, $\gamma_2 \ge 0$.

ES-CAViaR-Mult:
\[ Q_t = \beta_0 + \beta_1 |r_{t-1}| + \beta_2 Q_{t-1}, \tag{6} \]
\[ ES_t = (1 + \exp(\gamma_0))\, Q_t, \]
where $\gamma_0$ is unconstrained.
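The two ES-CAViaR specifications above can be read as plain recursions. The following Python sketch is illustrative only: the function names, the parameter values, the starting values `q0` and `w0`, and the simulated Gaussian returns are all assumptions made for the example, not estimates or code from the paper. It simply runs Models (5) and (6) forward and lets one check that ES never lies above VaR.

```python
import math
import random

def es_caviar_add(returns, beta, gamma, q0, w0):
    """Run the ES-CAViaR-Add recursion (Model 5): ES_t = Q_t - w_t."""
    b0, b1, b2 = beta
    g0, g1, g2 = gamma
    q, w = [q0], [w0]
    for t in range(1, len(returns)):
        r_prev, q_prev, w_prev = returns[t - 1], q[-1], w[-1]
        q.append(b0 + b1 * abs(r_prev) + b2 * q_prev)
        if r_prev <= q_prev:
            # VaR was violated at t-1: update the VaR-to-ES gap
            w.append(g0 + g1 * (q_prev - r_prev) + g2 * w_prev)
        else:
            # no violation: the gap is carried forward unchanged
            w.append(w_prev)
    es = [qt - wt for qt, wt in zip(q, w)]
    return q, es

def es_caviar_mult(returns, beta, gamma0, q0):
    """Run the ES-CAViaR-Mult recursion (Model 6): ES_t = (1 + exp(gamma0)) Q_t."""
    b0, b1, b2 = beta
    q = [q0]
    for t in range(1, len(returns)):
        q.append(b0 + b1 * abs(returns[t - 1]) + b2 * q[-1])
    scale = 1.0 + math.exp(gamma0)
    es = [scale * qt for qt in q]
    return q, es

# Hypothetical inputs, chosen only so that the recursions stay in the left tail
random.seed(0)
returns = [random.gauss(0.0, 1.0) for _ in range(500)]
beta = (-0.1, -0.2, 0.85)

q_add, es_add = es_caviar_add(returns, beta, gamma=(0.01, 0.1, 0.9), q0=-1.9, w0=0.4)
q_mult, es_mult = es_caviar_mult(returns, beta, gamma0=-1.5, q0=-1.9)
```

With non-negative `gamma` values and a non-negative starting gap, `w_t` stays non-negative, so the additive ES cannot cross the VaR. In the multiplicative case the same holds as long as the quantile series is negative, since the scale factor exceeds one.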
Models based on expectiles

The concept of expectile is closely related to the concept of quantile. The $\tau$ level expectile $\mu_\tau$, as defined by Aigner et al. (1976), can be estimated by minimizing the following Asymmetric Least Squares (ALS) criterion (Taylor, 2008):
\[ \sum_{t=1}^{N} |\tau - I(r_t < \mu_\tau)|\, (r_t - \mu_\tau)^2, \tag{7} \]
where no distributional assumption is required to estimate $\mu_\tau$.

As discussed in Section 1, conditional ES is defined as $ES_{t,\alpha} = E(r_t \mid r_t < Q_{t,\alpha}, \mathcal{I}_{t-1})$. Newey and Powell (1987) and Taylor (2008) show that this is related to the conditional $\tau$ level expectile $\mu_{t,\tau}$ by the relationship
\[ ES_{t,\alpha_\tau} = \left( 1 + \frac{\tau}{(1 - 2\tau)\,\alpha_\tau} \right) \mu_{t,\tau}, \tag{8} \]
where $\mu_{t,\tau} = Q_{t,\alpha_\tau}$, i.e. the $\tau$ level expectile $\mu_{t,\tau}$ occurs at the quantile level $\alpha_\tau$ of $r_t$. The notation $\alpha_\tau$ is used here to emphasize this relationship. Thus, $\mu_{t,\tau}$ can be used to estimate the $\alpha_\tau$ level conditional quantile $Q_{t,\alpha_\tau}$, and then scaled to estimate the associated $ES_{t,\alpha_\tau}$.

Exploiting this relationship, Taylor (2008) proposes the CARE type models, which have a similar form to the CAViaR models of Engle and Manganelli (2004), where lagged returns drive the expectiles and model parameters are estimated by minimizing an ALS criterion. The general Symmetric Absolute Value (SAV) form of this model is:

CARE-SAV:
\[ \mu_{t,\tau} = \beta_1 + \beta_2 |r_{t-1}| + \beta_3 \mu_{t-1,\tau}, \]
where $\mu_{t,\tau}$ is the $\tau$ level expectile on day $t$. The CARE-type model produces one-step-ahead forecasts of expectiles ($\mu_{t,\tau}$), which can be employed as VaR estimates ($Q_{t,\alpha_\tau}$) by an appropriate choice of $\tau$.
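The link between expectiles and ES in Equation (8) can be checked numerically on a static sample. The sketch below is an illustration under stated assumptions, not the CARE estimation procedure: the function `expectile`, the simulated Gaussian sample and the level `tau = 0.005` are hypothetical choices. The sample is demeaned because the identity, as written, follows from the first-order condition of the ALS criterion for a zero-mean distribution.

```python
import random

def expectile(sample, tau, n_iter=200):
    """tau-level expectile as the minimizer of the ALS criterion (7),
    computed by iterating an asymmetrically weighted mean."""
    mu = sum(sample) / len(sample)
    for _ in range(n_iter):
        num = den = 0.0
        for r in sample:
            # ALS weight |tau - I(r < mu)|: (1 - tau) below mu, tau otherwise
            w = (1.0 - tau) if r < mu else tau
            num += w * r
            den += w
        new_mu = num / den
        if new_mu == mu:
            break
        mu = new_mu
    return mu

random.seed(1)
raw = [random.gauss(0.0, 1.0) for _ in range(5000)]
mean = sum(raw) / len(raw)
sample = [r - mean for r in raw]  # enforce a zero sample mean

tau = 0.005  # hypothetical expectile level
mu = expectile(sample, tau)

# Quantile level at which the expectile sits, and the empirical tail mean below it
below = [r for r in sample if r < mu]
alpha_tau = len(below) / len(sample)
es_direct = sum(below) / len(below)

# ES obtained by rescaling the expectile, as in Equation (8)
es_scaled = (1.0 + tau / ((1.0 - 2.0 * tau) * alpha_tau)) * mu
```

On a demeaned sample the two quantities `es_direct` and `es_scaled` agree up to floating-point error, which is the sample analogue of Equation (8).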
The VaR estimates can be further scaled, using Equation (8), to produce forecasts of ES, which cannot be directly calculated under the CAViaR framework.

However, the selection of the appropriate expectile level $\tau$ requires a grid search, based on the optimization of the violation rate (VRate, the percentage of returns exceeding VaR estimates) or of the aggregated quantile loss function (Gerlach and Wang, 2020). Specifically, in the first case, for each grid value of $\tau$, the ALS estimator of the CARE equation parameters $\beta_j$ ($j = 1, 2, 3$) is found, yielding an associated VRate($\tau$). $\hat{\tau}$ is then set to the grid value of $\tau$ such that VRate is closest to the desired $\alpha_\tau$. Alternatively, when the aggregated quantile loss is chosen as the objective function, the selected grid value is the one that minimizes the quantile loss evaluated at the quantile level $\alpha_\tau$, see Gerlach and Wang (2020). In real applications, this grid search approach can be computationally expensive (depending on the size of the grid), and the performance can be affected by the size and spacing of the grid, which is normally decided in an ad-hoc manner.

Fissler and Ziegel (2016) develop a family of joint loss functions whose value depends on the associated VaR and ES. Members of this family are strictly consistent for $(Q_t, ES_t)$, i.e. their expectations are uniquely minimized by the true VaR and ES series. The general form of this functional family is:
\[ S_t(r_t, Q_t, ES_t) = (I_t - \alpha)\, G_1(Q_t) - I_t\, G_1(r_t) + G_2(ES_t) \left( ES_t - Q_t + \frac{I_t}{\alpha}(Q_t - r_t) \right) - H(ES_t) + a(r_t), \]
where $I_t = 1$ if $r_t < Q_t$ and 0 otherwise, for $t = 1, \ldots, N$, $G_1(\cdot)$ is increasing, $G_2(\cdot)$ is strictly increasing and strictly convex, $G_2 = H'$ and $\lim_{x \to -\infty} G_2(x) = 0$, and $a(\cdot)$ is a real-valued integrable function.

As discussed in Taylor (2019), assuming $r_t$ to have zero mean, making the choices $G_1(x) = 0$, $G_2(x) = -1/x$, $H(x) = -\log(-x)$ and $a = 1 - \log(1 - \alpha)$, which satisfy the required criteria, returns the scoring function:
\[ S_t(r_t, Q_t, ES_t) = -\log\left( \frac{\alpha - 1}{ES_t} \right) - \frac{(r_t - Q_t)(\alpha - I(r_t \le Q_t))}{\alpha\, ES_t}, \tag{9} \]
where the aggregated loss is denoted by $S = \sum_{t=1}^{N} S_t$. Taylor (2019) refers to expression (9) as the AL log score. The negative of Equation (9) can be treated as the AL log-likelihood, and (9) itself is a strictly consistent scoring rule that is jointly minimized by the true VaR and ES series.

THE WEIGHTED QUANTILE ESTIMATOR

In this section, we illustrate the proposed two-step approach for semi-parametric estimation of Expected Shortfall. At step 1, we obtain semi-parametric estimates of VaR over a pre-defined set of risk levels $\le \alpha$. Then, at step 2, conditional on the estimates obtained at step 1 and relying on the representation of ES in Equation (2), an estimate of the conditional ES is obtained as an affine function of the first-stage VaR estimates, given by a weighted average plus a constant. For these reasons we refer to our approach as the Weighted Quantile estimator.

Compared to a simple average, working with an affine transformation offers some important advantages. First, this specification makes it easy to control for left-tail truncation bias. Second, since the weights are fitted via the optimization of a strictly consistent scoring rule for $(Q_t, ES_t)$, it is potentially possible to obtain relevant gains in terms of accuracy in the estimation of VaR and ES.

Next, we provide a detailed description of the two steps of the proposed estimation procedure. Although, for ease of explanation, we focus on the standard risk level $\alpha = 2.5\%$, the procedure can be applied to any target level $\alpha$.

Step-1:

Under step 1, given the target quantile level $\alpha = 2.5\%$, an equally spaced grid of quantile levels of size $M$ is selected,
\[ \boldsymbol{\alpha}_M = [\alpha_1, \alpha_2, \ldots, \alpha_M], \]
where $\alpha_m = \alpha_{m-1} + \eta$, with $\eta = (\alpha - \alpha_1)/(M-1)$ and $\alpha_M = \alpha$, for $m = 2, \ldots, M$. The value of the lower bound $\alpha_1$ can be selected on a case-by-case basis, mainly taking into account the length of the available in-sample returns series. As an example, with $M = 10$, $\alpha = 2.5\%$ and $\alpha_1 = 0.005$ we have $\eta \approx 0.0022$ and, rounding to four decimals,
\[ \boldsymbol{\alpha}_M \approx [0.0050, 0.0072, 0.0094, 0.0117, 0.0139, 0.0161, 0.0183, 0.0206, 0.0228, 0.0250]. \]
$M$ CAViaR models are separately estimated for each of the quantile orders in $\boldsymbol{\alpha}_M$. For illustrative purposes, without implying any loss of generality, we here refer to the CAViaR symmetric absolute value (CAViaR-SAV) framework
\[ Q_t = \beta_0 + \beta_1 |r_{t-1}| + \beta_2 Q_{t-1}. \tag{10} \]
The proposed procedure can however be immediately extended to conditional quantile models of different nature and complexity, such as CAViaR with an asymmetric specification (CAViaR-AS) or a nonlinear threshold specification.

For each trial quantile level $\alpha_m \in \boldsymbol{\alpha}_M$, the above CAViaR-SAV model is then used to produce the time series of conditional in-sample quantiles $Q_{N,\alpha_m}$ and quantile forecasts $\hat{Q}_{N+1,\alpha_m}$, for $m = 1, \ldots, M$. The set of in-sample quantiles at all trial quantile levels, from $\alpha_1$ to $\alpha_M$, is collected in the $N \times M$ matrix $\mathbf{Q}_N$. Here, the last ($M$-th) column of $\mathbf{Q}_N$ corresponds to the time series of 2.5% in-sample conditional quantiles, $Q_{N,0.025}$. Similarly, we use the notation $\hat{\mathbf{Q}}_{N+1}$ to indicate the $1 \times M$ vector of 1-step-ahead VaR forecasts at all trial quantile levels. The last element in $\hat{\mathbf{Q}}_{N+1}$ represents the VaR forecast at the target 2.5% level, $\hat{Q}_{N+1,0.025}$.

Lastly, we would like to emphasize that the proposed framework can actually incorporate quantile estimates obtained from any model (not necessarily CAViaR), although we leave this for future research.

Step-2:

In the second stage of our approach, we predict the conditional ES as an affine function of the elements of $\hat{\mathbf{Q}}_{N+1}$. More precisely, the in-sample conditional ES at time $t$ is modelled as
\[ ES^{(wq)}_t = w_0 + \sum_{i=1}^{M} w_i Q_{t,\alpha_i}, \tag{11} \]
where the weights $w_i$, $i = 1, \ldots, M$, are generated by some flexible and parsimonious function. A suitable choice is given by the Beta lag function borrowed from the literature on mixed data sampling and distributed lag models (see Ghysels et al., 2007, among others). Namely, for $i = 1, \ldots, M$, $w_i = w\left(\frac{i}{M}; a, b\right)$ with
\[ w(x; a, b) = \frac{x^{a-1} (1-x)^{b-1}\, \Gamma(a+b)}{\Gamma(a)\Gamma(b)}. \tag{12} \]

As previously discussed, the estimation of ES based on numerical integration of the tail quantiles is inherently affected by truncation bias, since the summation on the RHS of (11) does not involve conditional quantiles of order below $\alpha_1$. In Equation (11) we control for truncation bias in two different ways. First, we include the intercept term $w_0$ in order to control for fixed bias. Second, the sum of weights appearing on the RHS of (11), $\theta = \sum_{i=1}^{M} w_i$, has been deliberately left unconstrained in order to allow the size of the bias to depend on the average tail VaR level.

Remark 1.

It is worth noting that, letting $\tilde{w}_i = w_i/\theta$, Equation (11) can be alternatively written as
\[ ES^{(wq)}_t = w_0 + \theta \sum_{i=1}^{M} \tilde{w}_i Q_{t,\alpha_i}, \tag{13} \]
where $\sum_{i=1}^{M} \tilde{w}_i = 1$ by construction. The reparameterization in (13) makes evident the role of $\theta$ for bias correction.

The main reasons for adopting the Beta Lag specification to model the behaviour of the weights in Equation (11) are its parsimony, since it only depends on two parameters, and its flexibility. Figure 1 displays various patterns that can be generated from the weight structure defined in Equation (12) for different values of the coefficients $a$ and $b$. To facilitate comparison among different patterns, the weights in the plots have been normalized so that they sum up to unity ($\theta = 1$). Constraining $a$ or $b$ to be equal to 1, a zero-modal behaviour is observed. Namely, for $a = 1$ and $b > 1$, the Beta Lag function returns declining weights while, choosing $a > 1$ and $b = 1$, increasing weights are obtained. The value of the unconstrained coefficient determines the speed of decay of the curve. Removing the unity constraint on either $a$ or $b$ makes the curve more flexible and makes it possible to reproduce unimodal, hump-shaped behaviours such as those observed in the lower panel of Figure 1. Mixtures of Beta Lag polynomials could also be used to further increase the flexibility of the curve, as explored in Ghysels et al. (2007).

Figure 1: The figure displays various weighting patterns generated by the Beta Lag function for different values of the parameters $a$ and $b$ (from left to right and from top to bottom): ($a = 1$, $b = 4$), ($a = 1$, $b = 20$), ($a = 4$, $b = 1$), ($a = 20$, $b = 1$), ($a = 3$, $b = 8$), ($a = 6$, $b = 8$), ($a = 8$, $b = 3$), ($a = 8$, $b = 6$).

The only unknown parameters in Equation (11) are $(w_0, a, b)$. Conditional on the fitted VaR series, these can be estimated by minimizing a strictly consistent scoring rule, that is
\[ (\hat{w}_0, \hat{a}, \hat{b}) = \arg\min_{(w_0, a, b)} \sum_{t=1}^{N} S_t(r_t, ES_t; w_0, a, b \mid \mathbf{Q}_N), \]
where
\[ S_t(r_t, ES_t; w_0, a, b \mid Q_t) = (I_t - \alpha)\, G_1(Q_t) - I_t\, G_1(r_t) + G_2(ES_t) \left( ES_t - Q_t + \frac{I_t}{\alpha}(Q_t - r_t) \right) - H(ES_t) + a(r_t), \]
and $ES_t$ is defined as in (11).

One-step-ahead forecasts of $ES_t$ can then be easily computed by replacing the estimated in-sample quantiles on the RHS of (11) by their out-of-sample forecasts obtained from the associated CAViaR models ($\hat{\mathbf{Q}}_{N+1}$). Formally, the ES predictor at time $N+1$, conditional on in-sample information available at time $N$, is obtained as
\[ \widehat{ES}_{N+1,\alpha} = w_{0,N} + \sum_{i=1}^{M} w_{i,N}\, \hat{Q}_{N+1,\alpha_i}, \tag{14} \]
where the subscript $N$ in $w_{i,N}$ indicates that the weight function is estimated using information up to time $N$.

The number of grid points $M$ is a hyper-parameter that we need to choose. Comprehensive simulation and empirical studies are conducted to test the effects of different choices of $M$. Overall, $M$ affects the trade-off between accuracy and computational cost. However, our weighted quantile framework turns out to be capable of accurately predicting the ES using a very small value of $M$, e.g. $M = 3$. In the simulation study, to demonstrate the effect of $M$, we have tested $M = 3$, $M = 5$, $M = 10$ and $M = 50$ respectively.

In our empirical investigations, as a robustness check, we compare the performance of the weighted quantile framework in Equation (11) with a simpler approach that replaces the weighted average with an equally weighted average of quantiles:
\[ ES^{(avg)}_t = w_0 + \frac{\sum_{i=1}^{M} Q_{t,\alpha_i}}{M}. \tag{15} \]
In the empirical section, we find that this framework is however consistently outperformed by the more complex weighted quantile estimator.
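The Beta Lag weights of Equation (12), the weighted quantile ES of Equations (11) and (13), and the simple average of Equation (15) can be sketched in a few lines. The example below is a toy illustration under strong assumptions: returns are taken to be i.i.d. standard normal, so the tail quantiles are constant and known exactly, and the values of `a`, `b` and `w0` are arbitrary rather than fitted. All function names are hypothetical.

```python
import math
from statistics import NormalDist

def beta_lag(x, a, b):
    """Beta Lag function of Equation (12)."""
    const = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return x ** (a - 1) * (1.0 - x) ** (b - 1) * const

def beta_weights(M, a, b):
    """Unnormalized weights w_i = w(i / M; a, b), for i = 1, ..., M."""
    return [beta_lag(i / M, a, b) for i in range(1, M + 1)]

def es_wq(quantiles, w0, a, b):
    """Weighted quantile ES, Equation (11)."""
    w = beta_weights(len(quantiles), a, b)
    return w0 + sum(wi * qi for wi, qi in zip(w, quantiles))

def es_avg(quantiles, w0=0.0):
    """Equally weighted average of quantiles, Equation (15)."""
    return w0 + sum(quantiles) / len(quantiles)

# Toy setting (assumption): standard normal returns, grid from 0.5% to 2.5%
alpha, alpha_1, M = 0.025, 0.005, 10
eta = (alpha - alpha_1) / (M - 1)
grid = [alpha_1 + m * eta for m in range(M)]
norm = NormalDist()
quantiles = [norm.inv_cdf(p) for p in grid]

# Closed-form ES of a standard normal versus the uncorrected simple average
true_es = -norm.pdf(norm.inv_cdf(alpha)) / alpha
avg_es = es_avg(quantiles)

# Arbitrary (not fitted) Beta Lag parameters: a = 1, b > 1 gives declining weights
a, b, w0 = 1.0, 4.0, -0.1
w = beta_weights(M, a, b)
theta = sum(w)
w_tilde = [wi / theta for wi in w]

es_eq11 = es_wq(quantiles, w0, a, b)
es_eq13 = w0 + theta * sum(wt * q for wt, q in zip(w_tilde, quantiles))
```

In this toy setting the uncorrected simple average `avg_es` is less negative than `true_es`, because quantiles below `alpha_1` are left out of the grid. This is the left-tail truncation bias that the intercept and the unconstrained weight sum are meant to absorb. The last weight is exactly zero here since `x = 1` and `b > 1`.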

Remark 2.

As presented in Figure 1, the last element of the sequence of weights generated from the Beta Lag function (i.e. the one corresponding to the risk level $\alpha_M = \alpha$) is by construction equal to 0, except when parameter $b$ equals 1 (in this case, it is equal to 1 under the convention $0^0 = 1$; in the practical implementation on both real and simulated data, we find that the estimated $b$ is never exactly 1, so the last weight of the Beta Lag function is always 0). To address this issue, in the implementation, we set the number of grid points equal to $M + 1$, so that the weight of the $\alpha_M$-quantile is not 0 by construction. Referring to the previous example, to estimate ES at level $\alpha = 2.5\%$ as a weighted average of $M = 10$ tail quantiles of order $\alpha_i \le \alpha$, setting $\alpha_1 = 0.005$, we use a grid of $M + 1 = 11$ points with $\alpha_M = 0.025$ which, rounding to four decimals, is
\[ \boldsymbol{\alpha}_{M+1} \approx [0.0050, 0.0072, 0.0094, 0.0117, 0.0139, 0.0161, 0.0183, 0.0206, 0.0228, 0.0250, 0.0272]. \]
At no additional cost in terms of estimated parameters, this simple solution guarantees that the weight set to zero by construction is the one for the quantile at level $\alpha_{M+1} \approx 0.0272$, rather than the weight of the $\alpha_M = 0.025$ quantile. In this way, consistently with its theoretical definition, the ES will be estimated as a weighted average of $M$ quantiles of order $\alpha_i \le \alpha$ ($i = 1, \ldots, M$) without systematically excluding the estimated VaR at the target level $\alpha$.

Implied ES Dynamics

In this section we investigate the dynamic properties of the estimated conditional ES series obtained through the Weighted Quantile estimator. Assuming, for ease of presentation, a CAViaR-SAV specification as in (10) for each tail quantile, Equation (11) implies
\[ ES_t = w_0 + \bar{\beta}_0 + \bar{\beta}_1 |r_{t-1}| + \sum_{i=1}^{M} w_i \beta_{2,i} Q_{t-1,\alpha_i}, \tag{16} \]
where $\bar{\beta}_k = \sum_{i=1}^{M} w_i \beta_{k,i}$, for $k = 0, 1$, and $(\beta_{0,i}, \beta_{1,i}, \beta_{2,i})$ are the parameters of the CAViaR-SAV model for the conditional $\alpha_i$-quantile of $r_t$, for $i = 1, \ldots, M$. If the CAViaR-SAV model is well specified, i.e. if log-returns are generated by the following GARCH-type process
\[ r_t = h_t z_t, \qquad z_t \overset{i.i.d.}{\sim} (0, 1), \]
\[ h_t = \omega + \gamma |r_{t-1}| + \delta h_{t-1}, \qquad \omega > 0,\ \gamma > 0,\ \delta > 0, \]
then $\beta_{2,i}$ will be constant across different quantile orders, i.e. $\beta_{2,i} = \bar{\beta}_2 = \delta$, for $i = 1, \ldots, M$, as also largely confirmed by our empirical results on real financial data.

Equation (16) will then simplify to the following
\begin{align}
ES_t &= w_0 + \bar{\beta}_0 + \bar{\beta}_1 |r_{t-1}| + \bar{\beta}_2 \sum_{i=1}^{M} w_i Q_{t-1,\alpha_i} \tag{17} \\
&= w_0 + \bar{\beta}_0 + \bar{\beta}_1 |r_{t-1}| + \bar{\beta}_2 (ES_{t-1} - w_0) \tag{18} \\
&= \beta_0^{*} + \bar{\beta}_1 |r_{t-1}| + \bar{\beta}_2\, ES_{t-1}, \tag{19}
\end{align}
where $\beta_0^{*} = w_0 (1 - \bar{\beta}_2) + \bar{\beta}_0$. These derivations show that, in our approach, the ES is allowed to have dynamics that are separate from those of VaR. At the same time, these are automatically implied by the dynamics of conditional quantiles in the tail below VaR, without requiring any additional ad-hoc assumptions.

Comparing our approach to other existing proposals, it should be remarked that our weighted quantile estimator is more flexible than that proposed in Model (6) by Taylor (2019), which is based on the assumption that the conditional ES is a multiplicative rescaling of the fitted VaR model. Also, it differs from the "additive" approach proposed in Model (5) of the same paper in two main respects. First, we directly model the dynamics of the ES rather than the difference between ES and VaR. Second, the ES estimates are continuously updated, and not only when VaR is violated, as in Taylor (2019).

ESTIMATION

As discussed in Section 4, the proposed framework involves two estimation steps: the first for VaR and the second for ES. These are described in detail below.

Step-1: tail VaRs

This step aims to estimate the CAViaR models at the proposed quantile levels $\boldsymbol{\alpha}_M$ using a Quasi Maximum Likelihood (QML) approach, following Engle and Manganelli (2004). Although, for ease of presentation, we focus on the CAViaR-SAV model, the same procedure can be immediately extended to other variants of the CAViaR framework.

In the first step, the quantile regression equation parameters $(\beta_{0,\alpha_m}, \beta_{1,\alpha_m}, \beta_{2,\alpha_m})$ are separately estimated, for each $\alpha_m \in \boldsymbol{\alpha}_M$, by minimizing the quantile loss function
\[ \frac{1}{N} \sum_{t=1}^{N} (\alpha_m - I(r_t < Q_{t,\alpha_m}))(r_t - Q_{t,\alpha_m}), \qquad m = 1, \ldots, M, \tag{20} \]
whose negative, as shown in Giacomini and Komunjer (2005) among others, can be interpreted as a quasi-likelihood function.

As documented by Engle and Manganelli (2004), solutions to the optimization of the quantile loss objective function can be heavily dependent on the chosen initial values. To account for this issue, we adopt a multi-start optimization procedure inspired by that suggested in Engle and Manganelli (2004). First, multiple (10,000 in our paper) candidate parameter starting vectors are generated from adequately chosen uniform random variables, leading to multiple and different locally optimal QML estimates. Then the top 2 (out of 10,000) sets of parameters, i.e. those that produce the highest quasi-likelihood values, are used as starting values for another optimization round. Lastly, the final parameter estimates are selected as those producing the higher quasi-likelihood value between the 2 sets of starting values.

Step-2: ES

In the second step of the optimization, when the weighted quantile estimator $ES^{(wq)}_t$ in (11) is considered, the parameters to be estimated are the intercept term $w_0$ and the coefficients of the Beta lag function $(a, b)$. Conditional on first stage VaR estimates, these are estimated by minimizing the AL log score function defined in (9). Differently, for the bias corrected simple average estimator $ES^{(avg)}_t$ in (15), the only parameter to be estimated is the intercept term $w_0$, which is also estimated by unconstrained optimization.

SIMULATION

In this section, simulation studies are conducted to assess the statistical properties and performances of the proposed models, with respect to the one-step-ahead VaR and ES estimation accuracy.

Namely, to compare the bias and efficiency of the proposed weighted and simple average quantile methods, both the mean and Root Mean Squared Error (RMSE) values are calculated over the replicated data sets.

The simulation design is structured as follows: 1000 replicated return series are generated from an Absolute Value (AV) GARCH-t model, considering various degrees of freedom (DoFs) in order to reproduce different tail behaviours. The simulated Data Generating Process (DGP) is specified in the vignette below as Simulation Model (21).
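A minimal sketch of how one replicated series could be drawn from an AV GARCH-t process of this kind is given here. It is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, the innovations are rescaled to unit variance, the burn-in length is arbitrary, and only the intercept 0.02 is taken from Model (21), while the values passed for `gamma` and `delta` in the usage line are illustrative placeholders.

```python
import math
import random


def standardized_t(rng, nu):
    """Student-t draw with nu degrees of freedom, rescaled to unit variance (nu > 2)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(nu / 2.0, 2.0)  # chi-square with nu degrees of freedom
    return z / math.sqrt(chi2 / nu) * math.sqrt((nu - 2.0) / nu)


def simulate_av_garch_t(n, nu, omega, gamma, delta, seed=0, burn_in=500):
    """Simulate r_t = sigma_t * eps_t with sigma_t = omega + gamma*|r_{t-1}| + delta*sigma_{t-1}.

    Returns the n returns, the matching sigma_t values and sigma_{n+1},
    the scale needed for the one-step-ahead forecast.
    """
    rng = random.Random(seed)
    sigma = omega  # arbitrary positive start, washed out by the burn-in (assumption)
    returns, sigmas = [], []
    for t in range(n + burn_in):
        r = sigma * standardized_t(rng, nu)
        if t >= burn_in:
            returns.append(r)
            sigmas.append(sigma)
        sigma = omega + gamma * abs(r) + delta * sigma
    return returns, sigmas, sigma


# Placeholder coefficients (gamma, delta) chosen only for illustration.
returns, sigmas, sigma_next = simulate_av_garch_t(1900, nu=5, omega=0.02, gamma=0.10, delta=0.85)
```

Repeating the call with 1000 different seeds, and with `nu` set to 5, 10 and 50, would reproduce the structure of the replication design described above.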

Simulation Model: (AV GARCH-t)

$$r_t = \sigma_t \varepsilon_t, \quad (21)$$

$$\sigma_t = 0.02 + 0.\ldots\,|r_{t-1}| + 0.\ldots\,\sigma_{t-1},$$

where $\varepsilon_t \overset{i.i.d.}{\sim} t_\nu(0, 1)$ with $\nu$ indicating the DoFs parameter, equal to 5, 10 and 50 respectively.

To facilitate the comparisons with the findings of our real data application, the simulation has been performed considering as sample size n = 1900, which has been chosen to approximately match the length of the available in-sample period in our empirical application in Section 8.

The true one-step-ahead level $\alpha$ VaR forecasts from the above simulation model are calculated as:

$$VaR_{\alpha,t+1} = \sigma_{t+1}\, t_\nu^{-1}(\alpha) \sqrt{\frac{\nu - 2}{\nu}},$$

where $t_\nu^{-1}$ is the inverse of the Student-t CDF with $\nu$ degrees of freedom. Similarly, ES forecasts from the same model are calculated as:

$$ES_{\alpha,t+1} = -\sigma_{t+1} \left(\frac{g_\nu(t_\nu^{-1}(\alpha))}{\alpha}\right) \left(\frac{\nu + (t_\nu^{-1}(\alpha))^2}{\nu - 1}\right) \sqrt{\frac{\nu - 2}{\nu}},$$

where $g_\nu$ is the Student-t PDF.

These true VaR and ES forecasts are calculated for each data set and used to compute RMSE for the VaR and ES forecasts obtained from a CAViaR-SAV, for the VaR, and both the $ES^{(wq)}$ and the $ES^{(avg)}$ estimators, for the ES. The averages of the true and estimated VaR & ES, over the 1000 data sets, are given in Table 1 ("True" column).

Namely, the proposed weighted quantile estimators in (11), named as WQ-M with $M \in \{3, 5, 10, 50\}$, are then computed for each of the simulated data sets. For comparison, the simple average with bias correction in (15), denoted as SA-BC-M, and the simple average without bias correction in (15) with the constraint $w_0 = 0$, denoted as SA-No-BC-M, are also included in the simulation study.

The VaR and ES forecasting simulation results are summarized in Table 1. Since both the weighted and simple average quantile approaches used exactly the same "Step-1" quantile estimation process, the $VaR_{n+1}$ results for both approaches are identical. We can clearly see that, as expected, the quantile forecasts have mean values that are quite close to true values with relatively small RMSE.
This is evidently due to the fact that the CAViaR-SAV model is correctly specified under an AV GARCH-t DGP.

Focusing on the ES forecasting, the bias results clearly favor the weighted quantile estimator, compared to the simple average approaches, for all the values of $\nu$ considered. Due to the extra uncertainty introduced when estimating the Beta lag function parameters, the simple average (including the bias correction term $w_0$) approach produces smaller RMSE values.

Footnote: Recalling the considerations in Remark 2, here M indicates the number of quantiles involved in the weighted average with non-zero weights.

The $ES^{(wq)}$ and $ES^{(avg)}$ estimators based on only M = 3 grid points (WQ-3 and SA-BC-3) are already characterized by good performances. In addition, using M = 5, 10 and 50, we can still observe accurate and very close ES forecasting results. However, the M = 50 setting requires much higher computational cost compared to other options. Therefore, we have selected M = 3, 5 and 10 for the empirical study.

For all the values of $\nu$ considered, the fitted weights distribution is characterized by two modes respectively occurring at the lower truncation point of the selected grid of $\alpha$ values and immediately before the upper truncation point, that is the target ES order. The lower mode is evidently accounting for the truncation bias arising from the omission of extreme left quantiles and, as it could have been reasonably argued, its effect is more substantial for $\nu = 5$. Furthermore, it is worth noting that, in the absence of left truncation bias, the pattern of the Beta Lag weights would be expected to match the profile of the returns tail distribution.

Table 2 summarizes the simulated distribution of the estimated coefficients for the different settings of the $ES^{(wq)}$ and $ES^{(avg)}$ estimators that have been here considered. For SA-BC estimators, the average estimated $w_0$ intercept is, as expected, negative, correcting for the left tail truncation bias. Confirming our intuition, this is more substantial for heavy tailed processes, that is, for low values of $\nu$.

WQ estimators are more flexible since the bias correction takes place through both $w_0$ and $\theta = \sum_{i=1}^{M} w_i$. The average value of the estimated intercept is positive, being compensated by the fact that the average estimated $\theta$ is greater than 1, as expected. Again, in line with our findings for the SA-BC estimator, the difference $(\theta - 1)$ is higher for lower values of the degrees of freedom parameter $\nu$.

Overall, the simulation results illustrate the validity of the proposed models and the corresponding estimation process. The performance of the weighted and simple average quantile approaches will be further compared in the empirical section.

Table 1: Simulation results with M = 3, 5, 10 and 50 equally spaced averaged quantiles. Summary statistics for proposed models, with data simulated from Model (21). Note that the estimation grid actually includes M + 1 values, that is M + 1 = 4, 6, 11 and 51, with the last Beta weight being equal to 0 by construction. See Remark 2 for details. n = 1900.

| M | ν | Measure | True | WQ-M Mean | WQ-M RMSE | SA-BC-M Mean | SA-BC-M RMSE | SA-No-BC-M Mean | SA-No-BC-M RMSE |
|---|---|---|---|---|---|---|---|---|---|
| 3 | 5 | VaR_{t+1} | -1.3032 | -1.3096 | 0.1572 | -1.3096 | 0.1572 | -1.3096 | 0.1572 |
| 3 | 5 | ES_{t+1} | -1.7853 | -1.8093 | 0.2504 | -1.7759 | 0.2290 | -1.6381 | 0.2653 |
| 3 | 10 | VaR_{t+1} | -1.3775 | -1.3798 | 0.1271 | -1.3798 | 0.1271 | -1.3798 | 0.1271 |
| 3 | 10 | ES_{t+1} | -1.7428 | -1.7591 | 0.1730 | -1.7287 | 0.1677 | -1.6368 | 0.1954 |
| 3 | 50 | VaR_{t+1} | -1.3821 | -1.3785 | 0.1162 | -1.3785 | 0.1162 | -1.3785 | 0.1162 |
| 3 | 50 | ES_{t+1} | -1.6657 | -1.6760 | 0.1408 | -1.6525 | 0.1373 | -1.5830 | 0.1595 |
| 5 | 5 | VaR_{t+1} | -1.3032 | -1.3102 | 0.1577 | -1.3102 | 0.1577 | -1.3102 | 0.1577 |
| 5 | 5 | ES_{t+1} | -1.7853 | -1.7912 | 0.2466 | -1.7738 | 0.2249 | -1.6082 | 0.2791 |
| 5 | 10 | VaR_{t+1} | -1.3775 | -1.3793 | 0.1273 | -1.3793 | 0.1273 | -1.3793 | 0.1273 |
| 5 | 10 | ES_{t+1} | -1.7428 | -1.7422 | 0.1708 | -1.7272 | 0.1657 | -1.6167 | 0.2047 |
| 5 | 50 | VaR_{t+1} | -1.3821 | -1.3790 | 0.1156 | -1.3790 | 0.1156 | -1.3790 | 0.1156 |
| 5 | 50 | ES_{t+1} | -1.6657 | -1.6605 | 0.1381 | -1.6517 | 0.1362 | -1.5696 | 0.1651 |
| 10 | 5 | VaR_{t+1} | -1.3032 | -1.3101 | 0.1577 | -1.3101 | 0.1577 | -1.3101 | 0.1577 |
| 10 | 5 | ES_{t+1} | -1.7853 | -1.7833 | 0.2429 | -1.7723 | 0.2224 | -1.5913 | 0.2876 |
| 10 | 10 | VaR_{t+1} | -1.3775 | -1.3808 | 0.1271 | -1.3808 | 0.1271 | -1.3808 | 0.1271 |
| 10 | 10 | ES_{t+1} | -1.7428 | -1.7372 | 0.1698 | -1.7272 | 0.1665 | -1.6057 | 0.2120 |
| 10 | 50 | VaR_{t+1} | -1.3821 | -1.3779 | 0.1162 | -1.3779 | 0.1162 | -1.3779 | 0.1162 |
| 10 | 50 | ES_{t+1} | -1.6657 | -1.6561 | 0.1386 | -1.6513 | 0.1368 | -1.5614 | 0.1707 |
| 50 | 5 | VaR_{t+1} | -1.3032 | -1.3103 | 0.1575 | -1.3103 | 0.1575 | -1.3103 | 0.1575 |
| 50 | 5 | ES_{t+1} | -1.7853 | -1.7824 | 0.2410 | -1.7726 | 0.2211 | -1.5813 | 0.2935 |
| 50 | 10 | VaR_{t+1} | -1.3775 | -1.3805 | 0.1272 | -1.3805 | 0.1272 | -1.3805 | 0.1272 |
| 50 | 10 | ES_{t+1} | -1.7428 | -1.7360 | 0.1707 | -1.7271 | 0.1669 | -1.5986 | 0.2168 |
| 50 | 50 | VaR_{t+1} | -1.3821 | -1.3785 | 0.1161 | -1.3785 | 0.1161 | -1.3785 | 0.1161 |
| 50 | 50 | ES_{t+1} | -1.6657 | -1.6552 | 0.1376 | -1.6516 | 0.1367 | -1.5572 | 0.1731 |

Note: A box indicates the favored estimators, based on mean and RMSE.
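The "True" column of Table 1 relies on the closed-form VaR and ES expressions for a Student-t DGP given earlier in this section. The sketch below evaluates those two expressions using only the Python standard library. It is a minimal illustration under stated assumptions: the function names are hypothetical, and the Student-t CDF and quantile are approximated numerically (Simpson integration and bisection) rather than taken from a statistics library, for 0 < alpha < 0.5 and nu > 2.

```python
import math


def t_pdf(x, nu):
    """Student-t density g_nu(x)."""
    log_c = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(x * x / nu))


def t_cdf(x, nu, n_steps=2000):
    """Student-t CDF via composite Simpson integration of the density from 0 to x."""
    if x == 0:
        return 0.5
    h = x / n_steps  # signed step, so the integral is negative for x < 0
    total = t_pdf(0.0, nu) + t_pdf(x, nu)
    for k in range(1, n_steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, nu)
    return 0.5 + total * h / 3


def t_ppf(alpha, nu, lo=-50.0, hi=0.0):
    """Left-tail Student-t quantile by bisection (assumes 0 < alpha < 0.5)."""
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if t_cdf(mid, nu) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def true_var_es(sigma_next, alpha, nu):
    """One-step-ahead true VaR and ES for unit-variance Student-t innovations."""
    q = t_ppf(alpha, nu)
    scale = math.sqrt((nu - 2) / nu)
    var = sigma_next * q * scale
    es = -sigma_next * (t_pdf(q, nu) / alpha) * ((nu + q * q) / (nu - 1)) * scale
    return var, es


var_5, es_5 = true_var_es(1.0, 0.025, 5)
```

Both forecasts are proportional to $\sigma_{t+1}$, so the ratio ES/VaR does not depend on the simulated path. For $\nu = 5$ and $\alpha = 2.5\%$ the sketch gives a ratio of roughly 1.37, in line with the ratio of the true ES and VaR averages reported in Table 1.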
Table 2: Average of the estimated parameters (for WQ and SA-BC approaches) and Beta weights sum (for WQ approach), across the 1000 simulated data sets from Model (21). n = 1900.

| M | ν | WQ-M: w_0 | WQ-M: a | WQ-M: b | WQ-M: θ = Σ w_i | SA-BC-M: w_0 |
|---|---|---|---|---|---|---|
| 3 | 5 | 0.1379 | 1.2358 | 3.6619 | 1.1597 | -0.1378 |
| 3 | 10 | 0.0805 | 1.3804 | 3.0601 | 1.1052 | -0.0919 |
| 3 | 50 | 0.0469 | 1.5393 | 2.5897 | 1.0812 | -0.0695 |
| 5 | 5 | 0.2037 | 1.7807 | 3.8162 | 1.2029 | -0.1656 |
| 5 | 10 | 0.0834 | 1.9007 | 3.0300 | 1.1091 | -0.1104 |
| 5 | 50 | 0.0473 | 1.9821 | 2.7865 | 1.0775 | -0.0822 |
| 10 | 5 | 0.1821 | 2.5134 | 3.9378 | 1.2109 | -0.1810 |
| 10 | 10 | 0.0800 | 2.5743 | 3.3084 | 1.1206 | -0.1216 |
| 10 | 50 | 0.0348 | 2.5173 | 3.2810 | 1.0735 | -0.0899 |
| 50 | 5 | 0.1548 | 3.5111 | 4.6744 | 1.2128 | -0.1913 |
| 50 | 10 | 0.0641 | 3.4378 | 4.3128 | 1.1191 | -0.1285 |
| 50 | 50 | 0.0306 | 3.4039 | 4.1765 | 1.0785 | -0.0945 |

DATA and EMPIRICAL STUDY

The daily data, including open, high, low and closing prices, are downloaded from Thomson Reuters Tick History and cover the period from the beginning of 2000 to the end of 2015. Data are collected for 7 market indices: S&P500, NASDAQ (both US), Hang Seng (Hong Kong), FTSE 100 (UK), DAX (Germany), SMI (Switzerland) and ASX200 (Australia).

A rolling window with fixed in-sample size is employed for estimation to produce each one-step-ahead forecast in the forecasting period. Table 3 reports the in-sample size for each series, which differs due to different non-trading days occurring in each market.

Two forecasting studies with different out-of-sample sizes are conducted. The first study aims to assess the performance of the models specifically for the 2008 Global Financial Crisis (GFC) period, thus the initial date of the out-of-sample forecasting period is chosen as January 2008. Then for each index the out-of-sample size m is chosen as 400, meaning that the forecasting period ends around August 2009.

The second forecasting study incorporates an 8 year out-of-sample period, with the start date of the out-of-sample period still chosen as January 2008 and the out-of-sample size m as 2000. Therefore, the end of the forecasting period is around the end of 2015.

Both daily one-step-ahead Value-at-Risk (VaR) and Expected Shortfall (ES) forecasts are considered for the returns on the 7 indices, using $\alpha = 2.5\%$. The weighted quantile approach WQ-M is implemented with $M \in \{3, 5, 10\}$. Similar to the simulation section, we have also considered the simple average with bias correction, SA-BC-M, and the simple average without bias correction, SA-No-BC-M. For the prediction of first stage quantile forecasts, two different regression specifications, CAViaR-SAV and CAViaR-AS, have been implemented. The estimation of 1st-stage CAViaR models has been performed following the procedure described in the Estimation section.

Furthermore, the forecasting performances of the methods proposed in this paper have been compared with those yielded by other previously proposed approaches.
Namely, the ES-CAViaR models of Taylor (2019) are also included in the study, again employing the CAViaR-SAV and CAViaR-AS models for the specification of the quantile regression component. These models are estimated following the suggestions from Taylor (2019). To make the models comparable, the CAViaR components of our proposed weighted quantile and ES-CAViaR models have used exactly the same set up. Then, to assist the optimization in the estimation of ES-CAViaR models, the initial values of the parameters of the ES component are also selected by means of an additional random sampling procedure. When the ES-CAViaR-Add model of expression (5) is used for the ES, 10 candidate parameter vectors are incorporated. For the simpler ES-CAViaR-Mult Model (6), 10 candidate parameter vectors are used.

In addition, the following models have also been included in the forecasting comparison: the conventional GARCH (Bollerslev (1986)), EGARCH (Nelson (1991)) and GJR-GARCH (Glosten et al. (1993)), all with Student-t errors; the GARCH employing Hansen's skewed-t distribution (Hansen (1994)); the CARE (using the size of the expectile level grid as 50) with Symmetric Absolute Value (CARE-SAV) and asymmetric specifications (CARE-AS). The GARCH-t, EGARCH-t and GJR-GARCH-t models are estimated using the Econometrics toolbox included in the Matlab 2019b release. The GARCH-Skew-t and CARE models are estimated by maximum likelihood using the Matlab code developed by the authors.

One-step-ahead forecasts of VaR and ES are generated for each day in the forecast period for each data series.

The standard quantile loss function is also employed to compare the models for VaR forecast accuracy. The standard quantile loss function is strictly consistent, i.e. the expected loss is a minimum at the true quantile series.
Thus, the most accurate VaR forecasting model should produce the minimized quantile loss function, given as:

$$\sum_{t=n+1}^{n+m} \left(\alpha - I(r_t < Q_t)\right)\left(r_t - Q_t\right), \quad (22)$$

where n is the in-sample size, and m is the out-of-sample size with $m \in \{400, 2000\}$. $Q_{n+1}, \ldots, Q_{n+m}$ is a series of quantile forecasts at level $\alpha = 2.5\%$ for the observations $r_{n+1}, \ldots, r_{n+m}$.

The quantile loss results are presented in Table 3 for each model for each series. The average loss is included in the "Avg Loss" column. The average rank based on ranks of quantile loss across 7 markets is calculated and shown in the "Avg Rank" column. Box indicates the favoured model and dashed box indicates the 2nd ranked model based on the average loss and rank.

It is worth recalling that, in the first stage of the weighted quantile estimation, VaR predictions are obtained via the estimation of either CAViaR-SAV or CAViaR-AS models, named as WQ-SAV or WQ-AS in Table 3. Depending on their ability to account for leverage effects in VaR dynamics, the tested models can be grouped into two categories: symmetric and asymmetric. For example, the GARCH-t, CARE-SAV, ES-CAViaR-Add-SAV, ES-CAViaR-Mult-SAV and WQ-SAV models have a symmetric volatility or quantile (expectile) component, while the EGARCH-t, GJR-GARCH-t, CARE-AS, ES-CAViaR-Add-AS, ES-CAViaR-Mult-AS and WQ-AS have asymmetric ones.

Based on the quantile loss results, we can see that the proposed weighted quantile and ES-CAViaR type models are characterized by very close performances.

Also, in general and as expected, asymmetric models tend to perform slightly better than symmetric ones. For the SAV type models, the average quantile loss is around 58, while, for the AS type models, this average stays around 56, regarding the forecasting study on the GFC period. For the study with the longer forecasting horizon, the SAV and AS type models have average quantile loss around 172 and 167 respectively. The closeness of the weighted quantile and ES-CAViaR results is not surprising since, as presented in Section 8.1, we have exactly the same CAViaR component for the weighted quantile and ES-CAViaR models, to make the ES comparison a fair one. The empirical results are consistent with this.
The ES-CAViaR framework has the CAViaR parameters re-estimated when estimating ES, thus we observe minor quantile loss differences between the WQ and ES-CAViaR frameworks.

For both forecasting studies, EGARCH-t, GJR-GARCH-t and CARE-AS have slightly higher average quantile loss values, compared with ES-CAViaR-Add-AS, ES-CAViaR-Mult-AS and WQ-AS type models. The symmetric GARCH-t and CARE-SAV have relatively less preferred performance compared with the ES-CAViaR-Add-SAV, ES-CAViaR-Mult-SAV and WQ-SAV models.

Table 3:

| Model | S&P500 | NASDAQ | HangSeng | FTSE | DAX | SMI | ASX200 | Avg Loss | Avg Rank |
|---|---|---|---|---|---|---|---|---|---|
| GARCH-t | 57.6 | 62.4 | 72.8 | 56.1 | 55.6 | 55.1 | 48.1 | 58.2 | 8.29 |
| EGARCH-t | 59.8 | 62.8 | 67.1 | 54.7 | 53.4 | 51.4 | 47.8 | 56.7 | 4.57 |
| GJR-GARCH-t | 55.4 | 62.0 | 67.3 | 54.9 | 54.0 | 52.3 | 47.2 | 56.2 | 4.14 |
| GARCH-Skew-t | 56.5 | 61.8 | 72.6 | 55.3 | 54.4 | 54.9 | 47.0 | 57.5 | 5.71 |
| CARE-SAV | 60.2 | 65.3 | 66.3 | 57.0 | 58.6 | 53.9 | 51.8 | 59.0 | 10.14 |
| CARE-AS | 61.1 | 66.8 | 61.7 | 55.0 | 55.9 | 54.9 | 48.9 | 57.7 | 8.71 |
| ES-CAViaR-Add-AS | 60.6 | 63.0 | 63.2 | 52.9 | 53.7 | 51.4 | 47.0 | 56.0 | 4.00 |
| ES-CAViaR-Mult-AS | 59.1 | 63.7 | 65.0 | 52.8 | 54.4 | 51.6 | 46.0 | 56.1 | 4.14 |
| ES-CAViaR-Add-SAV | 59.6 | 63.8 | 68.5 | 55.7 | 56.8 | 53.3 | 48.1 | 58.0 | 8.71 |
| ES-CAViaR-Mult-SAV | 58.7 | 63.4 | 70.3 | 55.3 | 57.1 | 53.3 | 47.8 | 58.0 | 7.43 |
| WQ-AS | 59.9 | 63.3 | 63.8 | 53.1 | 53.6 | 51.8 | 46.6 | 56.0 | 4.29 |
| WQ-SAV | 59.4 | 63.1 | 69.6 | 55.7 | 56.8 | 53.3 | 47.9 | 58.0 | 7.86 |
| Out-of-sample m | 400 | 400 | 400 | 400 | 400 | 400 | 400 | | |
| In-sample n | | | | | | | | | |

Note: Box indicates the favoured model and dashed box indicates the 2nd ranked model based on the average loss and rank.

8.3 Evaluation of forecasting performance: VaR and ES Joint Loss

In this section we assess the ability of the different models under comparison to forecast VaR and ES jointly. To this purpose, Table 4 reports, for each model and data series, the value of the loss function in Equation (9) aggregated over the out-of-sample period: $S = \sum_{t=n+1}^{n+m} S_t$, with $m \in \{400, 2000\}$. We use this to jointly compare the VaR and ES forecasts from all models, because the AL log-score in Equation (9) is a strictly consistent scoring rule that is jointly minimized by the true VaR and ES series.

As mentioned in Section 8.1, for ES prediction, incorporating $M \in \{3, 5, 10\}$ we have implemented the weighted quantile approach WQ-M, the simple average with bias correction SA-BC-M, and the simple average without bias correction SA-No-BC-M. By further incorporating the CAViaR-SAV and CAViaR-AS, we have 18 frameworks to be tested. Including the other 10 competing models, we have 28 models in total in Table 4.

With respect to the forecasting study on the GFC period, based on the average VaR & ES joint loss values the proposed WQ-3-AS produces the smallest loss, followed by ES-CAViaR-Mult-AS. The WQ-3-AS model also on average ranks as the best, followed by the WQ-5-AS. As discussed in Section 8.2, we have employed the same CAViaR component for weighted quantile and ES-CAViaR models. Therefore, the good performance of the proposed weighted quantile framework supports its validity for predicting ES.

In addition, the WQ-AS models on average rank better and produce lower loss compared to EGARCH-t and GJR-GARCH-t models.
Comparing the ES-CAViaR-SAV type models with the WQ-SAV frameworks, we can still see that the WQ-SAV models, based on different numbers of grid points M, have lower average loss and rank better than ES-CAViaR-Add-SAV, and have similar performance compared with ES-CAViaR-Mult-SAV. At the other end, the CARE-SAV model produces the highest average joint loss. The SA-No-BC frameworks produce slightly smaller loss values and rank similarly, compared to CARE-SAV.

Lastly, the weighted quantile framework consistently performs better than the SA-BC, which demonstrates the usefulness of the weighted average scheme. In addition, the performance of SA-BC is clearly better compared with SA-No-BC, demonstrating the effectiveness of the bias correction.

Regarding the forecasting study with out-of-sample size 2000, the top 2 performing models are the ES-CAViaR-Mult-AS and ES-CAViaR-Add-AS models. The SA-BC-3-AS and WQ-3-AS frameworks rank 3rd and 4th, with clearly better performance compared to EGARCH-t, GJR-GARCH-t and CARE-AS.

Comparing the ES-CAViaR-SAV type models with the WQ-SAV type models, we still observe that the proposed WQ-SAV frameworks have improved performance compared to ES-CAViaR-Add-SAV and close performance compared to ES-CAViaR-Mult-SAV.

Finally, we would like to emphasize that the WQ framework using M = 3 can already generate very competitive performance, for both forecasting studies. Such results suggest that the proposed WQ framework can work effectively without significantly increased computational cost, compared to other models. Compared with ES-CAViaR models, the WQ type models estimate and forecast ES nonparametrically, without assuming the relationships between the ES and VaR dynamics.
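The out-of-sample comparison described in this and the preceding section can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function names are hypothetical, the joint score is assumed to take the AL log score form of Taylor (2019), with which Equation (9) is taken to coincide, and ties in the ranking are broken arbitrarily.

```python
import math


def quantile_loss_sum(returns, var_forecasts, alpha):
    """Aggregated quantile loss over the out-of-sample period, as in (22)."""
    return sum((alpha - (1.0 if r < q else 0.0)) * (r - q)
               for r, q in zip(returns, var_forecasts))


def al_log_score_sum(returns, var_forecasts, es_forecasts, alpha):
    """Aggregated joint VaR and ES loss S = sum_t S_t.

    Assumed form (AL log score, Taylor 2019), valid for ES_t < 0:
    S_t = -log((alpha - 1) / ES_t) - (r_t - Q_t)(alpha - I(r_t <= Q_t)) / (alpha * ES_t)
    """
    total = 0.0
    for r, q, es in zip(returns, var_forecasts, es_forecasts):
        hit = 1.0 if r <= q else 0.0
        total += -math.log((alpha - 1.0) / es) - (r - q) * (alpha - hit) / (alpha * es)
    return total


def average_ranks(loss_table):
    """Average rank per model across markets; rank 1 is the lowest loss.

    loss_table maps a model name to a list with one loss per market.
    """
    models = list(loss_table)
    n_markets = len(next(iter(loss_table.values())))
    totals = {m: 0.0 for m in models}
    for j in range(n_markets):
        ordered = sorted(models, key=lambda m: loss_table[m][j])
        for rank, m in enumerate(ordered, start=1):
            totals[m] += rank
    return {m: totals[m] / n_markets for m in models}


# Toy losses for three hypothetical models on two markets.
toy_ranks = average_ranks({"A": [1.0, 2.0], "B": [2.0, 1.0], "C": [3.0, 3.0]})
```

Applied to the full set of models and the 7 markets, the first two functions would give the entries of the loss tables and the third the "Avg Rank" column.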
In this section, the Model Confidence Set (MCS) (Hansen et al., 2011) is used to assess the statistical significance of differences in the values of the AL log-score observed for the various models under comparison, avoiding multiple testing biases.

A MCS is a set of models that is constructed such that it will contain the best model with a given level of confidence (75% is used in our paper). All computations were performed using the Matlab code for MCS testing included in Kevin Sheppard's MFE toolbox. The R and SQ methods, which use absolute and squared values sum respectively during the calculation of the test statistic, are employed in our paper; details are given on page 465 of Hansen et al. (2011).

Table 5 presents the 75% MCS using both the R and SQ methods, for the two forecasting studies with out-of-sample size m as 400 and 2000 respectively. Columns "R-Total-GFC" and "SQ-Total-GFC", "R-Total" and "SQ-Total" count the total number of times that a model is included in the 75% MCS across the 7 return series.

Overall, we observe that our weighted quantile models are more or equally likely to be included in the MCS, in comparison with other models. For both R and SQ methods across the two forecasting studies, ES-CAViaR-Mult-AS, WQ-3-AS and WQ-10-AS are the only 3 models that are included in the MCS for all 7 series.

More specifically for the forecasting study on the GFC period, via the R method, ES-CAViaR-Mult-AS, WQ-3-AS and WQ-10-AS are included in the MCS for all 7 markets, followed by EGARCH-t, GJR-GARCH-t, ES-CAViaR-Add-AS, SA-BC-3-AS, WQ-5-AS and SA-BC-5-AS models. Via the SQ method, all the WQ-AS type models are included in the MCS for all the 7 markets, as well as the EGARCH-t, GJR-GARCH-t and ES-CAViaR-AS type models.

With respect to the forecasting study with m = 2000, for both R and SQ methods, all the WQ-AS and SA-BC-AS models are included in the MCS 7 times, together with the ES-CAViaR-AS models. We can see that EGARCH-t and GJR-GARCH-t are less likely to be included in the MCS, compared to our proposed frameworks. The GARCH-t is least likely to be included in the MCS for both R and SQ methods.

CONCLUSION

In this paper, we propose an innovative semi-parametric weighted quantile framework for ES estimation and forecasting. The proposed approach relies on a two step estimation procedure. The quantiles weighting scheme is parsimoniously parameterized by incorporating a Beta function whose coefficients are optimized by minimizing a consistent joint VaR and ES loss function. Through a simulation study, we have demonstrated the effectiveness of the proposed framework. In an empirical study, focusing on the highly volatile GFC period, improvements in the out-of-sample forecasting of ES are observed, compared to traditional GARCH and CARE models, as well as the ES-CAViaR models. Empirical evidence on a longer forecasting period confirms the superiority of the WQ framework over GARCH-type and CARE models and its competitiveness with state-of-the-art approaches such as the ES-CAViaR models.

The proposed framework can be extended in a number of ways. First, at the moment the first stage of the framework only uses quantile estimates from CAViaR. However, the proposed framework is quite flexible, so it can actually incorporate quantile estimates obtained from any model. Second, during the second stage of the estimation (when estimating the parameters of the Beta function), we can also re-estimate the CAViaR parameters to potentially further improve the VaR and ES estimation accuracy, similar to the ES-CAViaR estimation. Third, the framework can also be extended by the idea of forecast combination, see Taylor (2020) as an example.

References

Acerbi, C. and D. Tasche (2002). Expected shortfall: A natural coherent alternative to value at risk. Economic Notes 31 (2), 379–388.

Aigner, D. J., T. Amemiya, and D. J. Poirier (1976). On the estimation of production frontiers: Maximum likelihood estimation of the parameters of a discontinuous density function. International Economic Review 17 (2), 377–396.

Artzner, P. (1997). Thinking coherently. Risk, 68–71.

Artzner, P., F. Delbaen, J. Eber, and D. Heath (1999). Coherent measures of risk. Mathematical Finance 9 (3), 203–228.

Basel Committee on Banking Supervision (2013). Fundamental review of the trading book: A revised market risk framework. Bank for International Settlements.

Basel Committee on Banking Supervision (2019). Minimum capital requirements for market risk. Bank for International Settlements.

Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics 31 (3), 307–327.

Creal, D., S. J. Koopman, and A. Lucas (2013). Generalized autoregressive score models with applications. Journal of Applied Econometrics 28 (5), 777–795.

Davis, P. J. and P. Rabinowitz (1984). Computer Science and Applied Mathematics (Second Edition ed.). Academic Press.

Engle, R. F. and S. Manganelli (2004). CAViaR: Conditional autoregressive value at risk by regression quantiles. Journal of Business & Economic Statistics 22 (4), 367–381.

Fissler, T. and J. F. Ziegel (2016). Higher order elicitability and Osband's principle. The Annals of Statistics 44 (4), 1680–1707.

Gerlach, R. and C. Wang (2020). Bayesian semi-parametric realized conditional autoregressive expectile models for tail risk forecasting. Journal of Financial Econometrics (In Press).

Gerlach, R. H., C. W. S. Chen, and N. Y. C. Chan (2011). Bayesian time-varying quantile forecasting for value-at-risk in financial markets. Journal of Business & Economic Statistics 29 (4), 481–492.

Ghysels, E., A. Sinko, and R. Valkanov (2007). MIDAS regressions: Further results and new directions. Econometric Reviews 26 (1), 53–90.

Giacomini, R. and I. Komunjer (2005). Evaluation and combination of conditional quantile forecasts. Journal of Business & Economic Statistics 23 (4), 416–431.

Glosten, L. R., R. Jagannathan, and D. E. Runkle (1993). On the relation between the expected value and the volatility of the nominal excess return on stocks. The Journal of Finance 48 (5), 1779–1801.

Hansen, B. E. (1994). Autoregressive conditional density estimation. International Economic Review, 705–730.

Hansen, P. R., A. Lunde, and J. M. Nason (2011). The model confidence set. Econometrica 79 (2), 453–497.

Harvey, A. (2013). Dynamic Models for Volatility and Heavy Tails. Econometric Society Monographs.

Koenker, R. and J. A. F. Machado (1999). Goodness of fit and related inference processes for quantile regression. Journal of the American Statistical Association 94 (448), 1296–1310.

Nelson, D. B. (1991). Conditional heteroskedasticity in asset returns: A new approach. Econometrica: Journal of the Econometric Society, 347–370.

Newey, W. K. and J. L. Powell (1987). Asymmetric least squares estimation and testing. Econometrica 55 (4), 819–847.

Patton, A. J., J. F. Ziegel, and R. Chen (2019). Dynamic semiparametric models for expected shortfall (and value-at-risk). Journal of Econometrics 211 (2), 388–413.

Taylor, J. W. (2008). Estimating value at risk and expected shortfall using expectiles. Journal of Financial Econometrics 6 (2), 231–252.

Taylor, J. W. (2019). Forecasting value at risk and expected shortfall using a semiparametric approach based on the asymmetric Laplace distribution. Journal of Business & Economic Statistics 37 (1), 121–133.

Taylor, J. W. (2020). Forecast combinations for value at risk and expected shortfall. International Journal of Forecasting 36 (2), 428–441.

Table 4:

| Model | S&P500 | NASDAQ | HangSeng | FTSE | DAX | SMI | ASX200 | Avg Loss | Avg Rank |
|---|---|---|---|---|---|---|---|---|---|
| GARCH-t | 1102.0 | 1134.5 | 1185.2 | 1115.6 | 1114.6 | 1084.6 | 1044.2 | 1111.5 | 16.6 |
| EGARCH-t | 1114.2 | 1131.5 | 1138.7 | 1108.8 | 1068.2 | 1031.9 | 1030.3 | 1089.1 | 7.4 |
| GJR-GARCH-t | 1078.6 | 1124.4 | 1143.2 | 1107.0 | 1082.6 | 1047.7 | 1027.9 | 1087.4 | 8.9 |
| GARCH-Skew-t | 1084.2 | 1125.1 | 1182.7 | 1094.8 | 1098.0 | 1069.9 | 1026.2 | 1097.3 | 11.9 |
| CARE-SAV | 1135.4 | 1179.7 | 1143.8 | 1119.4 | 1133.7 | 1058.1 | 1101.0 | 1124.4 | 24.6 |
| CARE-AS | 1141.3 | 1182.3 | 1115.3 | 1120.8 | 1094.1 | 1063.2 | 1070.8 | 1112.5 | 21.4 |
| ES-CAViaR-Add-AS | 1127.4 | 1135.4 | 1125.4 | 1085.0 | 1075.0 | 1034.2 | 1041.8 | 1089.2 | 9.6 |
| ES-CAViaR-Mult-AS | 1121.3 | 1142.6 | 1134.1 | 1078.5 | 1075.6 | 1028.3 | 1024.2 | 1086.4 | 6.9 |
| ES-CAViaR-Add-SAV | 1115.5 | 1146.7 | 1165.4 | 1105.6 | 1126.2 | 1060.9 | 1060.5 | 1111.5 | 18.4 |
| ES-CAViaR-Mult-SAV | 1110.8 | 1141.7 | 1168.5 | 1098.0 | 1123.3 | 1052.5 | 1049.2 | 1106.3 | 13.6 |
| WQ-3-AS | 1120.1 | 1139.5 | 1123.0 | 1086.0 | 1074.4 | 1029.6 | 1031.4 | 1086.3 | 5.9 |
| SA-BC-3-AS | 1120.5 | 1142.8 | 1124.0 | 1090.7 | 1074.8 | 1033.1 | 1031.4 | 1088.2 | 8.6 |
| SA-No-BC-3-AS | 1127.3 | 1152.6 | 1125.4 | 1096.0 | 1080.0 | 1037.4 | 1036.7 | 1093.6 | 13.1 |
| WQ-5-AS | 1119.6 | 1143.7 | 1122.0 | 1092.1 | 1074.5 | 1029.2 | 1030.2 | 1087.3 | 6.6 |
| SA-BC-5-AS | 1122.2 | 1144.9 | 1121.8 | 1095.9 | 1074.3 | 1032.5 | 1031.4 | 1089.0 | 9.1 |
| SA-No-BC-5-AS | 1130.4 | 1155.8 | 1123.2 | 1104.8 | 1080.6 | 1037.9 | 1038.3 | 1095.9 | 14.7 |
| WQ-10-AS | 1120.1 | 1143.1 | 1122.2 | 1086.9 | 1077.6 | 1030.7 | 1037.1 | 1088.2 | 8.0 |
| SA-BC-10-AS | 1122.1 | 1143.9 | 1121.6 | 1092.1 | 1077.7 | 1033.6 | 1039.7 | 1090.1 | 10.1 |
| SA-No-BC-10-AS | 1130.9 | 1155.2 | 1123.1 | 1100.5 | 1084.8 | 1040.3 | 1047.8 | 1097.5 | 15.1 |
| WQ-3-SAV | 1116.5 | 1142.2 | 1167.3 | 1105.8 | 1121.1 | 1055.9 | 1050.9 | 1108.5 | 14.9 |
| SA-BC-3-SAV | 1117.4 | 1144.6 | 1170.8 | 1110.3 | 1122.7 | 1057.1 | 1053.3 | 1110.9 | 18.7 |
| SA-No-BC-3-SAV | 1125.7 | 1155.9 | 1173.1 | 1116.8 | 1129.1 | 1063.0 | 1057.4 | 1117.3 | 23.6 |
| WQ-5-SAV | 1116.6 | 1143.8 | 1166.7 | 1104.6 | 1120.5 | 1055.6 | 1054.7 | 1108.9 | 15.6 |
| SA-BC-5-SAV | 1122.0 | 1145.5 | 1169.7 | 1110.2 | 1121.6 | 1057.3 | 1055.1 | 1111.6 | 19.9 |
| SA-No-BC-5-SAV | 1132.2 | 1157.3 | 1172.0 | 1117.4 | 1129.2 | 1064.1 | 1060.6 | 1119.0 | 25.4 |
| WQ-10-SAV | 1114.0 | 1142.6 | 1166.4 | 1106.0 | 1121.1 | 1055.6 | 1054.1 | 1108.5 | 14.4 |
| SA-BC-10-SAV | 1117.3 | 1143.7 | 1170.9 | 1111.7 | 1122.1 | 1057.0 | 1052.1 | 1110.7 | 18.0 |
| SA-No-BC-10-SAV | 1127.9 | 1155.7 | 1173.7 | 1119.7 | 1130.7 | 1064.2 | 1058.6 | 1118.6 | 25.1 |
| Out-of-sample m | 400 | 400 | 400 | 400 | 400 | 400 | 400 | | |

| Model | S&P500 | NASDAQ | HangSeng | FTSE | DAX | SMI | ASX200 | Avg Loss | Avg Rank |
|---|---|---|---|---|---|---|---|---|---|
| GARCH-t | 4324.8 | 4568.2 | 4700.7 | 4244.3 | 4658.4 | 4300.1 | 4071.6 | 4409.7 | 27.0 |
| EGARCH-t | 4290.6 | 4517.6 | 4586.9 | 4192.1 | 4585.2 | 4245.2 | 3990.4 | 4344.0 | 16.6 |
| GJR-GARCH-t | 4239.2 | 4490.2 | 4601.7 | 4183.8 | 4601.4 | 4253.6 | 4009.1 | 4339.9 | 13.1 |
| GARCH-Skew-t | 4254.7 | 4491.0 | 4681.8 | 4194.0 | 4595.4 | 4245.3 | 4009.5 | 4353.1 | 15.3 |
| CARE | 4304.8 | 4528.0 | 4664.2 | 4204.9 | 4606.2 | 4294.2 | 4080.8 | 4383.3 | 23.3 |
| CARE-AS | 4274.9 | 4470.5 | 4550.7 | 4158.5 | 4518.5 | 4250.7 | 4031.3 | 4322.2 | 12.6 |
| ES-CAViaR-Add-AS | 4242.3 | 4439.2 | 4551.9 | 4131.8 | 4506.8 | 4192.1 | 3992.9 | 4293.9 | 3.7 |
| ES-CAViaR-Mult-AS | 4242.3 | 4451.2 | 4564.2 | 4117.8 | 4509.1 | 4188.3 | 3977.4 | 4292.9 | 4.4 |
| ES-CAViaR-Add-SAV | 4304.4 | 4505.2 | 4678.6 | 4199.8 | 4612.0 | 4246.3 | 4042.7 | 4369.9 | 22.6 |
| ES-CAViaR-Mult-SAV | 4289.8 | 4498.8 | 4677.3 | 4190.4 | 4609.0 | 4239.3 | 4042.4 | 4363.8 | 18.7 |
| WQ-3-AS | 4252.6 | 4448.2 | 4553.1 | 4132.0 | 4507.8 | 4194.1 | 3980.2 | 4295.4 | 4.9 |
| SA-BC-3-AS | 4253.7 | 4450.8 | 4551.9 | 4133.1 | 4507.3 | 4196.6 | 3979.5 | 4296.1 | 4.7 |
| SA-No-BC-3-AS | 4265.5 | 4468.2 | 4555.2 | 4145.6 | 4520.3 | 4214.5 | 3985.7 | 4307.8 | 9.7 |
| WQ-5-AS | 4255.0 | 4451.5 | 4552.3 | 4141.6 | 4506.9 | 4196.6 | 3977.7 | 4297.4 | 5.9 |
| SA-BC-5-AS | 4261.2 | 4452.5 | 4549.5 | 4140.0 | 4505.4 | 4200.0 | 3976.3 | 4297.8 | 5.3 |
| SA-No-BC-5-AS | 4275.3 | 4471.8 | 4553.0 | 4156.8 | 4520.9 | 4222.3 | 3984.5 | 4312.1 | 10.3 |
| WQ-10-AS | 4254.7 | 4450.7 | 4552.7 | 4125.8 | 4509.3 | 4193.9 | 3985.9 | 4296.1 | 5.6 |
| SA-BC-10-AS | 4259.3 | 4452.4 | 4549.8 | 4125.0 | 4508.5 | 4198.5 | 3983.4 | 4296.7 | 5.6 |
| SA-No-BC-10-AS | 4274.7 | 4473.0 | 4554.2 | 4139.6 | 4525.3 | 4223.2 | 3993.2 | 4311.9 | 10.9 |
| WQ-3-SAV | 4292.5 | 4492.5 | 4699.4 | 4191.9 | 4606.2 | 4237.4 | 4030.9 | 4364.4 | 17.3 |
| SA-BC-3-SAV | 4293.5 | 4495.5 | 4704.4 | 4190.3 | 4608.4 | 4240.2 | 4028.7 | 4365.9 | 18.9 |
| SA-No-BC-3-SAV | 4306.2 | 4514.2 | 4716.8 | 4199.7 | 4619.5 | 4256.9 | 4030.0 | 4377.6 | 23.6 |
| WQ-5-SAV | 4291.1 | 4492.4 | 4697.0 | 4189.6 | 4608.4 | 4238.5 | 4037.5 | 4364.9 | 17.7 |
| SA-BC-5-SAV | 4296.9 | 4494.2 | 4701.6 | 4188.2 | 4610.0 | 4243.3 | 4033.6 | 4366.8 | 19.9 |
| SA-No-BC-5-SAV | 4311.1 | 4513.9 | 4715.3 | 4199.4 | 4623.4 | 4262.7 | 4037.1 | 4380.4 | 24.6 |
| WQ-10-SAV | 4291.5 | 4492.5 | 4697.6 | 4198.3 | 4606.6 | 4238.5 | 4037.7 | 4366.1 | 18.7 |
| SA-BC-10-SAV | 4295.6 | 4494.1 | 4703.3 | 4198.0 | 4607.9 | 4242.4 | 4031.3 | 4367.5 | 19.7 |
| SA-No-BC-10-SAV | 4311.2 | 4515.3 | 4717.6 | 4211.5 | 4622.7 | 4263.0 | 4036.0 | 4382.5 | 25.7 |
| Out-of-sample m | 2000 | 2000 | 2000 | 2000 | 2000 | 2000 | 2000 | | |

Note: Box indicates the favoured model and dashed box indicates the 2nd ranked model based on the average loss and rank.

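Table 5: 75% model confidence set results summary with R and SQ methods.

| Model | R-Total-GFC | SQ-Total-GFC | R-Total | SQ-Total |
|---|---|---|---|---|
| GARCH-t | 4 | 3 | 1 | 2 |
| EGARCH-t | 6 | 7 | 5 | 5 |
| GJR-GARCH-t | 6 | 7 | 5 | 6 |
| GARCH-Skew-t | 5 | 5 | 4 | 5 |
| CARE | 4 | 6 | 2 | 3 |
| CARE-AS | 4 | 5 | 5 | 7 |
| ES-CAViaR-Add-AS | 6 | 7 | 7 | 7 |
| ES-CAViaR-Mult-AS | 7 | 7 | 7 | 7 |
| ES-CAViaR-Add-SAV | 3 | 5 | 2 | 4 |
| ES-CAViaR-Mult-SAV | 3 | 5 | 3 | 4 |
| WQ-3-AS | 7 | 7 | 7 | 7 |
| SA-BC-3-AS | 6 | 7 | 7 | 7 |
| SA-No-BC-3-AS | 4 | 7 | 5 | 7 |
| WQ-5-AS | 6 | 7 | 7 | 7 |
| SA-BC-5-AS | 6 | 7 | 7 | 7 |
| SA-No-BC-5-AS | 4 | 6 | 5 | 6 |
| WQ-10-AS | 7 | 7 | 7 | 7 |
| SA-BC-10-AS | 4 | 7 | 7 | 7 |
| SA-No-BC-10-AS | 4 | 6 | 5 | 5 |
| WQ-3-SAV | 3 | 5 | 5 | 4 |
| SA-BC-3-SAV | 3 | 5 | 4 | 4 |
| SA-No-BC-3-SAV | 2 | 3 | 3 | 4 |
| WQ-5-SAV | 3 | 5 | 3 | 4 |
| SA-BC-5-SAV | 3 | 5 | 4 | 4 |
| SA-No-BC-5-SAV | 2 | 2 | 3 | 3 |
| WQ-10-SAV | 2 | 5 | 3 | 4 |
| SA-BC-10-SAV | 2 | 5 | 4 | 4 |
| SA-No-BC-10-SAV | 2 | 3 | 2 | 3 |
| Out-of-sample m | 400 | 400 | 2000 | 2000 |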