Adaptive Random Bandwidth for Inference in CAViaR Models

Abstract

This paper investigates the size performance of Wald tests for CAViaR models (Engle and Manganelli, 2004). We find that the usual estimation strategy for the test statistics yields inaccuracies. Indeed, we show that existing density estimation methods cannot adapt to the time-variation in the conditional probability densities of CAViaR models. Consequently, we develop a method called adaptive random bandwidth, which robustly approximates time-varying conditional probability densities for inference on CAViaR models based on the asymptotic normality of the model parameter estimator. The proposed method also avoids the problem of choosing an optimal bandwidth in estimating probability densities, and extends straightforwardly to multivariate quantile regressions.



Alain Hecq, Li Sun∗
Maastricht University
January 29, 2021

JEL Codes: C22

Keywords: covariance matrix estimation in quantile regressions, CAViaR models, bandwidth choice, stability conditions for CAViaR DGPs.

Financial risk management is at the heart of banks' and financial institutions' activities: it guides their investment plans, supervisory decisions, risk capital allocations and compliance with external regulations. The use of quantitative risk measures has become essential in financial risk management. One of the most popular risk measures associated with financial portfolios is the value at risk (VaR hereafter). The VaR at probability $\tau \in (0,1)$ of a portfolio is defined as the minimum potential loss that the portfolio may suffer in the worst $\tau$ portion of all possible outcomes over a given time horizon. VaR is very intuitive (Duffie and Pan, 1997) and has, for instance, been incorporated into the 1996 Amendment to the Capital Accord for measuring the market risk in the financial positions of each financial institution. Therefore, VaR is still a widely used risk measure even though many approaches to measuring market and credit risks have been proposed in the literature.

∗ Corresponding author: Li Sun, Maastricht University, School of Business and Economics, Department of Quantitative Economics, P.O. Box 616, 6200 MD Maastricht, The Netherlands. Email: [email protected] Preprint: arXiv, econ.EM.

Generally, there are three ways to estimate VaR: (i) historical simulations, (ii) semi-parametric approaches and (iii) fully parametric frameworks. The class of semi-parametric approaches typically includes extreme value theory analyses and quantile regression techniques. In this paper, we focus on quantile regressions for VaR estimation, as quantile regressions are straightforward for studying one quantile of interest and numerically efficient without imposing parametric distributional assumptions.

Although the VaR is just a particular quantile of future portfolio losses conditional on present information, it is essentially a part of the underlying conditional distribution. VaR models are supposed to embrace features of the empirical conditional distributions of returns, such as time-variation and conditional heteroskedasticity. Drawing on (G)ARCH specifications, which capture the presence of time-varying conditional heteroskedasticity in time series, Engle and Manganelli (2004) have proposed to estimate conditional autoregressive value at risk by regression quantiles (CAViaR).
It is appealing to consider CAViaR models for estimating VaR, as they associate the conditional quantile of interest with observable variables as well as with the implicit information in lagged conditional quantiles.

This paper carefully investigates the size performance of Wald tests for CAViaR models. Having an accurate test statistic is important to obtain reliable models in financial applications. Several specifications are nested within a CAViaR specification, such as static quantile regression models and quantile autoregressive models (see Koenker and Xiao, 2006; Hecq and Sun, 2020). Moreover, several models nested within the general CAViaR specification have been proposed in the literature, for instance the asymmetric slope CAViaR models (Engle and Manganelli, 2004), which split the effect of yesterday's positive and negative news shocks. Wald tests are used to test the null of a symmetric news impact. However, we find that the usual estimation strategy yields inaccuracies. Indeed, we show that existing density estimation methods cannot adapt to the time-variation in the conditional probability densities of CAViaR models. The method that we develop in this paper is able to adapt to time-varying conditional probability densities and produces much more reliable results than the existing ones for inference on CAViaR models based on the asymptotic normality of the model parameter estimator. The proposed method also avoids the persistent problem of choosing an optimal bandwidth in estimating probability densities, and in theory extends straightforwardly to multivariate quantile regressions.

The remainder of this paper is structured as follows. In Section 2, stability conditions for CAViaR data generating processes (DGPs) to be non-explosive are derived. In Section 3, we investigate the size performance of Wald tests for CAViaR models and find large size distortions under the usual estimation strategy; we therefore introduce a method called adaptive random bandwidth.
An empirical study on stock returns is performed in Section 4. Finally, Section 5 concludes the paper.

Let us consider a stationary time series process $\{y_t\}_{t=1}^{T}$, for instance the return of an asset or a portfolio, and denote by $x_t$ a vector of observable variables at time $t$ and by $\mathcal{F}_t$ the information set up to time $t$, which is the $\sigma$-algebra generated by $\{x_t, y_t, x_{t-1}, y_{t-1}, \ldots\}$. The $\tau$-th quantile ($\tau \in (0,1)$) of $y_t$ conditional on $\mathcal{F}_{t-1}$ is denoted by $f_t(\beta_\tau, x_{t-1})$ (or simply $f_t(\beta_\tau)$ when the dependence on $x_{t-1}$ is understood). A generic CAViaR specification proposed by Engle and Manganelli (2004) is

\[
f_t(\beta_\tau) = \beta_0 + \sum_{i=1}^{q} \beta_i f_{t-i}(\beta_\tau) + \sum_{j=1}^{r} \beta_{q+j}\, l(x_{t-j}), \tag{1}
\]

where $\beta_\tau' := [\beta_0, \beta_1, \ldots, \beta_p]$ collects the intercept and the $p = q + r$ slope parameters, and $l$ is a function of a finite number of lagged observable variables, for instance the lagged returns entering potentially with different weights for positive and negative past returns. As described in Engle and Manganelli (2004), the autoregressive terms $\beta_i f_{t-i}(\beta_\tau)$ ensure that the quantile changes smoothly over time. The quantile autoregressive model (QAR) of Koenker and Xiao (2006) is nested in the CAViaR specification by restricting $\beta_1 = \ldots = \beta_q = 0$. The role of $l(x_{t-j})$ is to account for the association of $f_t(\beta_\tau)$ with observable variables in $\mathcal{F}_{t-1}$. As a generalization of QAR models, CAViaR models capture the time-variation in the conditional quantile in much the same way as GARCH models extend ARCH models in explaining time-varying volatility and volatility clustering in financial time series.

The CAViaR model (1) is nonlinear in parameters as long as there exists a nonzero $\beta_i$, $i \in \{1, \ldots, q\}$, which leads to

\[
\frac{\partial f_t(\beta_\tau)}{\partial \beta_i} = f_{t-i}(\beta_\tau) + \beta_i \frac{\partial f_{t-i}(\beta_\tau)}{\partial \beta_i}
\]

not being independent of $\beta_i$.
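To make the recursion in (1) and its nonlinearity in the parameters concrete, the following Python sketch evaluates a CAViaR path together with the derivative $\partial f_t / \partial \beta_1$ for the special case $q = r = 1$ with $l(x_{t-1}) = |y_{t-1}|$. The function name `caviar_path_and_grad`, the choice of $l$, the numerical values, and the convention that the initial value $f_0$ is held fixed (so that its derivative is zero) are illustrative assumptions, not part of the original specification.

```python
def caviar_path_and_grad(beta, y, f0):
    """Evaluate f_t = b0 + b1*f_{t-1} + b2*|y_{t-1}| and d f_t / d b1.

    Illustrative special case of (1) with q = r = 1 and l(x) = |y|.
    f0 is a fixed initial condition, so its derivative w.r.t. b1 is 0.
    """
    b0, b1, b2 = beta
    f = [f0]      # f[t] is the conditional quantile paired with y[t]
    df = [0.0]    # df[t] = d f[t] / d b1
    for t in range(1, len(y)):
        # Product rule on b1 * f_{t-1}: the derivative depends on b1 itself,
        # which is why the model is nonlinear in its parameters.
        df.append(f[-1] + b1 * df[-1])
        f.append(b0 + b1 * f[-1] + b2 * abs(y[t - 1]))
    return f, df


if __name__ == "__main__":
    # Hypothetical data and parameter values, chosen only for illustration.
    y = [1.0, -2.0, 0.5, 0.3]
    f, df = caviar_path_and_grad((0.1, 0.5, 0.2), y, f0=0.0)
    print(f)
    print(df)
```

Because the derivative recursion carries $\beta_1$ forward at every step, the gradient cannot be computed from the regressors alone, in contrast with a linear quantile regression.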
The algorithm to estimate CAViaR models is given in Section 2.2.

For illustration, we simulate samples from the following three CAViaR DGPs in (2) and plot them in Figure 1 (a). In Figure 1 (a), we see a decreasing trend in CAViaR DGP 1.a, mainly due to the negative term $- .\,|y_{t-1}|$ in $f_t(\beta_\tau)$, compared with CAViaR DGP 1.b. Comparing CAViaR DGP 1.b with 1.c, we find that CAViaR DGP 1.b has a larger spread due to a higher slope of $f_{t-1}(\beta_\tau)$ in $f_t(\beta_\tau)$. A similar finding applies to Figure 1 (b), which plots simulated samples of CAViaR DGP 2.a, 2.b and 2.c in (3).

\[
\begin{cases}
\text{CAViaR DGP 1.a:} & f_t(\beta_{u_t}) = F^{-1}_{t(3)}(u_t) + 0.\,f_{t-1}(\beta_{u_t}) - .\,|y_{t-1}|, \\
\text{CAViaR DGP 1.b:} & f_t(\beta_{u_t}) = F^{-1}_{t(3)}(u_t) + 0.\,f_{t-1}(\beta_{u_t}) - .\,y_{t-1}, \\
\text{CAViaR DGP 1.c:} & f_t(\beta_{u_t}) = F^{-1}_{t(3)}(u_t) - .\,y_{t-1},
\end{cases} \tag{2}
\]

where $\{u_t\}$ is i.i.d. from the standard uniform distribution (denoted as $U(0,1)$) and $F^{-1}_{t(3)}(\cdot)$ is the inverse distribution function of Student's t-distribution with 3 degrees of freedom ($t(3)$ hereafter).

\[
\begin{cases}
\text{CAViaR DGP 2.a:} & f_t(\beta_{u_t}) = F^{-1}_{t(3)}(u_t) - .\,f_{t-1}(\beta_{u_t}) + 0.\,|y_{t-1}|, \\
\text{CAViaR DGP 2.b:} & f_t(\beta_{u_t}) = F^{-1}_{t(3)}(u_t) - .\,f_{t-1}(\beta_{u_t}) + 0.\,y_{t-1}, \\
\text{CAViaR DGP 2.c:} & f_t(\beta_{u_t}) = F^{-1}_{t(3)}(u_t) + 0.\,y_{t-1},
\end{cases} \tag{3}
\]

where $u_t \overset{i.i.d.}{\sim} U(0,1)$, $t = 1, \ldots, T$.

(In Appendix A, the gradient and the Hessian matrix of CAViaR models are illustrated to emphasize that the nonlinearity in the model parameters makes CAViaR models different from linear quantile regression models. All the simulations of CAViaR DGPs in this paper follow the procedure given in Appendix B.)

The stationarity of CAViaR time series is required for the consistency of the model estimation (Engle and Manganelli, 2004). After simulating a CAViaR DGP, we can view its behaviour such […] $\tau$-th ($\tau \in (0,1)$) quantile of $\{y_t\}$ specified as follows:

\[
y_t = f_t(\beta_{u_t}) = \beta_0(u_t) + \sum_{i=1}^{q} \beta_i(u_t) f_{t-i}(\beta_{u_t}) + \sum_{j=1}^{r} \beta_{q+j}(u_t)\, y_{t-j}, \tag{4}
\]

where $u_t \overset{i.i.d.}{\sim} U(0,1)$ and $\beta_{u_t}' := [\beta_0(u_t), \beta_1(u_t), \ldots, \beta_p(u_t)]$ with $p = q + r$. There is a monotonicity requirement on this model: $f_t(\beta_{u_t})$ must be monotonically increasing in $u_t$, so that the $\tau$-th quantile ($\tau \in (0,1)$) of $y_t$ conditional on $\mathcal{F}_{t-1}$ can be expressed as $f_t(\beta_\tau)$.

Assume the conditional $\tau$-th quantile of $\{y_t\}$ follows the model (4) with nonzero probability density to occur at each time. Without loss of generality, there is a time $t \in \{1, \ldots, T\}$ such that

\[
y_t = f_t(\beta_\tau) = \beta_0 + \sum_{i=1}^{q} \beta_i f_{t-i}(\beta_\tau) + \sum_{j=1}^{r} \beta_{q+j}\, y_{t-j}.
\]

[…] $y_t$. First we have the following equation from (4):

\[
\begin{aligned}
\Big(1 - \sum_{j=1}^{r} \beta_{q+j} L^{j}\Big) y_t
&= \beta_0 + \sum_{i_1=1}^{q} \beta_{i_1} f_{t-i_1}(\beta_\tau) \\
&= \beta_0 + \sum_{i_1=1}^{q} \beta_{i_1} \Big( \beta_0 + \sum_{i_2=1}^{q} \beta_{i_2} f_{t-i_1-i_2}(\beta_\tau) + \sum_{j=1}^{r} \beta_{q+j}\, y_{t-i_1-j} \Big) \\
&= \beta_0 + \sum_{i_1=1}^{q} \beta_{i_1} \Big( \beta_0 + \sum_{i_2=1}^{q} \beta_{i_2} f_{t-i_1-i_2}(\beta_\tau) + \sum_{j=1}^{r} \beta_{q+j} L^{i_1+j} y_t \Big) \\
&= \Big(1 + \sum_{i_1=1}^{q} \beta_{i_1}\Big) \beta_0 + \sum_{i_1=1}^{q} \sum_{i_2=1}^{q} \beta_{i_1} \beta_{i_2} f_{t-i_1-i_2}(\beta_\tau) + \sum_{i_1=1}^{q} \sum_{j=1}^{r} \beta_{i_1} \beta_{q+j} L^{i_1+j} y_t,
\end{aligned}
\]

where the second line is obtained by substituting the specification (4) of $f_{t-i_1}(\beta_\tau)$ into the first line, and $L$ is the lag operator. Rewriting the above equation further, we have

\[
\Big(1 - \sum_{j=1}^{r} \beta_{q+j} L^{j} - \sum_{i_1=1}^{q} \sum_{j=1}^{r} \beta_{i_1} \beta_{q+j} L^{i_1+j}\Big) y_t
= \Big(1 + \sum_{i_1=1}^{q} \beta_{i_1}\Big) \beta_0 + \sum_{i_1=1}^{q} \sum_{i_2=1}^{q} \beta_{i_1} \beta_{i_2} f_{t-i_1-i_2}(\beta_\tau).
\]

We continue to rewrite the lagged terms of $f_t(\beta_\tau)$ on the right-hand side of the above equation, and then organize the equation such that only the left-hand side contains terms of $y_t$. Therefore, we obtain that

\[
\begin{aligned}
&\Big(1 - \sum_{j=1}^{r} \beta_{q+j} L^{j} - \sum_{i_1=1}^{q} \sum_{j=1}^{r} \beta_{i_1} \beta_{q+j} L^{i_1+j} - \ldots - \sum_{i_1=1}^{q} \sum_{i_2=1}^{q} \cdots \sum_{i_n=1}^{q} \sum_{j=1}^{r} \beta_{i_1} \beta_{i_2} \cdots \beta_{i_n} \beta_{q+j} L^{i_1+i_2+\ldots+i_n+j}\Big) y_t \\
&\quad = \Big[1 + \sum_{i=1}^{q} \beta_i + \Big(\sum_{i=1}^{q} \beta_i\Big)^{2} + \ldots + \Big(\sum_{i=1}^{q} \beta_i\Big)^{n}\Big] \beta_0
+ \sum_{i_1=1}^{q} \sum_{i_2=1}^{q} \cdots \sum_{i_{n+1}=1}^{q} \beta_{i_1} \beta_{i_2} \cdots \beta_{i_{n+1}} f_{t-i_1-i_2-\ldots-i_{n+1}}(\beta_\tau).
\end{aligned} \tag{5}
\]

Now we can get the first necessary condition for $\{y_t\}$ to be nonexplosive, which is

\[
\Big| \sum_{i=1}^{q} \beta_i \Big| < 1. \tag{6}
\]

Under the condition (6), we can simplify the equation (5) when letting $n \to \infty$ as follows:

\[
\Big(1 - \sum_{j=1}^{r} \beta_{q+j} L^{j} \sum_{m=0}^{\infty} \Big(\sum_{i=1}^{q} \beta_i L^{i}\Big)^{m}\Big) y_t = \frac{1}{1 - \sum_{i=1}^{q} \beta_i}\, \beta_0.
\]

[…] $g(x)$ of $y_t$, which is

\[
g(x) := 1 - \sum_{j=1}^{r} \beta_{q+j} x^{j} \sum_{m=0}^{\infty} \Big(\sum_{i=1}^{q} \beta_i x^{i}\Big)^{m}. \tag{7}
\]

So the second necessary condition for $\{y_t\}$ to be nonexplosive is that the roots of $g(x)$ are outside the unit circle. When there exists at least one $\beta_i \neq 0$, $i \in \{1, \ldots, q\}$, this second condition is equivalent to requiring that the roots of $g_1(x) := 1 - \sum_{i=1}^{q} \beta_i x^{i} - \sum_{j=1}^{r} \beta_{q+j} x^{j}$ and the common roots of $g_2(x) := 1 - \sum_{i=1}^{q} \beta_i x^{i}$ and $g_3(x) := 1 - \sum_{j=1}^{r} \beta_{q+j} x^{j}$ are all outside the unit circle. More examples of CAViaR DGPs are illustrated in Appendix D, among which we can find explosive DGPs that break the condition on the roots of $g_1(x)$ but meet the condition on the common roots of $g_2(x)$ and $g_3(x)$.

We can review Figure 1 with the above stability conditions. CAViaR DGP 1.b, 1.c, 2.b and 2.c meet the above conditions, and we also see their nonexplosive behaviour in the plots. The nonexplosiveness of CAViaR DGP 2.a can also be ensured, since in theory it has a narrower spread than CAViaR DGP 2.b. On the other hand, we know that CAViaR DGP 1.a has a downward trend due to the negative term $- .\,|y_{t-1}|$ and hence is explosive.

The estimation of CAViaR models can be achieved by the differential evolutionary genetic algorithm (Storn and Price, 1997) used by Engle and Manganelli (2004). Suppose the model $f_t(\beta)$ is specified as (1) for data $\{y_t\}_{t=1}^{T}$. We want to obtain the parameter estimator $\hat{\beta}$ by the following optimization:

\[
\hat{\beta} = \arg\min_{\beta \in \mathbb{R}^{p+1}} S_T(\beta), \qquad S_T(\beta) := \sum_{t=1}^{T} \rho_\tau\big(y_t - f_t(\beta)\big), \tag{8}
\]

where $S_T(\beta)$ is the objective function in quantile regressions, and $\rho_\tau(x) := x\,(\tau - 1\{x < 0\})$ is called the check function (Koenker, 2005), with $1\{\cdot\}$ the indicator function.

Following the steps below, we can obtain $\hat{\beta}$ in (8).

Step 1: Generate $n$ (say 10[…]) trial vectors independently from a uniform distribution $U(b_L, b_p)$ as $n$ initial parameter trials, where $b_L$ and $b_p$ are $(p+1) \times 1$ vectors […] $\beta^{o}_\tau$ of the underlying process in our belief. It is worth mentioning that the values of $\{f_{1-i}(\beta_\tau), i = 1, \ldots, q\}$ and $\{y_{1-j}, j = 1, \ldots, r\}$, acting as initial conditions, must also be supplied in order to calculate $\{f_t(\beta)\}_{t=1}^{T}$ for any $\beta \in \mathbb{R}^{p+1}$. For instance, as in Engle and Manganelli (2004), $f_0(\beta_\tau)$ is given as the estimated $\tau$-th quantile of $\{y_t\}_{t=1}^{\lfloor .\,T \rfloor}$ and is fixed in the optimization.

Step 2: Each initial parameter vector is used to start a minimization routine on the objective function $S_T(\beta)$, and the value of $\hat{\beta}$ returned by the routine and its objective function value are stored.

Step 3: Select the $m$ (say 10) returned vectors of $\hat{\beta}$ which result in the lowest $m$ values among the $n$ stored objective function values.

Step 4: Denote the $m$ selected vectors as $\hat{\beta}^{(1)}, \ldots, \hat{\beta}^{(m)}$, use them as initial values to restart the minimization routine individually, and update $\hat{\beta}^{(1)}, \ldots, \hat{\beta}^{(m)}$ with the newly returned vectors respectively.

Step 5: Repeat Step 4 $a$ (say 5) times.

Step 6: Calculate $S_T(\hat{\beta}^{(i)})$, $i = 1, \ldots, m$, and set the solution to be $\hat{\beta} = \arg\min_{i = 1, \ldots, m} S_T(\hat{\beta}^{(i)})$.

($\lfloor \cdot \rfloor$ is known as the floor function (or the greatest integer function): $\lfloor \cdot \rfloor : \mathbb{R} \to \mathbb{Z}$ maps a real number $x$ to the greatest integer less than or equal to $x$. The Nelder-Mead simplex algorithm is used in our minimization routine.)

We implement the above estimation algorithm throughout this paper for CAViaR model parameter estimation. There might be a concern that the artificial input of the initial values $\{f_{1-i}(\beta_\tau), i = 1, \ldots, q\}$ and $\{y_{1-j}, j = 1, \ldots, r\}$ affects the parameter estimator. In fact, the effect is usually small and can be neglected when the sample size is large enough, because the fitted conditional quantiles $\{f_t(\hat{\beta})\}_{t=1}^{T}$ are kept close to the true ones $\{f_t(\beta)\}_{t=1}^{T}$, so that the objective function is minimized despite some burn-in period.

Consistency and asymptotic normality of CAViaR model parameter estimators have been proved by Engle and Manganelli (2004). After fitting a CAViaR model to data, we would like to test whether the model is correctly specified. In this section we first investigate how the asymptotic normality of CAViaR model parameter estimators arises. We focus on the elements of the asymptotic covariance matrix to highlight their roles in connecting sample elements with the corresponding limit behaviour. Next, we check whether existing estimation strategies perform robustly and satisfactorily for Wald tests on CAViaR models. Finally, we propose a new method called adaptive random bandwidth for CAViaR models.
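The objective (8) and the multi-start procedure of Steps 1 to 6 can be sketched in Python as follows. This is a minimal, dependency-free illustration under several assumptions: the model is the special case $f_t = \beta_0 + \beta_1 f_{t-1} + \beta_2 |y_{t-1}|$, a simple compass (coordinate) search is swapped in for the Nelder-Mead routine, the defaults for `n`, `m` and `a` are small illustrative values rather than the ones used in the paper, and all function names are hypothetical.

```python
import random


def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def objective(beta, y, tau, f0):
    """S_T(beta) for the assumed model f_t = b0 + b1*f_{t-1} + b2*|y_{t-1}|."""
    b0, b1, b2 = beta
    f, total = f0, check_loss(y[0] - f0, tau)
    for t in range(1, len(y)):
        f = b0 + b1 * f + b2 * abs(y[t - 1])
        total += check_loss(y[t] - f, tau)
    return total


def compass_search(fun, x0, step=0.5, tol=1e-4, max_iter=300):
    """Derivative-free local search, used here as a stand-in for Nelder-Mead."""
    x, fx = list(x0), fun(x0)
    it = 0
    while step > tol and it < max_iter:
        improved = False
        for i in range(len(x)):
            for s in (step, -step):
                cand = x[:]
                cand[i] += s
                fc = fun(cand)
                if fc < fx:  # accept only strict improvements
                    x, fx, improved = cand, fc, True
        if not improved:
            step /= 2.0  # shrink the stencil when no move helps
        it += 1
    return x, fx


def multistart_fit(y, tau, f0, lower, upper, n=50, m=5, a=3, seed=0):
    """Steps 1-6: random starts, keep the best m, restart them a times."""
    rng = random.Random(seed)

    def fun(b):
        return objective(b, y, tau, f0)

    # Steps 1-2: n uniform starting vectors, one local search each.
    results = []
    for _ in range(n):
        start = [rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
        results.append(compass_search(fun, start))
    # Step 3: keep the m candidates with the lowest objective values.
    best = sorted(results, key=lambda r: r[1])[:m]
    # Steps 4-5: restart the local search from each candidate, a times.
    for _ in range(a):
        best = [compass_search(fun, x) for x, _ in best]
    # Step 6: return the candidate with the smallest objective value.
    return min(best, key=lambda r: r[1])
```

A call such as `multistart_fit(y, 0.05, f0, lower, upper)` returns the pair `(beta_hat, S_T(beta_hat))`. Because the check-loss surface is piecewise linear and non-convex in the autoregressive coefficient, restarting from several good candidates is what guards against poor local minima; the particular local optimizer is interchangeable.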

Consider a time series $\{y_t\}$ of random variables $y_t$ on a complete probability space $(\Omega, \mathcal{F}, P)$. For a generic CAViaR model (1) applied to $\{y_t\}$, the consistency and asymptotic normality of the estimator $\hat{\beta} := \arg\min_{\beta} \sum_{t=1}^{T} \rho_\tau(y_t - f_t(\beta))$ have been derived by Engle and Manganelli (2004).

(See assumption C0 of Engle and Manganelli (2004). We also apply this assumption throughout this paper; that is, all the random variables considered in this paper are assumed to be defined on a complete probability space $(\Omega, \mathcal{F}, P)$.)

Theorem 1 (Asymptotics given by Engle and Manganelli (2004)). For a data generating process $\{y_t\}$ with its time-$t$ conditional $\tau$-th quantile following a generic CAViaR model as (1) parametrized by $\beta^{o}$, suppose it satisfies the regularity conditions (C0, ..., C7, AN1, ..., AN4) in the proof of Engle and Manganelli (2004). Then

\[
\sqrt{T}\, A_T^{-1/2} D_T \big(\hat{\beta} - \beta^{o}\big) \overset{D}{\sim} N\big(0, I_{(p+1) \times (p+1)}\big), \tag{9}
\]

where

\[
\begin{aligned}
\hat{\beta} &:= \arg\min_{\beta \in \mathbb{R}^{p+1}} \sum_{t=1}^{T} \rho_\tau\big(y_t - f_t(\beta)\big), \\
A_T &:= E\Big[ T^{-1} \tau (1 - \tau) \sum_{t=1}^{T} \nabla' f_t(\beta^{o}) \nabla f_t(\beta^{o}) \Big], \\
D_T &:= E\Big[ T^{-1} \sum_{t=1}^{T} h_t(0 \mid \mathcal{F}_{t-1}) \nabla' f_t(\beta^{o}) \nabla f_t(\beta^{o}) \Big], \\
\epsilon_{\tau t} &:= y_t - f_t(\beta^{o}),
\end{aligned} \tag{10}
\]

and $h_t(0 \mid \mathcal{F}_{t-1})$ denotes the probability density of $\epsilon_{\tau t}$ evaluated at $0$ conditional on the information set $\mathcal{F}_{t-1}$. $I_{(p+1) \times (p+1)}$ is the $(p+1) \times (p+1)$ identity matrix.

The above theorem is useful for quantile model (mis)specification tests. For instance, Wald tests can be used to check whether the current model is correctly specified by testing the validity of a more parsimonious nested model. Performing such a quantile model specification test often requires estimating $A_T$, $h_t(0 \mid \mathcal{F}_{t-1})$ and $D_T$. When using the traditional estimates $\hat{A}_T$, $\{\hat{h}_t(0 \mid \mathcal{F}_{t-1})\}$ and $\hat{D}_T$ of $A_T$, $\{h_t(0 \mid \mathcal{F}_{t-1})\}$ and $D_T$ respectively, we found considerable size distortions in inference tests on CAViaR models in general. We will show in the next subsection that the reason lies in the inaccuracy of $\{\hat{h}_t(0 \mid \mathcal{F}_{t-1})\}$. In order to spot the discrepancy in approximating $\{h_t(0 \mid \mathcal{F}_{t-1})\}$, we need a clear picture of how $\{h_t(0 \mid \mathcal{F}_{t-1})\}$ enters the asymptotic normality of the model parameter estimator.
Doing so, we can see the role of $\{h_t(0 \mid \mathcal{F}_{t-1})\}$ and whether a sequence $\{\hat{h}_t(0 \mid \mathcal{F}_{t-1})\}$ is capable of achieving the same role in practice. Let us review the proof of Engle and Manganelli (2004) for Theorem 1 below.

The proof of Engle and Manganelli (2004) is obtained by applying Theorem 3 of Huber et al. (1967) to $T^{-1/2} \sum_{t=1}^{T} \big(1\{y_t \le f_t(\hat{\beta})\} - \tau\big) \nabla' f_t(\hat{\beta})$ and the central limit theorem to $T^{-1/2} \sum_{t=1}^{T} \big(1\{y_t \le f_t(\beta^{o})\} - \tau\big) \nabla' f_t(\beta^{o})$. Huber's conditions are verified in the proof before applying Huber's theorem. Denote

\[
\mathrm{Hit}_t(\beta) := 1\{y_t \le f_t(\beta)\} - \tau, \qquad g_t(\beta) := \nabla f_t(\beta). \tag{11}
\]

$\mathrm{Hit}_t(\beta)$ takes the value $-\tau$ every time $y_t$ exceeds $f_t(\beta)$ and $1 - \tau$ otherwise. With the true underlying parameter $\beta^{o}$, $\{\mathrm{Hit}_t(\beta^{o})\}$ is a martingale difference sequence with respect to $\{\mathcal{F}_{t-1}\}$. It is easy to see that $T^{-1/2} \sum_{t=1}^{T} \mathrm{Hit}_t(\beta^{o}) g_t(\beta^{o})$ follows the central limit theorem, because $\{\mathrm{Hit}_t(\beta^{o}) g_t(\beta^{o})\}$ is a martingale difference sequence with, by assumption AN1 of Engle and Manganelli (2004), uniformly bounded second moment. So we get that

\[
T^{-1/2} \sum_{t=1}^{T} \mathrm{Hit}_t(\beta^{o})\, g_t(\beta^{o}) \overset{D}{\sim} N(0, A_T). \tag{12}
\]

It has also been proved by Engle and Manganelli (2004) that

\[
T^{-1/2} \sum_{t=1}^{T} \mathrm{Hit}_t(\hat{\beta})\, g_t(\hat{\beta}) = o_p(1). \tag{13}
\]

Next, we make $\{h_t(0 \mid \mathcal{F}_{t-1})\}$ explicit in the proof in a way which makes its appearance more intuitive. We rewrite $\mathrm{Hit}_t(\hat{\beta}) g_t(\hat{\beta})$ as follows:

\[
\begin{aligned}
\mathrm{Hit}_t(\hat{\beta})\, g_t(\hat{\beta})
&= \big( \mathrm{Hit}_t(\beta^{o}) + \mathrm{Hit}_t(\hat{\beta}) - \mathrm{Hit}_t(\beta^{o}) \big)\, g_t(\hat{\beta}) \\
&= \mathrm{Hit}_t(\beta^{o})\, g_t(\hat{\beta}) + \big( \mathrm{Hit}_t(\hat{\beta}) - \mathrm{Hit}_t(\beta^{o}) \big)\, g_t(\hat{\beta}).
\end{aligned} \tag{14}
\]

Taking expectations on both sides of Equation (14), we get

\[
\begin{aligned}
T^{-1/2} \sum_{t=1}^{T} E\big[ \mathrm{Hit}_t(\hat{\beta})\, g_t(\hat{\beta}) \big]
&= T^{-1/2} \sum_{t=1}^{T} E\big[ \mathrm{Hit}_t(\beta^{o})\, g_t(\hat{\beta}) + \big( \mathrm{Hit}_t(\hat{\beta}) - \mathrm{Hit}_t(\beta^{o}) \big)\, g_t(\hat{\beta}) \big] \\
&= T^{-1/2} \sum_{t=1}^{T} E\big[ E[\mathrm{Hit}_t(\beta^{o}) \mid \mathcal{F}_{t-1}]\, g_t(\hat{\beta}) \big]
 + T^{-1/2} \sum_{t=1}^{T} E\big[ \big( \mathrm{Hit}_t(\hat{\beta}) - \mathrm{Hit}_t(\beta^{o}) \big)\, g_t(\hat{\beta}) \big] \\
&= T^{-1/2} \sum_{t=1}^{T} E\big[ E\big[ \mathrm{Hit}_t(\hat{\beta}) - \mathrm{Hit}_t(\beta^{o}) \,\big|\, \mathcal{F}_{t-1} \big]\, g_t(\beta^{o}) \big] \\
&\quad + T^{-1/2} \sum_{t=1}^{T} E\big[ E\big[ \mathrm{Hit}_t(\hat{\beta}) - \mathrm{Hit}_t(\beta^{o}) \,\big|\, \mathcal{F}_{t-1} \big] \big( g_t(\hat{\beta}) - g_t(\beta^{o}) \big) \big] \\
&= T^{-1/2} \sum_{t=1}^{T} E\big[ E\big[ \mathrm{Hit}_t(\hat{\beta}) - \mathrm{Hit}_t(\beta^{o}) \,\big|\, \mathcal{F}_{t-1} \big]\, g_t(\beta^{o}) \big] \\
&\quad + T^{-1/2} \sum_{t=1}^{T} E\big[ E\big[ \mathrm{Hit}_t(\hat{\beta}) - \mathrm{Hit}_t(\beta^{o}) \,\big|\, \mathcal{F}_{t-1} \big] \big]\, O_p\big( \|\hat{\beta} - \beta^{o}\|_\infty \big),
\end{aligned} \tag{15}
\]

where $\|\cdot\|_\infty$ is the supremum norm of vectors.
Moreover,

\[
\begin{aligned}
E\big[ \mathrm{Hit}_t(\hat{\beta}) - \mathrm{Hit}_t(\beta^{o}) \,\big|\, \mathcal{F}_{t-1} \big]
&= P\big\{ y_t \le f_t(\hat{\beta}) \,\big|\, \mathcal{F}_{t-1} \big\} - P\big\{ y_t \le f_t(\beta^{o}) \,\big|\, \mathcal{F}_{t-1} \big\} \\
&= F_t\big( f_t(\hat{\beta}) \,\big|\, \mathcal{F}_{t-1} \big) - F_t\big( f_t(\beta^{o}) \,\big|\, \mathcal{F}_{t-1} \big) \\
&= F_t'\big( f_t(\beta^{o}) \,\big|\, \mathcal{F}_{t-1} \big) \big( f_t(\hat{\beta}) - f_t(\beta^{o}) \big) + O_p\big( f_t(\hat{\beta}) - f_t(\beta^{o}) \big) \\
&= h_t(0 \mid \mathcal{F}_{t-1}) \Big( \big(\hat{\beta} - \beta^{o}\big)' \nabla f_t(\beta^{o}) + O_p\big( \|\hat{\beta} - \beta^{o}\|_\infty \big) \Big) + O_p\big( \|\hat{\beta} - \beta^{o}\|_\infty \big) \\
&= \big(\hat{\beta} - \beta^{o}\big)' h_t(0 \mid \mathcal{F}_{t-1}) \nabla f_t(\beta^{o}) + O_p\big( \|\hat{\beta} - \beta^{o}\|_\infty \big),
\end{aligned} \tag{16}
\]

where $F_t(\cdot \mid \mathcal{F}_{t-1})$ is the cumulative distribution function of $y_t$ conditional on $\mathcal{F}_{t-1}$, and $h_t(0 \mid \mathcal{F}_{t-1}) = F_t'( f_t(\beta^{o}) \mid \mathcal{F}_{t-1} )$ is the corresponding conditional density at the quantile. Substituting (16) into (15) gives

\[
T^{-1/2} \sum_{t=1}^{T} E\big[ \mathrm{Hit}_t(\hat{\beta})\, g_t(\hat{\beta}) \big]
= \big(\hat{\beta} - \beta^{o}\big)' \cdot T^{-1/2} \sum_{t=1}^{T} E\big[ h_t(0 \mid \mathcal{F}_{t-1}) \nabla f_t(\beta^{o}) \nabla' f_t(\beta^{o}) \big] + O_p\big( T^{1/2} \|\hat{\beta} - \beta^{o}\|_\infty \big). \tag{17}
\]

Applying Huber's theorem gives

\[
T^{-1/2} \sum_{t=1}^{T} E\big[ \mathrm{Hit}_t(\hat{\beta})\, g_t(\hat{\beta}) \big]
= - T^{-1/2} \sum_{t=1}^{T} \big( 1\{y_t \le f_t(\beta^{o})\} - \tau \big) \nabla' f_t(\beta^{o}) + o_p(1). \tag{18}
\]

Therefore, the asymptotic normality of $T^{1/2}(\hat{\beta} - \beta^{o})$ is obtained by substituting (12) and (17) into (18).

From the above derivation, it is clear that the role of $h_t(0 \mid \mathcal{F}_{t-1})$ is actually that of an approximation to $F_t'( f_t(\bar{\beta}) \mid \mathcal{F}_{t-1} )$, in which $\bar{\beta}$ lies between $\beta^{o}$ and $\hat{\beta}$.
This role comes to the surface in (16) using the fact that

\[
F_t\big( f_t(\hat{\beta}) \,\big|\, \mathcal{F}_{t-1} \big) - F_t\big( f_t(\beta^{o}) \,\big|\, \mathcal{F}_{t-1} \big)
= F_t'\big( f_t(\bar{\beta}) \,\big|\, \mathcal{F}_{t-1} \big) \Big( \nabla' f_t(\beta^{o}) \big(\hat{\beta} - \beta^{o}\big) \Big) \tag{19}
\]

by the Mean Value Theorem. This approximating role of $h_t(0 \mid \mathcal{F}_{t-1})$ sets a clear mission that any $\hat{h}_t(0 \mid \mathcal{F}_{t-1})$ is supposed to achieve, which can be used to examine an estimator for $h_t(0 \mid \mathcal{F}_{t-1})$ as well as to propose an improved estimation method. In the next subsection, we examine the performance of some existing methods for estimating $h_t(0 \mid \mathcal{F}_{t-1})$, and the role of $h_t(0 \mid \mathcal{F}_{t-1})$ will help to find the intrinsic defects of those methods.

Based on the literature on quantile regressions, there are in general two ways to estimate $\{h_t(0 \mid \mathcal{F}_{t-1})\}$ in $D_T$ with $\{\epsilon_{\tau t}\}$ being potentially non-i.i.d. One is referred to as the Hendricks-Koenker Sandwich Approach (Hendricks and Koenker, 1992; Koenker, 2005), analogous to the finite difference idea, resulting in the estimator $\hat{h}^{fd}_t(0 \mid \mathcal{F}_{t-1})$ for $h_t(0 \mid \mathcal{F}_{t-1})$ as follows:

\[
\hat{h}^{fd}_t(0 \mid \mathcal{F}_{t-1}) = \frac{2 \Delta\tau_T}{f_t(\beta_{\tau + \Delta\tau_T}) - f_t(\beta_{\tau - \Delta\tau_T})}, \tag{20}
\]

where $\Delta\tau_T$ is subject to $0 < \tau \pm \Delta\tau_T < 1$ and $\Delta\tau_T \to 0$ as $T \to \infty$. The other one is referred to as the Powell Sandwich (Powell, 1991; Koenker, 2005), based on the kernel density estimation idea, resulting in the estimator $\hat{h}^{kernel}_t(0 \mid \mathcal{F}_{t-1})$ for $h_t(0 \mid \mathcal{F}_{t-1})$ as follows:

\[
\hat{h}^{kernel}_t(0 \mid \mathcal{F}_{t-1})
= \frac{P\{ y_t \le f_t(\beta_\tau) + c_T \mid \mathcal{F}_{t-1} \} - P\{ y_t \le f_t(\beta_\tau) - c_T \mid \mathcal{F}_{t-1} \}}{2 c_T}
\approx \frac{1}{2 c_T} K\Big( \frac{y_t - f_t(\beta_\tau)}{2 c_T} \Big), \tag{21}
\]

where $K(\cdot)$ is a suitable kernel function with bandwidth $2 c_T$ and $c_T \to 0$ as $T \to \infty$. As we can see in (21), one kernel function is applied throughout $\{y_t\}$, with $y_t - f_t(\beta_\tau)$ being the only distinguishing information for $\hat{h}^{kernel}_t(0 \mid \mathcal{F}_{t-1})$.
Therefore, this kernel method does not capture sufficient information to distinguish time-varying conditional distributions of $\{y_t\}$, and consequently cannot fully adapt to the time-variation. Additionally, the choices of the kernel function $K(\cdot)$ and the bandwidth parameter $c_T$ remain troublesome questions in practice. A similar issue in the Hendricks-Koenker Sandwich Approach concerns the choice of $\Delta\tau_T$ and the extra error resulting from estimating $f_t(\beta_{\tau + \Delta\tau_T})$ and $f_t(\beta_{\tau - \Delta\tau_T})$.

The estimation method adopted by Engle and Manganelli (2004) is a form of the Powell Sandwich as follows:

\[
\hat{h}^{ker}_t(0 \mid \mathcal{F}_{t-1}) = \frac{1\{ |y_t - f_t(\hat{\beta}_\tau)| < \hat{c}_T \}}{2 \hat{c}_T}. \tag{22}
\]

As suggested by Koenker (2005) and Machado and Silva (2013), the bandwidth $\hat{c}_T$ generally adopted is defined as follows:

\[
\hat{c}_T = \hat{k}_T \big[ \Phi^{-1}(\tau + m_T) - \Phi^{-1}(\tau - m_T) \big], \tag{23}
\]

where $m_T$ is defined as

\[
m_T = T^{-1/3} \Big( \Phi^{-1}\Big(1 - \frac{0.05}{2}\Big) \Big)^{2/3} \Big( \frac{1.5\, \big(\phi(\Phi^{-1}(\tau))\big)^{2}}{2 \big(\Phi^{-1}(\tau)\big)^{2} + 1} \Big)^{1/3}, \tag{24}
\]

with $\Phi(\cdot)$ and $\phi(\cdot)$ being the cumulative distribution and probability density functions of $N(0,1)$ respectively, and $\hat{k}_T$ is defined as the median absolute deviation of the conditional $\tau$-th quantile regression residuals.

Wald tests are applied in this subsection to check the performance of the above estimation methods for CAViaR models. First, we consider the following candidate model specifications for the conditional $\tau$-th ($\tau \in (0,1)$) quantile of $\{y_t\}$, with $f_t(\beta_\tau)$ denoting the $\tau$-th quantile of $y_t$ conditional on $\mathcal{F}_{t-1}$.

• full specification:

\[
f_t(\beta^{FM}_\tau) = \beta^{FM}_0(\tau) + \beta^{FM}_1 f_{t-1}(\beta^{FM}_\tau) + \beta^{FM}_2 (y_{t-1})^{+} + \beta^{FM}_3 (y_{t-1})^{-}, \tag{25}
\]

where the operators $(\cdot)^{+}$ and $(\cdot)^{-}$ are defined as $(x)^{+} = \max(x, 0)$ and $(x)^{-} = -\min(x, 0)$.

• restrictive model 1:

\[
\begin{aligned}
f_t(\beta^{R1}_\tau) &= \beta^{R1}_0(\tau) + \beta^{R1}_1 f_{t-1}(\beta^{R1}_\tau) + \beta^{R1}_2 |y_{t-1}| \\
&= \beta^{R1}_0(\tau) + \beta^{R1}_1 f_{t-1}(\beta^{R1}_\tau) + \beta^{R1}_2 (y_{t-1})^{+} + \beta^{R1}_2 (y_{t-1})^{-}.
\end{aligned} \tag{26}
\]

• restrictive model 2:

\[
\begin{aligned}
f_t(\beta^{R2}_\tau) &= \beta^{R2}_0(\tau) + \beta^{R2}_1 f_{t-1}(\beta^{R2}_\tau) + \beta^{R2}_2\, y_{t-1} \\
&= \beta^{R2}_0(\tau) + \beta^{R2}_1 f_{t-1}(\beta^{R2}_\tau) + \beta^{R2}_2 (y_{t-1})^{+} - \beta^{R2}_2 (y_{t-1})^{-}.
\end{aligned} \tag{27}
\]

The models (26) and (27) are nested within model (25). Let us consider the Wald test on models (25) and (26) first. We simulate a time series $\{y_t\}$ with its DGP specified as the model (26) with the underlying parameter vector $\beta^{R1}_{u_t} = [F^{-1}_{N(0,1)}(u_t),\ .\,,\ .\,]'$, where $\{u_t\} \overset{i.i.d.}{\sim} U(0,1)$ and $F^{-1}_{N(0,1)}(\cdot)$ is the inverse of the standard normal distribution function. The sample size of each simulated sample is 4000. Conditional 50% quantiles are estimated for each of a total of 1000 simulated samples from this DGP by regressing the sample onto the full model (25). The Wald test implemented here has a null hypothesis of the form $H_0 : R \beta^{FM}_\tau = \gamma$, where $R = [0, 0, 1, -1]$, $\gamma = 0$, and $\hat{\beta}_\tau$ is the estimator of the full model parameter vector in (25). The Wald test statistic, denoted by $W_T$, is formulated (Weiss, 1991) as follows:

\[
W_T = T \big( R \hat{\beta}_\tau - \gamma \big)' \big[ R \hat{D}_T^{-1} \hat{A}_T \hat{D}_T^{-1} R' \big]^{-1} \big( R \hat{\beta}_\tau - \gamma \big), \tag{28}
\]

where $\hat{A}_T$ and $\hat{D}_T$ are estimates of $A_T$ and $D_T$ in (10) respectively. It is straightforward to obtain $\hat{A}_T$ and $\hat{D}_T$ by plugging in $\hat{\beta}_\tau$ and $\{\hat{h}_t(0 \mid \mathcal{F}_{t-1})\}$, i.e.,

\[
\hat{A}_T = T^{-1} \tau (1 - \tau) \sum_{t=1}^{T} \nabla' f_t(\hat{\beta}_\tau) \nabla f_t(\hat{\beta}_\tau), \qquad
\hat{D}_T = T^{-1} \sum_{t=1}^{T} \hat{h}_t(0 \mid \mathcal{F}_{t-1}) \nabla' f_t(\hat{\beta}_\tau) \nabla f_t(\hat{\beta}_\tau).
\]

The notation for $\hat{D}_T$ that distinguishes the different estimators used for $\{h_t(0 \mid \mathcal{F}_{t-1})\}$ is given by

\[
\hat{D}^{ker}_T = (2 T \hat{c}_T)^{-1} \sum_{t=1}^{T} 1\{ |y_t - f_t(\hat{\beta}_\tau)| < \hat{c}_T \} \nabla' f_t(\hat{\beta}_\tau) \nabla f_t(\hat{\beta}_\tau), \tag{29}
\]

\[
\hat{D}^{fd}_T = T^{-1} \sum_{t=1}^{T} \frac{2 \Delta\tau_T}{f_t(\beta_{\tau + \Delta\tau_T}) - f_t(\beta_{\tau - \Delta\tau_T})} \nabla' f_t(\hat{\beta}_\tau) \nabla f_t(\hat{\beta}_\tau), \tag{30}
\]

where $\hat{c}_T$ is determined as in (23).

We now examine each element in the estimation of $D_T$. The analytic solution for $h_t(0 \mid \mathcal{F}_{t-1})$ can be obtained as follows:

\[
\begin{aligned}
h_t(0 \mid \mathcal{F}_{t-1}) = \frac{\partial \tau}{\partial f_t(\beta_\tau)}
&= \frac{1}{\dfrac{\partial \beta_0(\tau)}{\partial \tau} + \beta_1 \dfrac{\partial f_{t-1}(\beta_\tau)}{\partial \tau}} \\
&= \frac{1}{\dfrac{\partial \beta_0(\tau)}{\partial \tau} \sum_{i=0}^{n} \beta_1^{i} + \beta_1^{n+1} \dfrac{\partial f_{t-n-1}(\beta_\tau)}{\partial \tau}} \\
&= (1 - \beta_1) \frac{1}{\beta_0'(\tau)},
\end{aligned} \tag{31}
\]

where $\beta_0'(\tau) := \partial \beta_0(\tau) / \partial \tau$.
The last equality follows from letting $n \to \infty$ and using $|\beta_1| < 1$. The analytic solution for $h_t(0|\mathcal{F}_{t-1})$ helps identify inaccurate elements in $\widehat{D}_T$: we compare the test performances obtained with $\widehat{D}^{ker}_T$, $\widehat{D}^{fd}_T$ and the following estimator,

$$\widehat{D}^{h}_T = T^{-1}\sum_{t=1}^{T}(1-\beta_1)\frac{1}{\beta_0'(\tau)}\,\nabla' f_t(\widehat{\beta}_\tau)\nabla f_t(\widehat{\beta}_\tau). \tag{32}$$

The test performances with $\widehat{D}^{ker}_T$, $\widehat{D}^{fd}_T$ and $\widehat{D}^{h}_T$ are shown in Tables 1 and 2. They are compared with the Wald test result obtained by plugging the true underlying parameter vector $\beta^{FM}_\tau$ = [ $F^{-1}_{N(0,1)}(\tau)$ , . , . ]′, $\tau \in (0,1)$, into

$$D_T = T^{-1}\sum_{t=1}^{T}(1-\beta_1)\frac{1}{\beta_0'(\tau)}\,\nabla' f_t(\beta_\tau)\nabla f_t(\beta_\tau) = \frac{(1-\beta^{o}_1)\,\phi\!\left(F^{-1}_{N(0,1)}(\tau)\right)}{T}\sum_{t=1}^{T}\nabla' f_t(\beta^{o}_\tau)\nabla f_t(\beta^{o}_\tau), \tag{33}$$

where $\phi(\cdot)$ is the probability density function of $N(0,1)$. The size performances with the different $D_T$ estimators are listed in Table 1, in which each estimated size is the percentage rejection rate among the 1000 samples of $T = 4000$ from the DGP (26). Analogously, we implement the Wald test on models (25) and (27), with the underlying DGP $\{y_t\}$ specified as the model (27) with underlying parameter vector $\beta^{R}_{u_t}$ = [ $F^{-1}_{N(0,1)}(u_t)$ , . , . ]′, where $\{u_t\}$ is i.i.d. $U(0,1)$. The null hypothesis is $H_0: R\beta^{FM}_\tau = \gamma$, where R = [0 , , , ], $\gamma = 0$, and $\widehat{\beta}_\tau$ is the estimator from the full model regression (25). The resulting size performances of the Wald tests on (25) and (27) are listed in Table 2.

Tables 1 and 2 show large size distortions with $\widehat{D}^{fd}_T$, whereas $\widehat{D}^{ker}_T$, $\widehat{D}^{h}_T$ and $D_T$ perform in line with the nominal size. This comparison shows that the element that is crucial to the accuracy of $\widehat{D}_T$ is $\{\widehat{h}_t(0|\mathcal{F}_{t-1})\}$. To check whether $\{\widehat{h}^{ker}_t(0|\mathcal{F}_{t-1})\}$ can play the role of $\{h_t(0|\mathcal{F}_{t-1})\}$ robustly when the conditional probability densities are time-varying, we consider the following DGP:

$$y_t = f_t(\beta^{R}_{u_t}) = \beta^{R}_0(u_t)\sqrt{(y_{t-1})^{+}} + \beta^{R}_1 f_{t-1}(\beta^{R}_\tau) + \beta^{R}_2\,|y_{t-1}| \tag{34}$$

$$\phantom{y_t} = \beta^{R}_0(u_t)\sqrt{(y_{t-1})^{+}} + \beta^{R}_1 f_{t-1}(\beta^{R}_\tau) + \beta^{R}_2\,(y_{t-1})^{+} + \beta^{R}_2\,(y_{t-1})^{-}, \tag{35}$$

where $\{u_t\}$ is i.i.d. U(0,

1) and the underlying parameters are given as $\beta^{R}_{u_t}$ = [ $F^{-1}_{N(0,1)}(u_t)$ , . , . ]′. The analytic form of the corresponding conditional probability density $h_t(0|\mathcal{F}_{t-1})$ of $y_t$ at its $\tau$-th quantile $f_t(\beta_\tau)$ given $\mathcal{F}_{t-1}$ can be derived as follows:

$$h_t(0|\mathcal{F}_{t-1}) = \left(\frac{\partial f_t(\beta_\tau)}{\partial\tau}\right)^{-1} = \left(\frac{\partial\beta_0(\tau)}{\partial\tau}\sqrt{(y_{t-1})^{+}} + \beta_1\frac{\partial f_{t-1}(\beta_\tau)}{\partial\tau}\right)^{-1} = \left(\frac{\partial\beta_0(\tau)}{\partial\tau}\sum_{i=1}^{\infty}\beta_1^{\,i-1}\sqrt{(y_{t-i})^{+}}\right)^{-1}, \tag{36}$$

where the last equality is obtained by iteratively rewriting $\partial f_{t-i}(\beta_\tau)/\partial\tau$ at each $i$ and using $|\beta_1| < 1$. The analytic form (36) shows that $\{h_t(0|\mathcal{F}_{t-1})\}$ is indeed time-varying and nonzero with probability one.

We simulate 1000 samples from the DGP (34) with $T = 5000$, and estimate the conditional 50% quantile of each sample by fitting the full model specification (35). The Wald test (28) with R = [0 , , , −1] is performed on these 1000 samples, and the size performance is presented in Table 3. Table 3 shows a large size distortion with the kernel method $\widehat{D}^{ker}_T$. Further tests for different DGPs, together with their results, are presented in Appendix E. Based on these results, the kernel method for estimating $\{h_t(0|\mathcal{F}_{t-1})\}$ is not robust and cannot fully adapt to time-varying conditional probability densities.

A robust estimate of $\{h_t(0|\mathcal{F}_{t-1})\}$ is needed to ensure the reliability of any CAViaR analysis that relies on the asymptotic properties of the CAViaR parameter estimators. In seeking to improve the accuracy of $\{\widehat{h}_t(0|\mathcal{F}_{t-1})\}$, we keep two guidelines in mind. The first is the role of $\{h_t(0|\mathcal{F}_{t-1})\}$ in linking sample elements with the corresponding limit behaviours; see Section 3.1. The second is the fundamental flaws that limit the accuracy of $\{\widehat{h}^{ker}_t(0|\mathcal{F}_{t-1})\}$ and $\{\widehat{h}^{fd}_t(0|\mathcal{F}_{t-1})\}$. For $\{\widehat{h}^{fd}_t(0|\mathcal{F}_{t-1})\}$, $\Delta\tau_T$ needs to be determined properly, and two more quantile regressions need to be performed in order to obtain $\widehat{\beta}_{\tau+\Delta\tau_T}$ and $\widehat{\beta}_{\tau-\Delta\tau_T}$. The effect of this extra estimation error is crucial to the performance of $\{\widehat{h}^{fd}_t(0|\mathcal{F}_{t-1})\}$. Although $\{\widehat{h}^{ker}_t(0|\mathcal{F}_{t-1})\}$ does not need extra quantile regressions, it still requires a proper choice of the kernel function $K(\cdot)$ and the bandwidth $\widehat{c}_T$. Notably, with the kernel function $\mathbf{1}\{|y_t - f_t(\widehat{\beta}_\tau)| < \widehat{c}_T\}$, $\{\widehat{h}^{ker}_t(0|\mathcal{F}_{t-1})\}$ does not differentiate among the observations within the bandwidth, regardless of how many observations fall in it. It is therefore desirable to remove the need to choose a bandwidth $\Delta\tau_T$ or $c_T$ and a kernel function $K(\cdot)$ in the estimation.
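As a rough illustration of how the pieces above fit together, the following Python sketch computes $\widehat{A}_T$, the kernel-type $\widehat{D}^{ker}_T$ in (29), and the Wald statistic (28). The function names and the array layout (one gradient row per observation) are assumptions made for this example and are not taken from the paper; the bandwidth `c` is treated as given rather than computed from (23).

```python
import numpy as np

def a_hat(grad, tau):
    """A_T estimate: tau(1-tau)/T * sum_t grad_t grad_t'.
    grad has shape (T, p+1), one gradient of f_t per row (assumed layout)."""
    T = grad.shape[0]
    return tau * (1.0 - tau) * (grad.T @ grad) / T

def d_hat_kernel(resid, grad, c):
    """Kernel-type D_T estimate as in (29), with a given bandwidth c > 0.
    resid[t] = y_t - f_t(beta_hat)."""
    T = grad.shape[0]
    inside = (np.abs(resid) < c).astype(float)   # 1{|y_t - f_t| < c}
    return (grad.T * inside) @ grad / (2.0 * T * c)

def wald_stat(beta_hat, R, gamma, A, D, T):
    """Wald statistic (28): T (R b - g)' [R D^-1 A D^-1 R']^-1 (R b - g)."""
    R = np.atleast_2d(np.asarray(R, dtype=float))
    diff = R @ np.asarray(beta_hat, dtype=float) - np.atleast_1d(gamma)
    D_inv = np.linalg.inv(D)
    V = D_inv @ A @ D_inv                        # sandwich covariance estimate
    middle = np.linalg.inv(R @ V @ R.T)
    return float(T * diff @ middle @ diff)
```

In a size experiment of the kind described above, the statistic would be computed for each simulated sample and compared with a chi-square critical value whose degrees of freedom equal the number of rows of `R`.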
In the next subsection, we develop a robust estimation method for $\{h_t(0|\mathcal{F}_{t-1})\}$ that requires neither a bandwidth nor a kernel function.

.3 Adaptive random bandwidth method

We have seen that the accuracy of the $\{h_t(0|\mathcal{F}_{t-1})\}$ estimation is crucial to the performance of inference tests based on the asymptotic normality of CAViaR parameter estimators. It is also well known that $\{\widehat{h}^{fd}_t(0|\mathcal{F}_{t-1})\}$ suffers both from the error in estimating $f_t(\beta_{\tau+\Delta\tau})$ and $f_t(\beta_{\tau-\Delta\tau})$ and from the need to choose a proper $\Delta\tau_T$. On the other hand, $\{\widehat{h}^{ker}_t(0|\mathcal{F}_{t-1})\}$ has some fundamental problems. First, it cannot fully adapt to time-varying conditional distributions, because the same kernel function $K(\cdot)$ and only the time-$t$ information $(y_t - f_t)$ are used in estimating $h_t(0|\mathcal{F}_{t-1})$ for all $t$. Second, finding a proper kernel function $K(\cdot)$ with a proper bandwidth $c_T$ still poses many thorny problems in practice. Neither method is robust in practice. The goal of this subsection is to develop an estimation method for $\{h_t(0|\mathcal{F}_{t-1})\}$ that adapts to the time-variation of CAViaR DGPs and is robust in practice without the need to determine a proper bandwidth. We call it the adaptive random bandwidth (ARB) method; it reliably bridges the asymptotic theory of CAViaR models and CAViaR applications.

The idea of this method comes from the role of $\{h_t(0|\mathcal{F}_{t-1})\}$ in linking sample elements with the corresponding limit behaviours; see Section 3.1.
Reviewing equation (16), we can explicitly formulate $\{h_t(0|\mathcal{F}_{t-1})\}$ as follows:

$$h_t(0|\mathcal{F}_{t-1}) = E_{y_t,\widehat{\beta}}\left[\left.\frac{Hit_t(\widehat{\beta}) - Hit_t(\beta^{o})}{\nabla' f_t(\beta^{o})\left(\widehat{\beta} - \beta^{o}\right)}\,\right|\,\mathcal{F}_{t-1}\right], \tag{37}$$

which is a conditional expectation taken with respect to the random variables $y_t$ and $\widehat{\beta}$. Hereafter, the subscript on $E$ indicates the random variable(s) with respect to which the expectation is taken. Considering this role of $\{h_t(0|\mathcal{F}_{t-1})\}$ together with equation (19), we propose to use the random bandwidth $\nabla' f_t(\widehat{\beta})\left(b_i - \widehat{\beta}\right)$ with $\sqrt{T}\left(b_i - \widehat{\beta}\right) \overset{D}{\sim} N(\mathbf{0}, V_d)$ and $i = 1, 2, \ldots, n$. We can set $V_d = I_{(p+1)\times(p+1)}$ to start with. After simulating a sufficiently large number $n$ of Monte Carlo draws of $b_i - \widehat{\beta}$ in this way, an estimator of $h_t(0|\mathcal{F}_{t-1})$ is obtained as follows:

$$\widehat{h}_t(0|\mathcal{F}_{t-1}) := n^{-1}\sum_{i=1}^{n}\frac{\mathbf{1}\left\{y_t \le f_t(\widehat{\beta})\right\} - \mathbf{1}\left\{y_t \le f_t(\widehat{\beta}) + \nabla' f_t(\widehat{\beta})\left(b_i - \widehat{\beta}\right)\right\}}{-\nabla' f_t(\widehat{\beta})\left(b_i - \widehat{\beta}\right)}. \tag{38}$$

Given this $\widehat{h}_t(0|\mathcal{F}_{t-1})$, we can estimate $\widehat{D}_T$ and update $V_d = \widehat{D}_T^{-1}\widehat{A}_T\widehat{D}_T^{-1}$. We then redo the simulation of $\{b_i - \widehat{\beta}\}_{i=1}^{n}$ with the updated $V_d$ and estimate $\widehat{h}_t(0|\mathcal{F}_{t-1})$ and $\widehat{D}_T$ again. Repeating the estimation in this way can mitigate the influence of an arbitrarily chosen $\widehat{h}_t(0|\mathcal{F}_{t-1})$ in ARB.

Compared with the Powell sandwich estimation (21) with $c_T$, our proposed method uses the random bandwidth $\nabla' f_t(\widehat{\beta})\left(b_i - \widehat{\beta}\right)$ and Monte Carlo simulations, so that it can adapt to time-varying conditional distributions of CAViaR DGPs by mimicking the role of $\{h_t(0|\mathcal{F}_{t-1})\}$ in (19) and (37).
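The iteration just described can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the array layout (residuals `resid[t]` $= y_t - f_t(\widehat{\beta})$ and one gradient row per observation), the default number of draws and updates, and the symmetrisation of $V_d$ before redrawing are all choices made for this example. It also assumes every simulated bandwidth is nonzero.

```python
import numpy as np

def arb_density(resid, grad, V_d, n, rng):
    """Monte Carlo ARB estimate of h_t(0|F_{t-1}) following (38).
    resid: (T,) residuals y_t - f_t(beta_hat); grad: (T, p+1) gradients."""
    T, k = grad.shape
    # Draw d_i = b_i - beta_hat such that sqrt(T) * d_i ~ N(0, V_d).
    d = rng.multivariate_normal(np.zeros(k), V_d / T, size=n)   # (n, p+1)
    bw = grad @ d.T                                # (T, n) random bandwidths
    hit0 = (resid <= 0.0)[:, None].astype(float)   # 1{y_t <= f_t}
    hit1 = (resid[:, None] <= bw).astype(float)    # 1{y_t <= f_t + bandwidth}
    # (hit0 - hit1) / (-bw) in (38) equals (hit1 - hit0) / bw.
    return ((hit1 - hit0) / bw).mean(axis=1)

def arb_sandwich(resid, grad, tau, n=1000, updates=2, seed=0):
    """Iterate h_hat -> D_hat -> V_d = D^-1 A D^-1, starting from V_d = I."""
    rng = np.random.default_rng(seed)
    T, k = grad.shape
    A = tau * (1.0 - tau) * (grad.T @ grad) / T
    V_d = np.eye(k)
    for _ in range(updates + 1):
        h = arb_density(resid, grad, V_d, n, rng)
        D = (grad.T * h) @ grad / T
        D_inv = np.linalg.inv(D)
        V_d = D_inv @ A @ D_inv
        V_d = 0.5 * (V_d + V_d.T)    # guard against round-off asymmetry
    return h, D, V_d
```

With `updates=0` the sketch corresponds to using $V_d = I$ without updating; the returned matrix is the sandwich $\widehat{D}_T^{-1}\widehat{A}_T\widehat{D}_T^{-1}$ built from the last $\widehat{D}_T$.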
The adaptive random bandwidth method can markedly outperform the Powell sandwich method on DGPs with time-varying conditional distributions, as shown in Table 3. In theory, the adaptive random bandwidth method is valid as long as $b_i - \widehat{\beta}$ and $\beta^{o} - \widehat{\beta}$ have the same order of magnitude. We formally establish the method in Theorem 2.

Theorem 2 (Adaptive Random Bandwidth Method)

Assume the conditions and the asymptotic normality result in Theorem 1. Choose an arbitrary positive definite symmetric matrix $V_d$. Under the condition that

$$\sqrt{T}\left(b_i - \widehat{\beta}\right) \overset{i.i.d.}{\sim} N(\mathbf{0}, V_d), \quad i = 1, \ldots, n, \tag{39}$$

and $\left|\nabla' f_t(\widehat{\beta})\left(b_i - \widehat{\beta}\right)\right| \neq 0$, the adaptive random bandwidth estimator for $h_t(0|\mathcal{F}_{t-1})$ is formulated as follows:

$$\widehat{h}_t(0|\mathcal{F}_{t-1}) = \begin{cases} n^{-1}\displaystyle\sum_{i=1}^{n}\frac{\mathbf{1}\left\{y_t \le f_t(\widehat{\beta}) + \nabla' f_t(\widehat{\beta})\left(b_i - \widehat{\beta}\right)\right\} - \mathbf{1}\left\{y_t \le f_t(\widehat{\beta})\right\}}{\nabla' f_t(\widehat{\beta})\left(b_i - \widehat{\beta}\right)}, & \text{when } y_t \neq f_t(\widehat{\beta}), \\[2ex] 0, & \text{when } y_t = f_t(\widehat{\beta}), \end{cases} \tag{40}$$

such that $E_{y_t,\widehat{\beta}}\left[\left.\widehat{h}_t(0|\mathcal{F}_{t-1})\,\right|\,\mathcal{F}_{t-1}\right] \overset{p}{\longrightarrow} h_t(0|\mathcal{F}_{t-1})$ as $n \to \infty$.

Proof. See Appendix C.

We separate the case $y_t = f_t(\widehat{\beta})$ from the others to maintain the convergence of the ARB estimator, since $\lim_{x\to 0} 1/x = \infty$. Assigning zero to $\widehat{h}_t(0|\mathcal{F}_{t-1})$ at $y_t = f_t(\widehat{\beta})$ also enables the ARB estimator to approximate $h_t(0|\mathcal{F}_{t-1})$ from the left and from the right with half weight each in expectation; see the proof of Theorem 2. The convergence property of the partial sum of the sequence $\{\widehat{h}_t(0|\mathcal{F}_{t-1})\}$ from ARB is given in Corollary 3.

Corollary 3

Under the conditions of Theorem 2, the adaptive random bandwidth estimator $\{\widehat{h}_t(0|\mathcal{F}_{t-1})\}$ has the following property:

$$\frac{1}{T}\sum_{t=1}^{T}\widehat{h}_t(0|\mathcal{F}_{t-1}) \overset{m.s.}{\longrightarrow} \frac{1}{T}\sum_{t=1}^{T}h_t(0|\mathcal{F}_{t-1}), \tag{41}$$

as $T, n \to \infty$.

Proof. See Appendix C.

Both $\widehat{\epsilon}_t := y_t - f_t(\widehat{\beta})$ and $\nabla' f_t(\widehat{\beta})$ are taken into account by ARB to approximate $h_t(0|\mathcal{F}_{t-1})$. To identify how $\widehat{\epsilon}_t$ and $\nabla' f_t(\widehat{\beta})$ jointly shape $\widehat{h}_t(0|\mathcal{F}_{t-1})$, we rewrite $\widehat{h}_t(0|\mathcal{F}_{t-1})$ in Theorem 2 as an analytic expression in terms of $\widehat{\epsilon}_t$ and $\nabla' f_t(\widehat{\beta})$. This analytic form is presented in Corollary 4.

Footnote: We regard the least $(p+1)$ absolute residuals in $\{|y_t - f_t(\widehat{\beta})|\}_{t=1}^{T}$ as zeros. In fact, iterations of a simplex-based direct search method such as the Nelder–Mead method for optimizing $(p+1)$ parameters terminate at the vertices of a simplex in the parameter space (Lagarias et al., 1998). That is, the iterations in optimizing the $\tau$-th quantile regression objective function terminate with $(p+1)$ elements of $\{(\tau - \mathbf{1}\{y_t - f_t(\beta) < 0\})(y_t - f_t(\beta))\}$ solved to be zeros. Therefore, we set $\widehat{h}_t(0|\mathcal{F}_{t-1}) = 0$ at the least $(p+1)$ absolute residuals in $\{|y_t - f_t(\widehat{\beta})|\}_{t=1}^{T}$ in all the tests throughout this paper.

Corollary 4

Under the conditions of Theorem 2, we can get the analytic form of $\widehat{h}_t(0|\mathcal{F}_{t-1})$ as follows:

$$\widehat{h}_t(0|\mathcal{F}_{t-1}) = \begin{cases} \dfrac{1}{2\,\delta^{\nabla}_t\sqrt{2\pi}}\,E_1\!\left(\dfrac{\widehat{\epsilon}_t^{\,2}}{2\left(\delta^{\nabla}_t\right)^{2}}\right), & \text{when } \widehat{\epsilon}_t \neq 0, \\[2ex] 0, & \text{when } \widehat{\epsilon}_t = 0, \end{cases} \tag{42}$$

where $\widehat{\epsilon}_t := y_t - f_t(\widehat{\beta})$, $\delta^{\nabla}_t := \sqrt{\nabla' f_t(\widehat{\beta})\,V_d\,\nabla f_t(\widehat{\beta})/T} = T^{-1/2}\|\nabla f_t\|$, and $E_1(s) := \int_{s}^{\infty}x^{-1}e^{-x}\,\mathrm{d}x$ is a special integral known as the exponential integral or the incomplete gamma function $\Gamma(0, s)$.

Proof. See Appendix C.

To check the roles of $\widehat{\epsilon}_t$ and $\delta^{\nabla}_t$ in the analytic $\widehat{h}_t(0|\mathcal{F}_{t-1})$ of Corollary 4 visually, Figure 2 presents a level plot of the analytic $\widehat{h}_t(0|\mathcal{F}_{t-1})$ over $\widehat{\epsilon}_t$ and $\delta^{\nabla}_t$, with colors marking different ranges of $\widehat{h}_t(0|\mathcal{F}_{t-1})$. The analytic $\widehat{h}_t(0|\mathcal{F}_{t-1})$ is decreasing in $|\widehat{\epsilon}_t|$, as Figure 2 also shows. However, $\delta^{\nabla}_t$, that is $T^{-1/2}\|\nabla f_t\|$, can shift $\widehat{h}_t(0|\mathcal{F}_{t-1})$ by reflecting how rare an observed $\widehat{\epsilon}_t$ is given the information set $\mathcal{F}_{t-1}$ and the model specification. This is how the information in $\delta^{\nabla}_t$ lets ARB shape $\widehat{h}_t(0|\mathcal{F}_{t-1})$ adaptively to time-varying conditional probability densities.

Figure 2: Level plot for the analytic form of $\widehat{h}_t(0|\mathcal{F}_{t-1})$ by ARB in Corollary 4.

The ARB estimator $\widehat{h}_t(0|\mathcal{F}_{t-1})$ via simulations in Theorem 2 performs as robustly as the analytic ARB estimator in Corollary 4, as shown in Tables 1, 2 and 3. The analytic version is faster than the simulated one. However, the ARB estimator via simulations is more intuitive and can more flexibly accommodate a very different distribution for simulating $\{b_i - \widehat{\beta}\}_{i=1}^{n}$.

Table 1: The size performances of the Wald test on the restricted model (26) against (25) with different estimation methods for $\{h_t(0|\mathcal{F}_{t-1})\}$ (β R . = [0 , . , . , . , R = [0 , , , − , T = 4000)

Test size                                                              α = 0.   α = 0.   α = 0.   α = 0.
Using $\widehat{D}_T$
Using $\widehat{D}^{h}_T$
Using $\widehat{D}^{arb}_T$ (n = 10 , $V_d$ = I × with no update)      0.012    0.052    0.098    0.196
Using $\widehat{D}^{arb}_T$ (analytic, $V_d$ = I × with no update)     0.012    0.052    0.102    0.198
Using $\widehat{D}^{arb}_T$ (n = 10 , 2 times updating $V_d$)          0.014    0.062    0.126    0.221
Using $\widehat{D}^{arb}_T$ (analytic, 2 times updating $V_d$)         0.014    0.061    0.125    0.219
Using $\widehat{D}^{fd}_T$ ($\Delta\tau_T$ = T )                       0.080    0.150    0.201    0.272
Using $\widehat{D}^{ker}_T$

$D_T$ need to be estimated consistently for inference tests on CAViaR models based on the asymptotic normality of the model parameter estimator. ARB makes the estimation of $D_T$ straightforward: we simply plug in $\widehat{\beta}$ and $\{\widehat{h}_t(0|\mathcal{F}_{t-1})\}$. The resulting estimator $\widehat{D}^{arb}_T$ has the consistency property presented in Theorem 5.

Theorem 5

Under the conditions of Theorem 2, we can get that

$$\widehat{D}^{arb}_T \overset{p}{\longrightarrow} D_T, \tag{43}$$

as $T \to \infty$ and $n \to \infty$, where $\widehat{D}^{arb}_T := T^{-1}\sum_{t=1}^{T}\widehat{h}_t(0|\mathcal{F}_{t-1})\,\nabla' f_t(\widehat{\beta}_\tau)\nabla f_t(\widehat{\beta}_\tau)$ and $\widehat{h}_t(0|\mathcal{F}_{t-1})$ is the adaptive random bandwidth estimator shown in (40).

Proof. See Appendix C.

The adaptive random bandwidth (ARB) method is intuitive, robust and simple in practice, and it can adapt to time-varying conditional distributions without a specific bandwidth or kernel function. A comparison of the size performances of Wald tests using ARB and other competing methods is presented in Tables 1, 2 and 3. We also find that updating $V_d$ improves the size performance for $\alpha$ levels in the interquartile range, but not much for $\alpha$ levels like 1%.

Table 2: The size performances of the Wald test on the restricted model (27) against (25) with different estimation methods for $\{h_t(0|\mathcal{F}_{t-1})\}$ (β R . = [0 , . , . , − . , R = [0 , , , , T = 4000)

Test size                                                              α = 0.   α = 0.   α = 0.   α = 0.
Using $\widehat{D}_T$
Using $\widehat{D}^{h}_T$
Using $\widehat{D}^{arb}_T$ (n = 10 , $V_d$ = I × with no update)      0.01     0.046    0.084    0.168
Using $\widehat{D}^{arb}_T$ (analytic, $V_d$ = I × with no update)     0.009    0.044    0.083    0.168
Using $\widehat{D}^{arb}_T$ (n = 10 , 2 times updating $V_d$)          0.011    0.049    0.098    0.192
Using $\widehat{D}^{arb}_T$ (analytic, 2 times updating $V_d$)         0.01     0.048    0.097    0.19
Using $\widehat{D}^{fd}_T$ ($\Delta\tau_T$ = T )                       0.049    0.104    0.153    0.229
Using $\widehat{D}^{ker}_T$

Table 3: The size performances of the Wald test on the restricted model (34) against (35) with different estimation methods for $\{h_t(0|\mathcal{F}_{t-1})\}$ (β R . = [0 , . , . , . , R = [0 , , , − , T = 2000)

Test size                                                              α = 0.   α = 0.   α = 0.   α = 0.
Using $\widehat{D}^{arb}_T$ (n = 10 , $V_d$ = I × with no update)      0.024    0.052    0.095    0.169
Using $\widehat{D}^{arb}_T$ (analytic, $V_d$ = I × with no update)     0.023    0.054    0.093    0.168
Using $\widehat{D}^{arb}_T$ (n = 10 , 2 times updating $V_d$)          0.021    0.055    0.095    0.188
Using $\widehat{D}^{arb}_T$ (analytic, 2 times updating $V_d$)         0.022    0.055    0.098    0.186
Using $\widehat{D}^{ker}_T$

We study four US stock indices: the Dow Jones Composite Average (DJCA), the NASDAQ 100 Index (NASDAQ100), the S&P 500, and the Wilshire 5000 Total Market Index (Will5000ind). We implement inference tests using the adaptive random bandwidth method with $n = 1000$ and $V_d = I_{(p+1)\times(p+1)}$, which is not updated in the simulations in this section. Each price series has 2448 daily prices, ranging from 8th April 2010 to 30th December 2019. The prices are converted to returns by multiplying the first difference of the natural logarithm of the daily prices by 100. The resulting return series of each index contains 2447 observations, of which the first 2047 are used to estimate the model and the last 400 are used for out-of-sample testing.

The 5% 1-day VaRs of a return series are the negatives of its conditional 5% 1-day quantiles. Four different CAViaR models are considered in this section to model the conditional quantiles of the return series. The 5% 1-day VaRs are estimated with each of the four CAViaR specifications, and the estimation results are shown in Tables 4, 5, 6 and 7 respectively. Each table contains the estimated parameters of the specified model, the corresponding standard errors obtained by the adaptive random bandwidth method with $n = 1000$ and $V_d = I_{(p+1)\times(p+1)}$, the resulting two-sided p-values for parameter significance, the optimized value of the quantile regression objective function (RQ), the percentage of times the VaR is exceeded, and the p-values of the dynamic quantile (DQ) tests, both in-sample and out-of-sample. The model estimations and the in-sample and out-of-sample DQ tests in this empirical study are set up in the same way as in Section 6 of Engle and Manganelli (2004).
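The data preparation described above can be sketched in a few lines of Python. The function names are hypothetical and chosen for this illustration; the exceedance rule uses $y_t \le f_t$, with $f_t$ the fitted conditional quantile (so that the VaR is $-f_t$), which is an assumption about how the exceedance percentages in the tables are counted.

```python
import numpy as np

def log_returns_pct(prices):
    """Returns in percent: 100 times the first difference of log prices."""
    prices = np.asarray(prices, dtype=float)
    return 100.0 * np.diff(np.log(prices))

def split_sample(returns, n_out=400):
    """Keep the last n_out observations for out-of-sample testing."""
    return returns[:-n_out], returns[-n_out:]

def exceedance_pct(returns, quantile_path):
    """Percentage of days on which the return is at or below the fitted
    conditional quantile f_t (i.e. the VaR = -f_t is exceeded)."""
    returns = np.asarray(returns, dtype=float)
    quantile_path = np.asarray(quantile_path, dtype=float)
    return 100.0 * np.mean(returns <= quantile_path)
```

For a series of 2448 prices this yields 2447 returns, which `split_sample` divides into 2047 in-sample and 400 out-of-sample observations.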
• Adaptive CAViaR:

$$f_t(\beta) = f_{t-1}(\beta) + \beta\left\{\left[1 + \exp\left(G\left[y_{t-1} - f_{t-1}(\beta)\right]\right)\right]^{-1} - \tau\right\}, \tag{44}$$

where $\tau$ is the quantile index of interest.

• Symmetric absolute value CAViaR:

$$f_t(\beta) = \beta_0 + \beta_1 f_{t-1}(\beta) + \beta_2\,|y_{t-1}|. \tag{45}$$

• Asymmetric slope CAViaR:

$$f_t(\beta) = \beta_0 + \beta_1 f_{t-1}(\beta) + \beta_2\,(y_{t-1})^{+} + \beta_3\,(y_{t-1})^{-}. \tag{46}$$

• Indirect GARCH(1,1):

$$f_t(\beta) = -\sqrt{\beta_0 + \beta_1 f^{2}_{t-1}(\beta) + \beta_2\,y^{2}_{t-1}}. \tag{47}$$

The four specifications above are the adaptive CAViaR, the symmetric absolute value CAViaR, the asymmetric slope CAViaR, and the indirect GARCH(1,1) CAViaR, with $G = 10$ in the adaptive model.

Compared with the results in Section 6 of Engle and Manganelli (2004), the standard errors obtained by the adaptive random bandwidth method are much smaller relative to the size of the estimated parameters. We use the 5% significance level both to test whether a parameter equals zero and for the DQ tests; "*" denotes rejections in Tables 4, 5, 6 and 7. Each of the four models shows almost the same rejection results across the stock return series. Notably, the coefficient $\beta_1$ of the VaR autoregressive term is highly significantly different from zero in all four models for each return series. This further supports the rationale of CAViaR specifications, confirming that volatility clustering can be associated with autoregressive VaR behaviour. The VaR exceedance percentage indicates the realized risk level in applications. Dynamic quantile (DQ) tests, based on the independence information regarding $\{Hit_t\}$, are used to test for model misspecification. The in-sample DQ test rejects the symmetric absolute value model for the S&P500, yet the realized VaR exceedances (in-sample and out-of-sample) are very close to 5% in Table 5.
It is therefore useful to judge CAViaR model specifications by looking at both VaR exceedances and inference tests such as DQ tests.

In contrast to the significance of $\beta_1$, the coefficient $\beta_2$ of $(y_{t-1})^{2}$ is insignificant in the indirect GARCH(1,1) model for all the stock return series; see Table 6. The coefficient $\beta_2$ of $(y_{t-1})^{+}$ is also insignificant in the asymmetric slope model; see Table 4. Although the coefficient of $|y_{t-1}|$ is significant in the symmetric absolute value model for all the return series (see Table 5), this is mainly due to the significant explanatory role of $(y_{t-1})^{-}$, judging by the results of the asymmetric slope model, in which the symmetric absolute value model is nested. The estimates of $\beta$ in the adaptive model for each stock return series suggest that the 5% 1-day VaR can be associated with its 1-day lagged VaR violation, which equals one if $y_{t-1} \le f_{t-1}$ and zero otherwise. Together, these significance results imply that negative movements of a stock index significantly influence its 5% 1-day VaR on the next day.

In terms of goodness of fit, we look at the RQ results. The asymmetric slope model has the lowest RQ for each stock return series among the four models, although it also has the most coefficients.

Overall, all four stock return series show the same strong association between the lagged and the present 5% 1-day VaR. The asymmetric slope model and the adaptive CAViaR are satisfactory for all four return series in terms of data interpretation and model performance.

Table 4: The Asymmetric Slope Model ($\tau = 0.05$)

Stock Name                      DJCA        NASDAQ100   S&P500      Will5000ind
$\widehat{\beta}_0$             -0.0538     -0.1366     -0.0772     -0.0803
s.e.($\widehat{\beta}_0$)       0.0192      0.0324      0.0283      0.0259
p-value($\widehat{\beta}_0$)    0.0051*     0.0000*     0.0063*     0.0019*
$\widehat{\beta}_1$
s.e.($\widehat{\beta}_1$)       0.0276      0.0356      0.0344      0.0344
p-value($\widehat{\beta}_1$)    0.0000*     0.0000*     0.0000*     0.0000*
$\widehat{\beta}_2$             -0.0175     0.0381      0.0264      0.0158
s.e.($\widehat{\beta}_2$)       0.0325      0.0717      0.0732      0.0831
p-value($\widehat{\beta}_2$)    0.5918      0.5950      0.7179      0.8487
$\widehat{\beta}_3$             -0.3069     -0.3626     -0.4249     -0.4226
s.e.($\widehat{\beta}_3$)       0.0667      0.0673      0.1214      0.1153
p-value($\widehat{\beta}_3$)    0.0000*     0.0000*     0.0005*     0.0002*
RQ                              205.1100    253.9000    215.3800    219.3200
Exceedance in-sample (%)        5.0166      5.0639      5.0166      5.0166
Exceedance out-of-sample (%)    4.7326      4.8746      4.5906      4.5433
DQ in-sample (p-value)          0.4306      0.5140      0.3094      0.4425
DQ out-of-sample (p-value)      1.0000      1.0000      1.0000      1.0000

Table 5: The Symmetric Absolute Value Model ($\tau = 0.05$)

Stock Name                      DJCA        NASDAQ100   S&P500      Will5000ind
$\widehat{\beta}_0$             -0.0507     -0.1310     -0.0544     -0.0521
s.e.($\widehat{\beta}_0$)       0.0405      0.0641      0.0430      0.0315
p-value($\widehat{\beta}_0$)    0.2103      0.0410*     0.2064      0.0984
$\widehat{\beta}_1$
s.e.($\widehat{\beta}_1$)       0.0418      0.0629      0.0544      0.0324
p-value($\widehat{\beta}_1$)    0.0000*     0.0000*     0.0000*     0.0000*
$\widehat{\beta}_2$             -0.2375     -0.2492     -0.2485     -0.2161
s.e.($\widehat{\beta}_2$)       0.0266      0.0785      0.0775      0.0311
p-value($\widehat{\beta}_2$)    0.0000*     0.0015*     0.0013*     0.0000*
RQ                              210.7300    263.0400    223.5300    227.3200
Exceedance in-sample (%)        5.0166      5.0166      5.0166      5.0166
Exceedance out-of-sample (%)    5.3952      5.2532      4.9219      4.9692
DQ in-sample (p-value)          0.2306      0.3548      0.0470*     0.1537
DQ out-of-sample (p-value)      1.0000      1.0000      1.0000      1.0000

Table 6: The Indirect GARCH(1,1) Model ($\tau = 0.05$)

Stock Name                      DJCA        NASDAQ100   S&P500      Will5000ind
$\widehat{\beta}_0$
s.e.($\widehat{\beta}_0$)       0.0325      0.1069      0.0384      0.0414
p-value($\widehat{\beta}_0$)    0.0450*     0.0442*     0.0223*     0.0670
$\widehat{\beta}_1$
s.e.($\widehat{\beta}_1$)       0.0247      0.0444      0.0261      0.0258
p-value($\widehat{\beta}_1$)    0.0000*     0.0000*     0.0000*     0.0000*
$\widehat{\beta}_2$
s.e.($\widehat{\beta}_2$)       0.2169      0.2031      0.2096      0.2041
p-value($\widehat{\beta}_2$)    0.2395      0.0631      0.0826      0.1465
RQ                              209.4600    262.4600    222.1100    226.5200
Exceedance in-sample (%)        4.9692      5.0166      5.0639      5.0639
Exceedance out-of-sample (%)    5.3005      5.2059      4.6853      4.8273
DQ in-sample (p-value)          0.3678      0.4108      0.2887      0.4216
DQ out-of-sample (p-value)      1.0000      1.0000      1.0000      1.0000

Table 7: The Adaptive Model ($\tau = 0.05$)

Stock Name                      DJCA        NASDAQ100   S&P500      Will5000ind
$\widehat{\beta}$               -0.6980     -0.7027     -0.9827     -1.5480
s.e.($\widehat{\beta}$)         0.0768      0.0760      0.0520      0.0014
p-value($\widehat{\beta}$)      0.0000*     0.0000*     0.0000*     0.0000*
RQ                              213.4500    272.7100    226.9600    231.9700
Exceedance in-sample (%)        4.4487      4.8746      4.6380      4.3067
Exceedance out-of-sample (%)    4.7799      5.1585      4.8746      4.4960
DQ in-sample (p-value)          0.6518      0.9802      0.9545      0.2118
DQ out-of-sample (p-value)      1.0000      1.0000      1.0000      1.0000
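To make the four specifications (44) to (47) and the RQ criterion concrete, the recursions can be sketched in Python as below. This is an illustrative sketch only: the function names, the specification labels, the zero-based parameter ordering, and the user-supplied starting value `f0` are assumptions made for this example, and no estimation step is shown.

```python
import numpy as np

def caviar_path(y, beta, spec, f0, tau=0.05, G=10.0):
    """Conditional quantile path f_t for one of four CAViaR recursions.
    y: returns; beta: parameter sequence; f0: assumed starting quantile."""
    y = np.asarray(y, dtype=float)
    f = np.empty(len(y))
    f[0] = f0
    for t in range(1, len(y)):
        yl, fl = y[t - 1], f[t - 1]
        if spec == "adaptive":          # (44), single parameter beta[0]
            f[t] = fl + beta[0] * (1.0 / (1.0 + np.exp(G * (yl - fl))) - tau)
        elif spec == "sav":             # (45), symmetric absolute value
            f[t] = beta[0] + beta[1] * fl + beta[2] * abs(yl)
        elif spec == "as":              # (46), (x)^+ = max(x,0), (x)^- = -min(x,0)
            f[t] = (beta[0] + beta[1] * fl
                    + beta[2] * max(yl, 0.0) + beta[3] * (-min(yl, 0.0)))
        elif spec == "igarch":          # (47), indirect GARCH(1,1)
            f[t] = -np.sqrt(beta[0] + beta[1] * fl ** 2 + beta[2] * yl ** 2)
        else:
            raise ValueError("unknown specification: " + spec)
    return f

def rq_loss(y, f, tau):
    """Quantile regression objective: sum_t (tau - 1{y_t < f_t}) (y_t - f_t)."""
    u = np.asarray(y, dtype=float) - np.asarray(f, dtype=float)
    return float(np.sum((tau - (u < 0)) * u))
```

Setting the last two asymmetric slope coefficients equal reproduces the symmetric absolute value path, which is the nesting relation used in the discussion of Tables 4 and 5.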
We found that inference tests in CAViaR models are not robust and perform unsatisfactorily because of the estimation of the conditional probability densities of the time series. Existing density estimation methods cannot fully adapt to the time-varying conditional probability densities of CAViaR time series. In this paper we have therefore developed a method called adaptive random bandwidth, which robustly approximates the time-varying conditional probability densities of CAViaR time series by Monte Carlo simulations. This method not only avoids the persistent problem of choosing an optimal bandwidth but also ensures the reliability of CAViaR analysis based on the asymptotic normality of the model parameter estimator. In theory, our proposed method can be extended easily and robustly to general quantile regressions, including multivariate cases. The method also has the potential to achieve second-order accuracy for Wald tests of nonlinear restrictions (Phillips and Park, 1988; de Paula Ferrari and Cribari-Neto, 1993) in quantile regressions.

References

de Paula Ferrari, S. L. and Cribari-Neto, F. (1993). On the corrections to the Wald test of non-linear restrictions. Economics Letters, 42(4):321–326.

Duffie, D. and Pan, J. (1997). An overview of value at risk. Journal of Derivatives, 4(3):7–49.

Engle, R. F. and Manganelli, S. (2004). CAViaR: Conditional autoregressive value at risk by regression quantiles. Journal of Business & Economic Statistics, 22(4):367–381.

Hecq, A. and Sun, L. (2020). Selecting between causal and noncausal models with quantile autoregressions. Studies in Nonlinear Dynamics & Econometrics, 1(ahead-of-print).

Hendricks, W. and Koenker, R. (1992). Hierarchical spline models for conditional quantiles and the demand for electricity. Journal of the American Statistical Association, 87(417):58–68.

Huber, P. J. et al. (1967). The behavior of maximum likelihood estimates under nonstandard conditions. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, volume 1, pages 221–233. University of California Press.

Koenker, R. (2005). Quantile Regression. Cambridge University Press.

Koenker, R. and Xiao, Z. (2006). Quantile autoregression. Journal of the American Statistical Association, 101(475):980–990.

Lagarias, J. C., Reeds, J. A., Wright, M. H., and Wright, P. E. (1998). Convergence properties of the Nelder–Mead simplex method in low dimensions. SIAM Journal on Optimization, 9(1):112–147.

Machado, J. A. and Silva, J. (2013). Quantile regression and heteroskedasticity. https://jmcss. som. surrey. ac. uk/JM JSS. pdf. Accessed , 5(7):2015.

Phillips, P. C. and Park, J. Y. (1988). On the formulation of Wald tests of nonlinear restrictions. Econometrica: Journal of the Econometric Society, pages 1065–1083.

Powell, J. L. (1991). Estimation of monotonic regression models under quantile restrictions. Nonparametric and Semiparametric Methods in Econometrics, pages 357–384.

Storn, R. and Price, K. (1997). Differential evolution–a simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization, 11(4):341–359.

Weiss, A. A. (1991). Estimating nonlinear dynamic models using least absolute error estimation. Econometric Theory, 7(1):46–68.

White, H. (2014). Asymptotic Theory for Econometricians. Academic Press.

Appendix A Nonlinearity of parameters in CAViaR models

Nonlinearity in the parameters differentiates CAViaR models from linear quantile regression models. In this appendix, we illustrate the nonlinearity explicitly by showing the gradient and the Hessian matrix of a CAViaR model. Consider

$$f_t(\beta_\tau) = \beta_0(\tau) + \beta_1 f_{t-1}(\beta_\tau) + \beta_2\,(y_{t-1})^{+} + \beta_3\,(y_{t-1})^{-}, \tag{48}$$

where $\tau \in (0,1)$, and $(\cdot)^{+}$ and $(\cdot)^{-}$ are defined as $(x)^{+} = \max(x, 0)$ and $(x)^{-} = -\min(x, 0)$. Iterating (48) backwards gives

$$\begin{aligned} f_t(\beta_\tau) &= \beta_0(\tau) + \beta_1 f_{t-1}(\beta_\tau) + \beta_2\,(y_{t-1})^{+} + \beta_3\,(y_{t-1})^{-} \\ &= \beta_0(\tau) + \beta_1\left(\beta_0(\tau) + \beta_1 f_{t-2}(\beta_\tau) + \beta_2\,(y_{t-2})^{+} + \beta_3\,(y_{t-2})^{-}\right) + \beta_2\,(y_{t-1})^{+} + \beta_3\,(y_{t-1})^{-} \\ &= \frac{\beta_0(\tau)}{1-\beta_1} + \beta_2\sum_{j=1}^{\infty}\beta_1^{\,j-1}(y_{t-j})^{+} + \beta_3\sum_{j=1}^{\infty}\beta_1^{\,j-1}(y_{t-j})^{-}, \end{aligned} \tag{49}$$

where the last line follows from $|\beta_1| < 1$. If $\beta_1 \neq 0$, (49) reveals explicitly the nonlinear pattern of the parameters in this CAViaR model. From this explicit form, we can further derive the gradient and the Hessian matrix of the CAViaR model (48) to emphasize the roles of the parameters.

A.1 $\nabla f_t$

The gradient of $f_t(\beta_\tau)$ at a conditional quantile index $\tau \in (0,1)$ of interest can be derived as follows:

$$\nabla f_t(\beta_\tau) = \begin{pmatrix} \dfrac{\partial f_t(\beta_\tau)}{\partial\beta_0} \\[1.5ex] \dfrac{\partial f_t(\beta_\tau)}{\partial\beta_1} \\[1.5ex] \dfrac{\partial f_t(\beta_\tau)}{\partial\beta_2} \\[1.5ex] \dfrac{\partial f_t(\beta_\tau)}{\partial\beta_3} \end{pmatrix} = \begin{pmatrix} 1 + \beta_1\dfrac{\partial f_{t-1}(\beta_\tau)}{\partial\beta_0} \\[1.5ex] f_{t-1}(\beta_\tau) + \beta_1\dfrac{\partial f_{t-1}(\beta_\tau)}{\partial\beta_1} \\[1.5ex] (y_{t-1})^{+} + \beta_1\dfrac{\partial f_{t-1}(\beta_\tau)}{\partial\beta_2} \\[1.5ex] (y_{t-1})^{-} + \beta_1\dfrac{\partial f_{t-1}(\beta_\tau)}{\partial\beta_3} \end{pmatrix} = \begin{pmatrix} 1 + \beta_1\left(1 + \beta_1\dfrac{\partial f_{t-2}(\beta_\tau)}{\partial\beta_0}\right) \\[1.5ex] f_{t-1}(\beta_\tau) + \beta_1\left(f_{t-2}(\beta_\tau) + \beta_1\dfrac{\partial f_{t-2}(\beta_\tau)}{\partial\beta_1}\right) \\[1.5ex] (y_{t-1})^{+} + \beta_1\left((y_{t-2})^{+} + \beta_1\dfrac{\partial f_{t-2}(\beta_\tau)}{\partial\beta_2}\right) \\[1.5ex] (y_{t-1})^{-} + \beta_1\left((y_{t-2})^{-} + \beta_1\dfrac{\partial f_{t-2}(\beta_\tau)}{\partial\beta_3}\right) \end{pmatrix} = \begin{pmatrix} \dfrac{1}{1-\beta_1} \\[1.5ex] \displaystyle\sum_{i=1}^{\infty}\beta_1^{\,i-1}f_{t-i}(\beta_\tau) \\[1.5ex] \displaystyle\sum_{i=1}^{\infty}\beta_1^{\,i-1}(y_{t-i})^{+} \\[1.5ex] \displaystyle\sum_{i=1}^{\infty}\beta_1^{\,i-1}(y_{t-i})^{-} \end{pmatrix}. \tag{50}$$

Using (49), we substitute

$$f_{t-i}(\beta_\tau) = \frac{\beta_0(\tau)}{1-\beta_1} + \beta_2\sum_{j=1}^{\infty}\beta_1^{\,j-1}(y_{t-i-j})^{+} + \beta_3\sum_{j=1}^{\infty}\beta_1^{\,j-1}(y_{t-i-j})^{-}$$

into $\nabla f_t(\beta_\tau)$ in (50) and get

$$\nabla f_t(\beta_\tau) = \begin{pmatrix} \dfrac{1}{1-\beta_1} \\[1.5ex] \dfrac{\beta_0(\tau)}{(1-\beta_1)^{2}} + \beta_2\displaystyle\sum_{h=2}^{\infty}(h-1)\beta_1^{\,h-2}(y_{t-h})^{+} + \beta_3\displaystyle\sum_{h=2}^{\infty}(h-1)\beta_1^{\,h-2}(y_{t-h})^{-} \\[1.5ex] \displaystyle\sum_{i=1}^{\infty}\beta_1^{\,i-1}(y_{t-i})^{+} \\[1.5ex] \displaystyle\sum_{i=1}^{\infty}\beta_1^{\,i-1}(y_{t-i})^{-} \end{pmatrix}. \tag{51}$$

The role of the parameters $\beta_\tau$ is now explicit. $\beta_\tau$ enters the elements of the gradient in a nonlinear form, so the Hessian matrix clearly depends on $\beta_\tau$ as well.

A.2 Hessian matrix

The second partial derivatives of $f_t(\beta_\tau)$ exist, as $\nabla f_t(\beta_\tau)$ does, which can be seen from the derivation of the Hessian matrix $H(\beta_\tau)$ of $f_t(\beta_\tau)$ as follows:

$$H(\beta_\tau) = \left[\frac{\partial^{2} f_t(\beta_\tau)}{\partial\beta_k\,\partial\beta_l}\right]_{k,l=0,\ldots,3} = \begin{pmatrix} 0 & (1-\beta_1)^{-2} & 0 & 0 \\[1ex] \displaystyle\sum_{i=1}^{\infty}\beta_1^{\,i-1}\frac{\partial f_{t-i}(\beta_\tau)}{\partial\beta_0} & \displaystyle\sum_{i=2}^{\infty}(i-1)\beta_1^{\,i-2}f_{t-i}(\beta_\tau) + \sum_{i=1}^{\infty}\beta_1^{\,i-1}\frac{\partial f_{t-i}(\beta_\tau)}{\partial\beta_1} & \displaystyle\sum_{i=1}^{\infty}\beta_1^{\,i-1}\frac{\partial f_{t-i}(\beta_\tau)}{\partial\beta_2} & \displaystyle\sum_{i=1}^{\infty}\beta_1^{\,i-1}\frac{\partial f_{t-i}(\beta_\tau)}{\partial\beta_3} \\[1ex] 0 & \displaystyle\sum_{i=2}^{\infty}(i-1)\beta_1^{\,i-2}(y_{t-i})^{+} & 0 & 0 \\[1ex] 0 & \displaystyle\sum_{i=2}^{\infty}(i-1)\beta_1^{\,i-2}(y_{t-i})^{-} & 0 & 0 \end{pmatrix}. \tag{52}$$

Given the rewritten form, the gradient, and the Hessian matrix of this CAViaR model, caution is warranted when these quantities are evaluated at estimated parameters, because the persistent appearance of the parameters can slow the convergence rate. This is, in essence, how nonlinearity in the parameters differentiates CAViaR models from linear quantile regression models.

Appendix B How to simulate CAViaR data generating processes

Before estimating CAViaR models, we provide a general way to simulate a time series $\{y_t\}$ whose conditional quantiles all follow a CAViaR specification. To generate such a CAViaR data generating process (DGP), we need the parameter specification for every possible quantile, so that the conditional distribution of $\{y_t\}$ at each time can be constructed no matter which quantile is realized. Indeed, when studying a data set, we might be interested in the 1%, 5%, 50% or 95% conditional quantiles.
For instance, in the climate change literature, extreme positive events are also of interest.

This requirement also applies when generating QAR DGPs. However, simulating CAViaR models is more tedious than QAR simulations because the past conditional distributions also need to be stored over time, as they enter the CAViaR DGP simulation through the autoregressive VaR terms of the model at each time. Let us illustrate the simulation process through an example. First, we need to specify a CAViaR DGP at all quantiles, for instance of the form (4), as follows:

$$
y_t = f_t(\beta_{u_t}) = \beta_0(u_t) + \sum_{i=1}^{q} \beta_i(u_t) f_{t-i}(\beta_{u_t}) + \sum_{j=1}^{r} \beta_{q+j}(u_t)\, y_{t-j},
$$

where $\beta'_{u_t} := [\beta_0(u_t), \beta_1(u_t), \ldots, \beta_p(u_t)]$ with $p = q + r$, and $\{u_t\}$ is i.i.d. in the standard uniform distribution (denoted as $U(0,1)$). The function $f_t(\beta_{u_t})$ is monotonically increasing in $u_t$, so that the $\tau$-th quantile ($\tau \in (0,1)$) of $y_t$ conditional on $\mathcal{F}_{t-1}$ can be expressed as $f_t(\beta_\tau)$. The additional step before simulating $\{y_t\}_{t=1}^{T}$ is to specify the initial conditional distributions and the initial observations, i.e., $\{f_{1-i}(\beta_\tau), \tau \in (0,1), i = 1, \ldots, q\}$ and $\{y_{1-j}, j = 1, \ldots, r\}$. For example, we can take $f_{1-i}(\beta_\tau) = F^{-1}_{N(0,1)}(\tau)$ for any $\tau \in (0,1)$, $i = 1, \ldots, q$, and $y_{1-j} = 0$ for $j = 1, \ldots, r$, where $F^{-1}_{N(0,1)}(\tau)$ denotes the inverse of the standard normal distribution function.

With the above set-up, we can start the simulation by following the steps below.

Step 1: Simulate a sequence $\{u_t\}_{t=1}^{T}$ independently and identically distributed (i.i.d.) in $U(0,1)$. Each $u_t$ indicates that $y_t$ is realized as its conditional $u_t$-th quantile.

Step 2: At time $t = 1$, $y_t$ is realized as its $u_t$-th quantile, which is equal to
$$
f_t(\beta_{u_t}) = \beta_0(u_t) + \sum_{i=1}^{q} \beta_i(u_t) f_{t-i}(\beta_{u_t}) + \sum_{j=1}^{r} \beta_{q+j}(u_t)\, y_{t-j}.
$$

Step 3: Store $\{f_t(\beta_{u_{t+k}})\}_{k=1}^{T-t}$ by
$$
f_t(\beta_{u_{t+k}}) = \beta_0(u_{t+k}) + \sum_{i=1}^{q} \beta_i(u_{t+k}) f_{t-i}(\beta_{u_{t+k}}) + \sum_{j=1}^{r} \beta_{q+j}(u_{t+k})\, y_{t-j}.
$$
This step serves for generating $\{y_{t+k}\}_{k=1}^{T-t}$ later. For instance, $y_{t+k} = f_{t+k}(\beta_{u_{t+k}})$ is generated via the information on $f_{t+k-i}(\beta_{u_{t+k}})$, $i = 1, \ldots, q$. Iteratively, it requires the conditional $u_{t+k}$-th quantiles of $\{y_{t+k-i}\}_{i=1}^{t+k-1}$ to be stored for generating $y_{t+k}$.

Step 4: Repeat Steps 2 and 3 for $t = 2, 3, \ldots, T$ until we get $\{y_t\}_{t=1}^{T}$.

Step 5: In order to leave out the influence of the given initial values in this simulation, we have to delete the observations in the burn-in period. We delete the first 200 observations and keep the rest $\{y_t\}_{t=201}^{T}$ as a suitable sample for studying the DGP (4).

The above simulation procedure can be easily adapted to other CAViaR DGPs, whose model equations for $f_t(\beta_\tau)$ can be substituted into Step 2 with observed values of any involved predetermined variables.

Appendix C Proofs
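As an illustration of the simulation procedure of Appendix B (Steps 1 to 5), the following Python sketch covers the special case $q = r = 1$. The function name `simulate_caviar`, the coefficient functions and the slope values 0.5 and 0.2 are assumptions made for illustration and are not taken from the paper; the initial conditions (standard normal quantiles, zero initial observation) and the burn-in of 200 follow the text. Rather than storing whole conditional distributions, the sketch tracks the conditional quantile only at the levels $u_s$ that are still to be realized, which is what Step 3 requires.

```python
import random
from statistics import NormalDist


def simulate_caviar(T, beta0, beta1, beta2, burn_in=200, seed=0):
    """Simulate y_t = beta0(u_t) + beta1(u_t) * f_{t-1}(beta_{u_t}) + beta2(u_t) * y_{t-1}.

    beta0, beta1, beta2 are functions of the quantile level u (hypothetical names).
    """
    rng = random.Random(seed)
    n = T + burn_in
    inv_cdf = NormalDist().inv_cdf
    # Step 1: i.i.d. U(0,1) draws; u[t] selects the conditional quantile realised at time t.
    # Clamp away from 0 so that the normal quantile function is always defined.
    u = [max(rng.random(), 1e-12) for _ in range(n)]
    # Initial conditions: f_0(beta_tau) is the standard normal quantile, y_0 = 0.
    f = [inv_cdf(level) for level in u]
    y_prev = 0.0
    y = []
    for t in range(n):
        # Step 3: update the conditional quantile at every level u[s] not yet realised.
        for s in range(t, n):
            f[s] = beta0(u[s]) + beta1(u[s]) * f[s] + beta2(u[s]) * y_prev
        # Step 2: y_t is realised as its conditional u_t-th quantile.
        y_prev = f[t]
        y.append(y_prev)
    # Step 5: drop the burn-in observations.
    return y[burn_in:]


# Illustrative (assumed) parameter functions: normal-quantile intercept, constant slopes.
sample = simulate_caviar(300, NormalDist().inv_cdf, lambda u: 0.5, lambda u: 0.2)
print(len(sample), min(sample), max(sample))
```

The double loop costs on the order of $T^2$ operations, which reflects the point made in Appendix B that CAViaR simulation is more tedious than QAR simulation: every future quantile level has to be carried forward through the autoregressive term.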

C.1 Proof of Theorem 2

Proof.

First, since expectation is a linear function, we can rewrite $E_{y_t,\hat\beta}\big[\hat h_t(0|\mathcal{F}_{t-1}) \,\big|\, \mathcal{F}_{t-1}\big]$ as follows:

$$
\begin{aligned}
E_{y_t,\hat\beta}\Big[\hat h_t(0|\mathcal{F}_{t-1}) \,\Big|\, \mathcal{F}_{t-1}\Big]
&= n^{-1} \sum_{i=1}^{n} E_{y_t,\hat\beta}\left[ \frac{\mathbf{1}\big\{y_t \le f_t(\hat\beta) + \nabla' f_t(\hat\beta)(b_i - \hat\beta)\big\} - \mathbf{1}\big\{y_t \le f_t(\hat\beta)\big\}}{\nabla' f_t(\hat\beta)(b_i - \hat\beta)} \,\middle|\, \mathcal{F}_{t-1},\, y_t \ne f_t(\hat\beta) \right] \\
&= n^{-1} \sum_{i=1}^{n} E_{y_t,\hat\beta}\left[ \frac{\mathbf{1}\big\{0 < y_t - f_t(\hat\beta) \le \nabla' f_t(\hat\beta)(b_i - \hat\beta)\big\}}{\nabla' f_t(\hat\beta)(b_i - \hat\beta)} \,\middle|\, \mathcal{F}_{t-1} \right] \\
&\quad + n^{-1} \sum_{i=1}^{n} E_{y_t,\hat\beta}\left[ \frac{\mathbf{1}\big\{\nabla' f_t(\hat\beta)(b_i - \hat\beta) < y_t - f_t(\hat\beta) < 0\big\}}{-\nabla' f_t(\hat\beta)(b_i - \hat\beta)} \,\middle|\, \mathcal{F}_{t-1} \right].
\end{aligned}
\tag{53}
$$

This equality holds when $n$ goes to infinity by applying the dominated convergence theorem, as we regard the least $(p+1)$ absolute residuals in $\{|y_t - f_t(\hat\beta)|\}_{t=1}^{T}$ as zeros. Denote $\hat\epsilon_t := y_t - f_t(\hat\beta)$. We rank $\{|\hat\epsilon_t|\}_{t=1}^{T}$ from the smallest to the largest into $\{|\hat\epsilon|_{(1)}, \ldots, |\hat\epsilon|_{(T)}\}$. In fact, iterations of a simplex-based direct search method like the Nelder–Mead method for optimizing $(p+1)$ parameters terminate at the vertices of a simplex in the parameter space (Lagarias et al., 1998). That is to say, the iterations in optimizing the $\tau$-th quantile regression objective function terminate with $(p+1)$ elements of $\{(\tau - \mathbf{1}\{y_t - f_t(\beta) < 0\})(y_t - f_t(\beta))\}$ solved to be zeros.
Therefore, we set $\hat h_t(0|\mathcal{F}_{t-1}) = 0$ at $|\hat\epsilon|_{(1)}, \ldots, |\hat\epsilon|_{(p+1)}$. And

$$
\big|\hat h_t(0|\mathcal{F}_{t-1})\big| \le \frac{1}{|\hat\epsilon|_{(p+2)}} < \infty, \tag{54}
$$

where $|\hat\epsilon|_{(p+2)} \ne 0$ for a well-defined convex function minimization.

Since $\{b_i - \hat\beta\}_{i=1}^{n}$ is i.i.d. in $N(\mathbf{0}, V_d)$ with restriction to $\nabla' f_t(\hat\beta)(b_i - \hat\beta) \ne 0$, we can get that for each $t \in \{1, \ldots, T\}$,

$$
\left\{ E_{y_t,\hat\beta}\left[ \frac{\mathbf{1}\big\{y_t \le f_t(\hat\beta) + \nabla' f_t(\hat\beta)(b_i - \hat\beta)\big\} - \mathbf{1}\big\{y_t \le f_t(\hat\beta)\big\}}{\nabla' f_t(\hat\beta)(b_i - \hat\beta)} \,\middle|\, \mathcal{F}_{t-1},\, y_t \ne f_t(\hat\beta) \right] \right\}_{i=1}^{n}
$$

is a sequence of independent random variables with finite second moments by the assumption of $\|\nabla' f_t(\hat\beta)\| \le F < \infty$ (see Assumption AN1(a) of Engle and Manganelli (2004)). Then we can use Kolmogorov's strong Law of Large Numbers (see e.g. White, 2014, Corollary 3.9) and get that

$$
E_{y_t,\hat\beta}\Big[\hat h_t(0|\mathcal{F}_{t-1}) \,\Big|\, \mathcal{F}_{t-1}\Big]
\xrightarrow{a.s.}
E_{y_t,\hat\beta,b_i}\left[ \frac{\mathbf{1}\big\{y_t \le f_t(\hat\beta) + \nabla' f_t(\hat\beta)(b_i - \hat\beta)\big\} - \mathbf{1}\big\{y_t \le f_t(\hat\beta)\big\}}{\nabla' f_t(\hat\beta)(b_i - \hat\beta)} \,\middle|\, \mathcal{F}_{t-1},\, y_t \ne f_t(\hat\beta) \right], \tag{55}
$$

as $n \to \infty$ conditionally on $\mathcal{F}_{t-1}$. And we can further get that

$$
\begin{aligned}
& E_{y_t,\hat\beta,b_i}\left[ \frac{\mathbf{1}\big\{y_t \le f_t(\hat\beta) + \nabla' f_t(\hat\beta)(b_i - \hat\beta)\big\} - \mathbf{1}\big\{y_t \le f_t(\hat\beta)\big\}}{\nabla' f_t(\hat\beta)(b_i - \hat\beta)} \,\middle|\, \mathcal{F}_{t-1},\, y_t \ne f_t(\hat\beta) \right] \\
&= E_{\hat\beta,b_i}\left[ E_{y_t}\left[ \frac{\mathbf{1}\big\{0 < y_t - f_t(\hat\beta) \le \nabla' f_t(\hat\beta)(b_i - \hat\beta)\big\}}{\nabla' f_t(\hat\beta)(b_i - \hat\beta)} \,\middle|\, \mathcal{F}_{t-1} \right] \,\middle|\, \mathcal{F}_{t-1} \right] \\
&\quad + E_{\hat\beta,b_i}\left[ E_{y_t}\left[ \frac{\mathbf{1}\big\{\nabla' f_t(\hat\beta)(b_i - \hat\beta) < y_t - f_t(\hat\beta) < 0\big\}}{-\nabla' f_t(\hat\beta)(b_i - \hat\beta)} \,\middle|\, \mathcal{F}_{t-1} \right] \,\middle|\, \mathcal{F}_{t-1} \right] \\
&= E_{\hat\beta,b_i}\left[ \frac{F_t\big(f_t(\hat\beta) + \nabla' f_t(\hat\beta)(b_i - \hat\beta)\big) - F_t\big(f_t(\hat\beta)\big)}{\nabla' f_t(\hat\beta)(b_i - \hat\beta)} \,\middle|\, \mathcal{F}_{t-1},\, \nabla' f_t(\hat\beta)(b_i - \hat\beta) > 0 \right] \\
&\quad + E_{\hat\beta,b_i}\left[ \frac{F_t\big(f_t(\hat\beta)\big) - F_t\big(f_t(\hat\beta) + \nabla' f_t(\hat\beta)(b_i - \hat\beta)\big)}{-\nabla' f_t(\hat\beta)(b_i - \hat\beta)} \,\middle|\, \mathcal{F}_{t-1},\, \nabla' f_t(\hat\beta)(b_i - \hat\beta) < 0 \right] \\
&= E_{\hat\beta,b_i}\left[ \frac{F'_t\big(f_t(\hat\beta)\big)\, \nabla' f_t(\hat\beta)(b_i - \hat\beta) + O_p\Big((b_i - \hat\beta)' \nabla f_t(\hat\beta) \nabla' f_t(\hat\beta)(b_i - \hat\beta)\Big)}{\nabla' f_t(\hat\beta)(b_i - \hat\beta)} \,\middle|\, \mathcal{F}_{t-1} \right] \\
&= E_{\hat\beta,b_i}\left[ F'_t\big(f_t(\hat\beta)\big) + O_p\big(\nabla' f_t(\hat\beta)(b_i - \hat\beta)\big) \,\middle|\, \mathcal{F}_{t-1} \right] \\
&= E_{\hat\beta,b_i}\left[ F'_t\big(f_t(\hat\beta)\big) \,\middle|\, \mathcal{F}_{t-1} \right]
\xrightarrow{T \to \infty} h_t(0|\mathcal{F}_{t-1}),
\end{aligned}
\tag{56}
$$

where the last two lines are obtained, respectively, by Taylor's expansion of $F_t\big(f_t(\hat\beta) + \nabla' f_t(\hat\beta)(b_i - \hat\beta)\big)$ at $f_t(\hat\beta)$, and by knowing $b_i - \hat\beta = o_p(1)$ and $\lim_{T \to \infty} \hat\beta = \beta^o$ with $F'_t(f_t(\cdot))$ being a continuous function (see AN1 and AN2 of Engle and Manganelli (2004)).

Therefore, we have $E_{y_t}\big[\hat h_t(0|\mathcal{F}_{t-1}) \,\big|\, \mathcal{F}_{t-1}\big] - h_t(0|\mathcal{F}_{t-1}) = o_p(1)$ and conclude this proof.

C.2 Proof of Corollary 3

Proof.

From Theorem 2, we can obtain that

$$
E\left[ \frac{1}{T} \sum_{t=1}^{T} \Big( \hat h_t(0|\mathcal{F}_{t-1}) - h_t(0|\mathcal{F}_{t-1}) \Big) \right] = o_p(1) \tag{57}
$$

because $E\big[\hat h_t(0|\mathcal{F}_{t-1}) - h_t(0|\mathcal{F}_{t-1})\big] = o_p(1)$ when $n \to \infty$. And

$$
\frac{1}{T} \sum_{t=1}^{T} E\left[ \Big( \hat h_t(0|\mathcal{F}_{t-1}) - h_t(0|\mathcal{F}_{t-1}) \Big)^2 \right]
= \frac{1}{T} \sum_{t=1}^{T} E\Big[ \hat h_t(0|\mathcal{F}_{t-1}) - h_t(0|\mathcal{F}_{t-1}) \Big]^2 + o_p(1). \tag{58}
$$

Denote $\hat h_{t,i} := \{$

Proof.

From the condition (39), we can know that

$$
\nabla' f_t(\hat\beta)(b_i - \hat\beta) \overset{i.i.d.}{\sim} N\Big(0,\; T^{-1}\, \nabla' f_t(\hat\beta)\, V_d\, \nabla f_t(\hat\beta)\Big), \quad i = 1, \ldots, n,
$$

and $\big|\nabla' f_t(\hat\beta)(b_i - \hat\beta)\big| \ne 0$. Denote the probability distribution function of $\nabla' f_t(\hat\beta)(b_i - \hat\beta)$ as $F_\nabla(\cdot)$, $\delta_{\nabla t} := \sqrt{\nabla' f_t(\hat\beta)\, V_d\, \nabla f_t(\hat\beta) / T}$ and $\hat\epsilon_t := y_t - f_t(\hat\beta)$.

From (55) in the proof of Theorem 2, we know that

$$
\begin{aligned}
\hat h_t(0|\mathcal{F}_{t-1})
&\xrightarrow{n \to \infty}
E_{b_i}\left[ \frac{\mathbf{1}\big\{y_t \le f_t(\hat\beta) + \nabla' f_t(\hat\beta)(b_i - \hat\beta)\big\} - \mathbf{1}\big\{y_t \le f_t(\hat\beta)\big\}}{\nabla' f_t(\hat\beta)(b_i - \hat\beta)} \,\middle|\, \mathcal{F}_{t-1},\, y_t \ne f_t(\hat\beta) \right] \\
&= \int_{\mathbb{R} \setminus [-|\hat\epsilon_t|,\, |\hat\epsilon_t|)} \frac{\mathbf{1}\big\{y_t \le f_t(\hat\beta) + x\big\} - \mathbf{1}\big\{y_t \le f_t(\hat\beta)\big\}}{x} \, dF_\nabla(x).
\end{aligned}
\tag{62}
$$

We can further rewrite (62) based on two cases in $\hat\epsilon_t$, namely $\hat\epsilon_t > 0$ and $\hat\epsilon_t < 0$, because $\hat h_t(0|\mathcal{F}_{t-1})$ is set to be zero in ARB when $\hat\epsilon_t = 0$.

When $\hat\epsilon_t > 0$, we get

$$
\begin{aligned}
\hat h_t(0|\mathcal{F}_{t-1})
&= \int_{\mathbb{R} \setminus [-|\hat\epsilon_t|,\, |\hat\epsilon_t|)} \frac{\mathbf{1}\big\{y_t \le f_t(\hat\beta) + x\big\} - \mathbf{1}\big\{y_t \le f_t(\hat\beta)\big\}}{x} \, dF_\nabla(x) \\
&= \int_{\mathbb{R} \setminus [-|\hat\epsilon_t|,\, |\hat\epsilon_t|)} \frac{\mathbf{1}\{\hat\epsilon_t \le x\}}{x} \, dF_\nabla(x) \\
&= \int_{\hat\epsilon_t}^{\infty} \frac{1}{x\, \delta_{\nabla t} \sqrt{2\pi}} \, e^{-\frac{x^2}{2\delta_{\nabla t}^2}} \, dx.
\end{aligned}
\tag{63}
$$

Substitute $u := \dfrac{x^2}{2\delta_{\nabla t}^2}$, so that $dx / x = du / (2u)$, into (63) and get

$$
\hat h_t(0|\mathcal{F}_{t-1})
= \frac{1}{\delta_{\nabla t} \sqrt{2\pi}} \int_{\frac{\hat\epsilon_t^2}{2\delta_{\nabla t}^2}}^{\infty} \frac{e^{-u}}{2u} \, du
= \frac{1}{2\delta_{\nabla t} \sqrt{2\pi}} \, E_1\!\left( \frac{\hat\epsilon_t^2}{2\delta_{\nabla t}^2} \right), \tag{64}
$$

where $E_1(s) := \int_{s}^{\infty} x^{-1} e^{-x} \, dx$ is a special integral known as the exponential integral or the incomplete gamma function $\Gamma(0, s)$.

Analogously, when $\hat\epsilon_t < 0$, we can also get

$$
\hat h_t(0|\mathcal{F}_{t-1}) = \frac{1}{2\delta_{\nabla t} \sqrt{2\pi}} \, E_1\!\left( \frac{\hat\epsilon_t^2}{2\delta_{\nabla t}^2} \right). \tag{65}
$$

Therefore, we conclude this proof.

C.4 Proof of Theorem 5
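As a numerical check on the closed form in (64) and (65) from the preceding proof, the sketch below compares the exponential-integral expression with a direct quadrature of the integral in (63). All function names are hypothetical, and the series used for $E_1$ is the standard expansion $E_1(s) = -\gamma - \ln s - \sum_{k \ge 1} (-s)^k / (k \cdot k!)$, which is only intended here for moderate $s > 0$; it is an illustration, not the authors' implementation.

```python
import math

EULER_GAMMA = 0.5772156649015329


def exp1(s, terms=80):
    """E_1(s) = int_s^inf exp(-x)/x dx via its power series (moderate s > 0 only)."""
    total = -EULER_GAMMA - math.log(s)
    term = 1.0
    for k in range(1, terms + 1):
        term *= -s / k          # term now equals (-s)^k / k!
        total -= term / k       # subtract (-s)^k / (k * k!)
    return total


def h_closed_form(eps, delta):
    """Closed form of (64)/(65): E_1(eps^2 / (2 delta^2)) / (2 delta sqrt(2 pi))."""
    return exp1(eps ** 2 / (2 * delta ** 2)) / (2 * delta * math.sqrt(2 * math.pi))


def h_quadrature(eps, delta, n=200_000, width=12.0):
    """Midpoint rule for the integral in (63), truncated at |eps| + width * delta."""
    a = abs(eps)
    step = width * delta / n
    norm = delta * math.sqrt(2 * math.pi)
    total = 0.0
    for k in range(n):
        x = a + (k + 0.5) * step
        total += math.exp(-x * x / (2 * delta ** 2)) / (x * norm)
    return total * step


# Assumed illustrative values for the residual and the bandwidth scale.
print(h_closed_form(0.5, 1.0), h_quadrature(0.5, 1.0))
```

The two numbers should agree to several decimal places. Because the closed form depends on $\hat\epsilon_t$ only through $\hat\epsilon_t^2$, the same value is obtained for a negative residual, in line with (65). Where SciPy is available, `scipy.special.exp1` could replace the series.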

Proof.

Denote that

$$
\bar D_T := T^{-1} \sum_{t=1}^{T} h_t(0|\mathcal{F}_{t-1})\, \nabla' f_t(\hat\beta_\tau)\, \nabla f_t(\hat\beta_\tau). \tag{66}
$$

Note that

$$
\hat D^{arb}_T - D_T = \hat D^{arb}_T - \bar D_T + \bar D_T - D_T. \tag{67}
$$

It is straightforward to get that

$$
\hat D^{arb}_T - \bar D_T = o_p(1), \tag{68}
$$

since we know that

$$
\frac{1}{T} \sum_{t=1}^{T} \hat h_t(0|\mathcal{F}_{t-1}) - \frac{1}{T} \sum_{t=1}^{T} h_t(0|\mathcal{F}_{t-1}) = o_p(1)
$$

from $\frac{1}{T} \sum_{t=1}^{T} \hat h_t(0|\mathcal{F}_{t-1}) \xrightarrow{m.s.} \frac{1}{T} \sum_{t=1}^{T} h_t(0|\mathcal{F}_{t-1})$ given in Corollary 3, with $\{\nabla f_t(\beta) \nabla' f_t(\beta)\}$ being uniformly bounded in $\mathbb{R}^{p+1}$ by Assumption AN1 of Engle and Manganelli (2004). And

$$
\bar D_T - D_T = o_p(1), \tag{69}
$$

since $\hat A_T - A_T \xrightarrow{p} 0$ and $\{h_t(\cdot|\mathcal{F}_{t-1})\}$ is uniformly bounded by a finite constant according to Assumption AN2 of Engle and Manganelli (2004).

Therefore, we have that $\hat D^{arb}_T - D_T = o_p(1)$ and conclude this proof.

Appendix D Extra figures

Figure 3: Time series plots of CAViaR DGP samples for illustration

Figure 4: Time series plots of CAViaR DGP samples for illustration

Appendix E Extra test results

• Simulate 1000 samples from the following DGP:

$$
\begin{aligned}
y_t = f_t(\beta^R_{u_t}) &= \beta^R_0(u_t) + \beta^R_1(u_t) f_{t-1}(\beta^R_{u_t}) + \beta^R_2(u_t)\, |y_{t-1}| \\
&= \beta^R_0(u_t) + \beta^R_1(u_t) f_{t-1}(\beta^R_{u_t}) + \beta^R_2(u_t)\, (y_{t-1})^+ + \beta^R_2(u_t)\, (y_{t-1})^-,
\end{aligned}
\tag{70}
$$

where $\{u_t\} \overset{i.i.d.}{\sim} U(0,1)$ and the underlying parameters change over $u_t$ as follows:

$$
\begin{cases}
\beta^R_0(u_t) = \begin{cases}
F^{-1}_{N(0,[\ldots])}, & 0 < u_t \le 0.[\ldots] \\
F^{-1}_{N(0,[\ldots])}, & 0.[\ldots] < u_t \le 0.[\ldots] \\
F^{-1}_{N(0,[\ldots])}, & 0.[\ldots] < u_t < 1,
\end{cases} \\
\beta^R_1(u_t) = 0.[\ldots], \quad 0 < u_t < 1, \\
\beta^R_2(u_t) = 0.[\ldots], \quad 0 < u_t < 1,
\end{cases}
\tag{71}
$$

where $F^{-1}_{N(0,1)}(\cdot)$ is the inverse standard normal probability distribution function. Conditional 5%-th, 30%-th, 50%-th quantiles are estimated for each of the total 1000 simulated samples of sample size $T$ by regressing the sample onto the full model (25). The results of the Wald test using the adaptive random bandwidth method and the kernel method (22) are listed in Table 8, in which each estimated size is obtained by the percentage rejection rate among the 1000 samples of sample size $T$.

Table 8: The size performances of the Wald test on the restricted model (26) to (25) ($\beta^R_{u_t} = [F^{-1}_{N(0,1)}(u_t), 0.[\ldots], 0.[\ldots]]'$, $R = [0, [\ldots], [\ldots], -[\ldots]]$)

quantile index τ & sample size T    methods                                                              size: α = 0.[…]   α = 0.[…]   α = 0.[…]   α = 0.[…]
τ = 0.[…], T = 5000                 $\hat D^{arb}_T$ ($n = 10^{[\ldots]}$, 2 times updating $V_d$)       0.016             0.054       0.091       0.179
                                    $\hat D^{ker}_T$
τ = 0.[…], T = 2000                 $\hat D^{arb}_T$ ($n = 10^{[\ldots]}$, 2 times updating $V_d$)       0.024             0.08        0.134       0.228
                                    $\hat D^{ker}_T$
τ = 0.[…], T = 5000                 $\hat D^{arb}_T$ ($n = 10^{[\ldots]}$, 2 times updating $V_d$)       0.01              0.045       0.085       0.168
                                    $\hat D^{ker}_T$
τ = 0.[…], T = 2000                 $\hat D^{arb}_T$ ($n = 10^{[\ldots]}$, 2 times updating $V_d$)       0.015             0.049       0.085       0.192
                                    $\hat D^{ker}_T$
τ = 0.[…], T = 5000                 $\hat D^{arb}_T$ ($n = 10^{[\ldots]}$, 2 times updating $V_d$)       0.014             0.056       0.087       0.18
                                    $\hat D^{ker}_T$
τ = 0.[…], T = 2000                 $\hat D^{arb}_T$ ($n = 10^{[\ldots]}$, 2 times updating $V_d$)       0.007             0.041       0.076       0.157
                                    $\hat D^{ker}_T$

• Simulate 1000 samples of the DGP $\{y_t\}$ specified as the model (26) with the underlying parameters given as $\beta^R_{u_t} = [F^{-1}_{N(0,1)}(u_t), 0.[\ldots], 0.[\ldots]]'$, where $\{u_t\} \overset{i.i.d.}{\sim} U(0,1)$ and $F^{-1}_{N(0,1)}(\cdot)$ is the inverse standard normal probability distribution function. Conditional 5%-th, 30%-th, 50%-th quantiles are estimated for each of the total 1000 simulated samples of sample size $T$ by regressing the sample onto the full model (25). The results of the Wald test using the adaptive random bandwidth method and the kernel method (22) are listed in Table 9, in which each estimated size is obtained by the percentage rejection rate among the 1000 samples of sample size $T$.

Table 9: The size performances of the Wald test on the restricted model (26) to (25) ($\beta^R_{u_t} = [F^{-1}_{N(0,1)}(u_t), 0.[\ldots], 0.[\ldots]]'$, $R = [0, [\ldots], [\ldots], -[\ldots]]$)

quantile index τ & sample size T    methods                                                              size: α = 0.[…]   α = 0.[…]   α = 0.[…]   α = 0.[…]
τ = 0.[…], T = 4000                 $\hat D^{arb}_T$ ($n = 10^{[\ldots]}$, 2 times updating $V_d$)       0.018             0.06        0.101       0.187
                                    $\hat D^{ker}_T$
τ = 0.[…], T = 2000                 $\hat D^{arb}_T$ ($n = 10^{[\ldots]}$, 2 times updating $V_d$)       0.019             0.054       0.107       0.19
                                    $\hat D^{ker}_T$
τ = 0.[…], T = 4000                 $\hat D^{arb}_T$ ($n = 10^{[\ldots]}$, 2 times updating $V_d$)       0.01              0.058       0.103       0.187
                                    $\hat D^{ker}_T$
τ = 0.[…], T = 2000                 $\hat D^{arb}_T$ ($n = 10^{[\ldots]}$, 2 times updating $V_d$)       0.021             0.061       0.11        0.184
                                    $\hat D^{ker}_T$
τ = 0.[…], T = 4000                 $\hat D^{arb}_T$ ($n = 10^{[\ldots]}$, 2 times updating $V_d$)       0.014             0.062       0.126       0.221
                                    $\hat D^{ker}_T$
τ = 0.[…], T = 2000                 $\hat D^{arb}_T$ ($n = 10^{[\ldots]}$, 2 times updating $V_d$)       0.025             0.064       0.1         0.194
                                    $\hat D^{ker}_T$

• Simulate 1000 samples of the DGP $\{y_t\}$ specified as the model (34) with the underlying parameters given as $\beta^R_{u_t} = [F^{-1}_{N(0,1)}(u_t), 0.[\ldots], 0.[\ldots]]'$, where $\{u_t\} \overset{i.i.d.}{\sim} U(0,1)$ and $F^{-1}_{N(0,1)}(\cdot)$ is the inverse standard normal probability distribution function. Conditional 5%-th, 30%-th, 50%-th quantiles are estimated for each of the total 1000 simulated samples of sample size $T$ by regressing the sample onto the full model (25). The results of the Wald test using the adaptive random bandwidth method and the kernel method (22) are listed in Table 10, in which each estimated size is obtained by the percentage rejection rate among the 1000 samples of sample size $T$.

Table 10: The size performances of the Wald test on the restricted model (34) to (25) ($\beta^R_{u_t} = [F^{-1}_{N(0,1)}(u_t), 0.[\ldots], 0.[\ldots]]'$, $R = [0, [\ldots], [\ldots], -[\ldots]]$)

quantile index τ & sample size T    methods                                                              size: α = 0.[…]   α = 0.[…]   α = 0.[…]   α = 0.[…]
τ = 0.[…], T = 5000                 $\hat D^{arb}_T$ ($n = 10^{[\ldots]}$, 2 times updating $V_d$)       0.032             0.069       0.096       0.17
                                    $\hat D^{ker}_T$
τ = 0.[…], T = 2000                 $\hat D^{arb}_T$ ($n = 10^{[\ldots]}$, 2 times updating $V_d$)       0.052             0.093       0.127       0.19
                                    $\hat D^{ker}_T$
τ = 0.[…], T = 5000                 $\hat D^{arb}_T$ ($n = 10^{[\ldots]}$, 2 times updating $V_d$)       0.032             0.071       0.121       0.207
                                    $\hat D^{ker}_T$
τ = 0.[…], T = 2000                 $\hat D^{arb}_T$ ($n = 10^{[\ldots]}$, 2 times updating $V_d$)       0.031             0.063       0.123       0.204
                                    $\hat D^{ker}_T$
τ = 0.[…], T = 5000                 $\hat D^{arb}_T$ ($n = 10^{[\ldots]}$, 2 times updating $V_d$)       0.021             0.055       0.095       0.188
                                    $\hat D^{ker}_T$
τ = 0.[…], T = 2000                 $\hat D^{arb}_T$ ($n = 10^{[\ldots]}$, 2 times updating $V_d$)       0.034             0.069       0.118       0.208
                                    $\hat D^{ker}_T$                                                     0.088             0.158       0.212       0.311
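Each entry in Tables 8 to 10 is a rejection rate over the simulated samples. A minimal sketch of that calculation follows, assuming the Wald statistic for the single restriction row $R$ is compared with a chi-square critical value with one degree of freedom. The function name `empirical_size`, the nominal levels and the synthetic statistics in the usage lines are assumptions for illustration only; they are not results from the paper.

```python
import random
from statistics import NormalDist


def empirical_size(wald_stats, alphas):
    """Share of replications whose Wald statistic exceeds the chi-square(1) critical value."""
    sizes = {}
    for alpha in alphas:
        # With one restriction, the chi-square(1) critical value equals the
        # squared two-sided standard normal critical value.
        crit = NormalDist().inv_cdf(1 - alpha / 2) ** 2
        sizes[alpha] = sum(w > crit for w in wald_stats) / len(wald_stats)
    return sizes


# Synthetic chi-square(1) draws stand in for 1000 simulated Wald statistics.
rng = random.Random(0)
stats = [rng.gauss(0, 1) ** 2 for _ in range(1000)]
print(empirical_size(stats, [0.01, 0.05, 0.10, 0.20]))
```

A well-sized test returns rejection rates close to the nominal levels passed in `alphas`; rates clearly above them indicate over-rejection.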