Incorporating Financial Big Data in Small Portfolio Risk Analysis: Market Risk Management Approach

Abstract

When applying Value at Risk (VaR) procedures to specific positions or portfolios, we often focus on developing procedures only for the specific assets in the portfolio. However, since this small portfolio risk analysis ignores information from assets outside the target portfolio, there may be significant information loss. In this paper, we develop a dynamic process to incorporate the ignored information. We also study how to overcome the curse of dimensionality and discuss where and when benefits occur from a large number of assets, which is called the blessing of dimensionality. We find empirical support for the proposed method.


Donggyu Kim ∗ and Seunghyeon Yu
College of Business, Korea Advanced Institute of Science and Technology, Seoul, Korea

February 26, 2021


JEL classification: C13, C14, C32, C55, C58, G32.

Key words and phrases: Value at Risk, blessing of dimensionality, curse of dimensionality, high-dimensionality, principal component analysis, multivariate GARCH, factor model.

∗ Corresponding author. Tel: +82 2 958 3448. E-mail addresses: [email protected] (D. Kim), [email protected] (S. Yu)

1 Introduction

Risk management has become more important than ever since the 2008 financial crisis. Several risk measurement methods, such as Value at Risk (VaR), expected shortfall (ES), entropic Value at Risk (EVaR), and superhedging price, have been developed (Acerbi and Tasche, 2002; Ahmadi-Javid, 2012; Bensaid et al., 1992; Jorion, 2000). VaR, in particular, is widely used to measure and control the level of risk by measuring the amount of loss for investments during a given period, such as a day, under normal market conditions (Jorion, 2000). When measuring VaR, we need to consider stylized market features such as the asymmetric, heavy-tailed log return distribution and the time-series dynamics (Christoffersen and Langlois, 2013; Cont, 2001; Longin and Solnik, 2001). To explain asymmetric and heavy-tailed log returns, researchers have imposed more realistic distributional assumptions on the log returns. For example, Huisman et al. (1998) and Wu and Shieh (2007) considered the Student-t distribution in the VaR model; Zoia et al. (2018) employed the Gram-Charlier expansion, which is a polynomially modified Gaussian distribution; and Natarajan et al. (2008) developed asymmetry-robust VaR (ARVaR), which takes into account the asymmetric distribution of returns. On the other hand, in the stock market, we often observe dynamic structures. For example, large volatility tends to be followed by large volatility, and small volatility tends to be followed by small volatility; this is called volatility clustering (Mandelbrot, 1963). This stylized feature inspires the building of dynamic risk models. Examples include the dynamic copula model (Fantazzini, 2008), conditional autoregressive Value at Risk (CAViaR) (Engle and Manganelli, 2004), the regime-switching VaR model (Billio and Pelizzon, 2000), and the dynamic extreme value estimator (McNeil and Frey, 2000).
Also, many researchers have attempted to apply dynamic volatility models to measure VaR based on the $\sigma$-based method (Brooks and Persand, 2003; Giot and Laurent, 2004; Hull and White, 1998; Jorion, 1996, 2000; Kuester et al., 2006; Sadorsky, 2005). Specifically, to account for the market dynamics, they employed dynamic volatility models such as portfolio univariate GARCH (Bollerslev, 1986), BEKK (Engle and Kroner, 1995), and constant conditional correlation (CCC) (Bollerslev, 1990).

Most of these dynamic risk models consider only the stock data in their target portfolio (Fan and Gu, 2003; Hendricks, 1996; Patton et al., 2019), and the portfolio size is usually relatively small. They often ignore the stock data that do not belong to the portfolio. We call this small portfolio risk analysis. However, it is doubtful whether small portfolio risk analysis is sufficient to capture portfolio risk dynamics. In fact, the factor model indicates that the individual asset risk can be decomposed into systematic risk (common risk) and idiosyncratic risk (firm-specific risk) (Bali et al., 2005; Malkiel and Xu, 1997; Shiller, 1995). Through portfolio diversification, we can nearly eliminate idiosyncratic risk, but systematic risk cannot be reduced. In light of this, portfolio risk dynamics may be primarily governed by systematic risk dynamics. Thus, it is important to model systematic risk. Systematic risk is known as common risk that affects the whole stock market, so every stock in the market may contain systematic risk information. Thus, employing financial big data can help to capture systematic risk dynamics. However, the multivariate VaR methods designed for small portfolio risk analysis are not feasible for analyzing financial big data due to the so-called curse of dimensionality. For instance, when the BEKK model (Engle and Kroner, 1995) is applied to $p$-dimensional log return data, the number of parameters increases with the $p^2$ order.
As the dimension $p$ increases, the parameter estimation becomes computationally demanding because the computation time explodes, which is an NP-hard problem. Furthermore, even if it is possible to estimate the parameters, they are not consistent estimators (Bickel and Levina, 2008; Engle et al., 2017; Pakel et al., 2017). Due to these practical and theoretical problems, even though the asset return data are easily accessible, risk managers cannot employ financial big data. This fact urges us to develop financial data analysis procedures that incorporate financial big data.

In this paper, we develop a VaR estimation procedure that incorporates financial big data into portfolio risk analysis and study how to overcome the curse of dimensionality. Specifically, we employ the approximate latent factor model (Ait-Sahalia and Xiu, 2017; Fan et al., 2013, 2019; Li et al., 2018) as the baseline model. Then the log return can be decomposed into the factor and idiosyncratic parts. For the factor part, we develop a dynamic model to account for systematic risk. For example, the conditional expected factor volatility given the current available information has the well-known GARCH form and therefore is an autoregressive function of historical squared factor log returns. To evaluate the proposed dynamic factor model, we estimate the latent factor and idiosyncratic component using the principal orthogonal complement thresholding (POET) procedure (Fan et al., 2013). To do this, we further assume a sparse structure on the idiosyncratic volatility matrix, which is the immediate result of the approximate factor model. With the latent factor estimator, we propose a maximum quasi-likelihood estimation method for the GARCH parameter.
The estimated GARCH parameters are employed to predict one-step ahead portfolio volatility. Finally, the one-step ahead VaR is predicted by the parametric and non-parametric $\sigma$-based approaches (Brooks and Persand, 2003; Giot and Laurent, 2004; Hull and White, 1998). We study their asymptotic properties and discuss when and where the gain comes from incorporating financial big data, which is called the blessing of dimensionality (Donoho, 2000; Fan et al., 2013; Li et al., 2018). These theoretical results are examined by simulation and empirical studies, and we compare the proposed method with various competing small portfolio risk analysis models.

The rest of the paper is organized as follows. In Section 2, we briefly introduce the concept of Value at Risk and develop a dynamic risk model. In Section 3, we propose a maximum quasi-likelihood estimation method and study its asymptotic behavior. We also discuss how the proposed method can enjoy the blessing of dimensionality by employing financial big data. Section 4 shows how to predict the large volatility matrix and applies the proposed method to measure VaR. We also discuss how to solve the curse of dimensionality. In Section 5, we conduct Monte Carlo simulation studies to check the finite sample performance of the proposed methods, and in Section 6 we apply the proposed VaR measure to the empirical data. Finally, in Section 7 we provide concluding remarks. All technical proofs are given in the online appendix.

2 A model setup

2.1 $\sigma$-based VaR method

One of the most popular risk measures is VaR, which estimates how much a portfolio of assets might lose under normal market conditions. Specifically, VaR at level $\alpha$ is defined through the $\alpha$-percentile of the portfolio log return distribution as follows:

\[ \mathrm{VaR}_{\alpha,t} = -\inf\{ x : P\{ r_t \le x \} > \alpha \}, \tag{2.1} \]

where $P$ denotes the distribution of the portfolio log returns $r_t$.
When the portfolio log return distribution follows a location-scale family, such as the normal distribution or the t-distribution, we can obtain VaR from the portfolio mean $\mu_t$ and volatility $\sigma_t$ as follows:

\[ \mathrm{VaR}_{\alpha,t} = -\mu_t - c_\alpha \sigma_t, \tag{2.2} \]

where $c_\alpha$ is the $\alpha$-quantile of the distribution of the standardized $r_t$. This method is called the $\sigma$-based approach (Jorion, 2000). The performance of the $\sigma$-based approach depends on three components: the mean $\mu_t$, the volatility $\sigma_t$, and the quantile $c_\alpha$. The quantile $c_\alpha$ can be determined by parametric and non-parametric methods. The parametric method assumes a parametric distribution for the standardized log return $(r_t - \mu_t)/\sigma_t$, such as the normal distribution or the Student-t distribution, and $c_\alpha$ is the $\alpha$-quantile of that distribution. The non-parametric method uses a sample quantile of the historical standardized portfolio log returns $(r_t - \mu_t)/\sigma_t$. Detailed implementations are described in Section 4.2. On the other hand, several empirical studies indicate that the portfolio mean $\mu_t$ does not have strong time series patterns (Cajueiro and Tabak, 2004; Fama, 1998; Malkiel, 2003; Narayan and Smyth, 2004). Moreover, the mean $\mu_t$ has a relatively small effect on VaR at extreme $\alpha$, compared to $c_\alpha \sigma_t$. Meanwhile, in the stock market, we observe volatility time series structures such as volatility clustering (Mandelbrot, 1963). From this point of view, it is crucial to determine the dynamic structure of the volatility $\sigma_t$, rather than that of $\mu_t$. Thus, in this paper, we focus on how to incorporate the dynamic volatility structure in VaR.

When analyzing portfolio volatility, we usually investigate only the stock log return data that belong to the portfolio. For example, let $y_{s,t}$ be the $s$-dimensional log return vector of the stocks that belong to the portfolio. We also denote the co-volatility matrix of $y_{s,t}$ by $\Sigma_{s,t}$ and the $s$-dimensional portfolio weight vector by $w_s$.
Then the portfolio volatility can be obtained from $\Sigma_{s,t}$ as follows:

\[ \sigma_t = \sqrt{ w_s^\top \Sigma_{s,t} w_s }. \]

To account for the market dynamics of the portfolio return, several dynamic models have been introduced, such as the BEKK (Engle and Kroner, 1995), CCC (Bollerslev, 1990), and portfolio univariate GARCH (Bollerslev, 1986). See also Engle (2002), Kawakatsu (2006), and Van der Weide (2002). To be specific, the BEKK model with the mean vector $\mu_{s,t}$ has the following structure:

\[ y_{s,t} = \mu_{s,t} + \Sigma_{s,t}^{1/2} \varepsilon_{s,t}, \]
\[ \Sigma_{s,t} = C C^\top + A^\top (y_{s,t-1} - \mu_{s,t-1})(y_{s,t-1} - \mu_{s,t-1})^\top A + B^\top \Sigma_{s,t-1} B, \]
\[ \varepsilon_{s,it} \sim \text{i.i.d. } F(0,1), \]

where $C$ is an $s \times s$ lower triangular matrix, $A$ and $B$ are $s \times s$ matrices, and $F(0,1)$ is some arbitrary distribution with mean zero and variance one. As in the above BEKK model, dynamic volatility models have an autoregressive form in the historical squared returns. Empirical studies show that the volatility is heterogeneous, and these models can explain the market dynamics (Bollerslev, 1986, 1990; Engle, 2002; Engle and Kroner, 1995; Kawakatsu, 2006; Van der Weide, 2002). Moreover, the dynamic volatility model helps to improve measuring VaR (Engle and Manganelli, 2004; Giot and Laurent, 2004; Kuester et al., 2006). That is, the performance of measuring VaR can be improved by identifying the market dynamics. In financial markets, we often observe that much of the market dynamic structure comes from the systematic risk, which is related to the whole stock market, so the systematic risk information is contained in all stocks. From this point of view, even if we consider only the small number of assets in the portfolio, to account for the market dynamics, we need to investigate the whole stock market and extract the systematic risk information from the stock returns. One naive extension is to use a multivariate volatility model, such as the BEKK and CCC models, for the whole stock market. However, when applying these volatility models to a large number of assets, we face the curse of dimensionality (Bickel and Levina, 2008; Engle et al., 2008, 2017). Specifically, let $p$ be the total number of assets; then the number of parameters increases quadratically with $p$. Hence, with a large $p$, the estimated model is statistically inconsistent. Furthermore, in practice, the parameter optimization takes exponential computation time, which is known as an NP-hard problem. Thus, it is not feasible to apply the usual multivariate volatility model to a large number of assets. In this paper, we discuss how to handle the curse of dimensionality issue and simultaneously how to account for the market dynamics.

Consider a market consisting of $p$ stocks.
Let $y_t$ denote the $\mathbb{R}^p$-valued random vector of the assets' log returns at time $t$. Then the portfolio is characterized by a vector of asset weights $w \in \mathbb{R}^p$, for example, $r_t = w^\top y_t$. In financial analysis, we often employ the factor model (Ait-Sahalia and Xiu, 2017; Bai, 2003; Carhart, 1997; Fan et al., 2013, 2019; Fama and French, 1992, 1993, 2015) as follows:

\[ y_t = \mu + V f_t + u_t, \tag{2.3} \]

where $\mu$ is a constant mean vector, $f_t$ is an $r$-dimensional factor, $V$ is a $p \times r$ factor loading matrix, and $u_t$ is an idiosyncratic component. We assume that the factor $f_t$ and idiosyncratic component $u_t$ are independent, and without loss of generality, we assume that the means of $f_t$ and $u_t$ are zero. Usually the number of market factors is much smaller than the number of assets, so we assume that $r$ is finite. Moreover, in this paper, we consider the latent factor model (Ait-Sahalia and Xiu, 2017; Fan et al., 2013, 2019; Li et al., 2018), so $f_t$ and $u_t$ are not observable. In this section, we propose a dynamic factor model under the latent factor structure in (2.3).

The factor model implies that the risk of assets stems from the factor and idiosyncratic components, which are called systematic and idiosyncratic risks, respectively. Idiosyncratic risk is related to the firm-specific risk, so it does not affect the whole market. Moreover, it can be mitigated by portfolio diversification. In contrast, systematic risk arises from common market factors such as interest rates, inflation, oil prices, and so on. Since systematic risk affects the whole market, we cannot mitigate this risk. From this point of view, we impose a dynamic structure on the factor to account for the market risk. For the idiosyncratic risk, we employ some martingale assumption, for example, the idiosyncratic volatility is constant over time.
Then the conditional expected co-volatility matrix of $y_t$ given the current available information $\mathcal{F}_t$ is expressed as follows:

\[ E\left[ (y_{t+1} - \mu)(y_{t+1} - \mu)^\top \mid \mathcal{F}_t \right] = \Sigma_{t+1} = V \Sigma_{f,t+1} V^\top + \Sigma_u, \tag{2.4} \]

where $\Sigma_{f,t+1} = E\left[ f_{t+1} f_{t+1}^\top \mid \mathcal{F}_t \right]$, and $\Sigma_u$ represents the idiosyncratic volatility matrix. Under the framework (2.4), the market dynamics can be explained by the factor volatility dynamics.

The latent factor model has an identification problem in $(V, f_t)$. For example, the pair $(V, f_t)$ is not distinguishable from $(V H^\top, H f_t)$ for any orthonormal matrix $H$. To uniquely define the latent factor model, we often impose the following identifiability condition (Bai and Li, 2012; Bai and Ng, 2013; Fan et al., 2013; Kim and Fan, 2019; Li et al., 2018):

\[ V^\top V = p I_r \quad \text{and} \quad \Sigma_{f,t} \text{ is diagonal}. \tag{2.5} \]

The identification condition (2.5) implies that the scaled factor loading matrix $p^{-1/2} V$ and the elements of $p \Sigma_{f,t}$ are the eigenmatrix and eigenvalues of the conditional variance of the factor part. Thus, the market dynamics are explained by the dynamic structure of the eigenvalues, whereas the factor loading matrix is constant over time.

To account for the factor dynamics, we employ the well-known GARCH model (Bauwens et al., 2006; Engle and Kroner, 1995; Engle, 2002, 2016; Van der Weide, 2002). The conditional expected volatility of the factor is modeled by the following GARCH structure:

\[ f_t = \left( \sqrt{h_{1t}(\theta)}\, \epsilon_{1t}, \ldots, \sqrt{h_{rt}(\theta)}\, \epsilon_{rt} \right), \qquad h_t(\theta) = \omega + A f_{t-1}^2 + B h_{t-1}(\theta), \tag{2.6} \]

where the $\epsilon_{it}$'s are i.i.d. random variables with mean zero and unit variance; $h_t(\theta)$ is an $r$-dimensional vector of the conditional expected factor volatility $\Sigma_{f,t}$, that is, $h_t = (h_{1t}(\theta), \ldots, h_{rt}(\theta)) = \mathrm{diag}(\Sigma_{f,t})$, where $\mathrm{diag}(X)$ is a vector whose elements are the diagonal entries of $X$; $f_t^2$ is the element-wise squared vector of the factor log return $f_t$; and $\theta = (\omega, \mathrm{vec}(A), \mathrm{vec}(B))$ collects the GARCH parameters. The GARCH model in (2.6) indicates that the conditional expected volatility of the factor has an autoregressive form in the historical squared factor log returns. Thus, we expect that this model can capture stylized market features such as volatility clustering and heavy tails. The empirical study in Section 6 also supports this. For high-frequency financial data, Kim and Fan (2019) recently introduced the factor GARCH-Itô model to account for the market dynamics, and they showed that the dynamic factor model can capture the market dynamics well. Thus, the proposed model is the discrete-time version of the factor GARCH-Itô model.

The idiosyncratic component comes from the firm-specific risk, so the idiosyncratic risks are not strongly connected. Empirical studies show that there are some local factors that affect a few other idiosyncratic components (Ait-Sahalia and Xiu, 2017; Boivin and Ng, 2006; Kalnina and Tewou, 2015). In light of these, we allow a weak relationship between idiosyncratic components so that the idiosyncratic volatility $\Sigma_u = (\Sigma_{u,ij})_{i,j=1,\ldots,p}$ satisfies the following sparse condition:

\[ \max_{i \le p} \sum_{j=1}^{p} |\Sigma_{u,ij}|^{q} \left( \Sigma_{u,ii} \Sigma_{u,jj} \right)^{(1-q)/2} \le s_p, \tag{2.7} \]

where $q \in [0, 1)$ and the sparsity measure $s_p$ diverges slowly with the dimension $p$, for example, $\log p$. This sparsity condition is widely employed in large covariance matrix inference (Ait-Sahalia and Xiu, 2017; Bickel and Levina, 2008; Cai and Liu, 2011; Fan et al., 2013, 2018, 2019; Kim and Fan, 2019). When the idiosyncratic volatility satisfies exact sparsity, that is, $q = 0$, the sparsity condition indicates that each asset has at most $s_p$ non-zero idiosyncratic correlations with other assets.

Under the volatility structure (2.4) and (2.6), the VaR of portfolios in (2.2) can be calculated as follows:

\[ \mathrm{VaR}_{\alpha,t} = -w^\top \mu - c_\alpha \sqrt{ w^\top \left( V \Sigma_{f,t} V^\top + \Sigma_u \right) w }. \]

To evaluate the above VaR value, we need to estimate the unobserved factor components and the idiosyncratic volatility. However, to incorporate the information of all market assets, we consider a large number of assets and consequently run into the curse of dimensionality problem. In the following section, we discuss how to overcome this issue and introduce an estimation procedure to incorporate the financial big data for risk analysis.
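As an illustration of the VaR formula under the factor structure, the following is a minimal NumPy sketch. The function name `factor_var`, the toy inputs, and the default choice of a standard normal quantile for $c_\alpha$ are assumptions made for this example only; the formula itself is the one displayed above.

```python
import numpy as np
from statistics import NormalDist


def factor_var(w, mu, V, sigma_f_diag, Sigma_u, alpha=0.01, c_alpha=None):
    """VaR = -w'mu - c_alpha * sqrt(w'(V Sigma_f V' + Sigma_u) w).

    sigma_f_diag holds the diagonal of Sigma_{f,t} (factor variances).
    If c_alpha is not supplied, a standard normal alpha-quantile is used,
    which is an illustrative assumption, not a requirement of the model.
    """
    w = np.asarray(w, dtype=float)
    # Conditional covariance: low-rank factor part plus idiosyncratic part.
    Sigma = V @ np.diag(sigma_f_diag) @ V.T + Sigma_u
    sigma_p = np.sqrt(w @ Sigma @ w)  # portfolio volatility
    if c_alpha is None:
        c_alpha = NormalDist().inv_cdf(alpha)
    return -(w @ mu) - c_alpha * sigma_p


# Toy inputs (hypothetical numbers): one factor, four assets, equal weights.
p, r = 4, 1
V = np.ones((p, r))                  # satisfies V'V = p * I_r
sigma_f_diag = np.array([0.0004])    # factor variance
Sigma_u = 0.0001 * np.eye(p)         # diagonal idiosyncratic covariance
w = np.full(p, 1.0 / p)
mu = np.zeros(p)

var_1pct = factor_var(w, mu, V, sigma_f_diag, Sigma_u, alpha=0.01)
```

In this toy case the portfolio variance is $0.0004 + 0.0001/4 = 0.000425$: the idiosyncratic part is divided by the number of assets, while the factor part is not reduced by diversification.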

First, we define the notation. We denote the matrix spectral norm, Frobenius norm, and max norm by $\|\cdot\|$, $\|\cdot\|_F$, and $\|\cdot\|_{\max}$, respectively. We use $O_p$ as big-O in probability, $\lambda_k(A)$ as the $k$-th largest eigenvalue of the square matrix $A$, and $\mathrm{vec}(A)$ as the vectorization of $A$. We also denote the true parameters by $\theta_0 = (\omega_0, \mathrm{vec}(A_0), \mathrm{vec}(B_0))$.

3.1 Latent factor estimation

To estimate the model parameter $\theta_0 = (\omega_0, \mathrm{vec}(A_0), \mathrm{vec}(B_0))$, we first need to estimate the latent factor components $f_t$ and $V$. We recall that the conditional volatility matrix is

\[ \Sigma_{t+1} = V \Sigma_{f,t+1} V^\top + \Sigma_u, \]

and the mean conditional volatility matrix is

\[ \Sigma = V \Sigma_f V^\top + \Sigma_u, \]

where $\Sigma_f = T^{-1} \sum_{t=1}^{T} \Sigma_{f,t}$. Under the identification and sparsity conditions (2.4) and (2.7), the mean conditional volatility matrix has the low-rank plus sparse structure that is widely used in high-dimensional factor analysis (Ait-Sahalia and Xiu, 2017; Fan et al., 2013, 2018, 2019; Kim and Fan, 2019). With the low-rank plus sparse structure, Fan et al. (2013) introduced the POET estimation procedure to estimate the latent factor volatility and idiosyncratic volatility matrices. To harness the POET procedure, we need a good proxy of the mean conditional volatility matrix. We use the sample covariance matrix $\hat\Sigma = T^{-1} \sum_{t=1}^{T} (y_t - \bar y)(y_t - \bar y)^\top$ with sample mean $\bar y = T^{-1} \sum_{t=1}^{T} y_t$, and under some mild condition, the martingale convergence theorem implies that the sample covariance matrix converges to $\Sigma$. Thus, we apply the POET procedure with the sample covariance matrix to estimate the latent factor components. Specifically, the eigenvalue decomposition admits

\[ \hat\Sigma = \sum_{i=1}^{p} \hat\lambda_i \hat q_i \hat q_i^\top, \tag{3.1} \]

where $\hat\lambda_k$ and $\hat q_k$ are the $k$-th largest eigenvalue and the corresponding eigenvector of the sample covariance matrix $\hat\Sigma$, respectively.
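A minimal NumPy sketch of the decomposition (3.1), and of the scaled-eigenvector estimators for the loadings and factors that this subsection builds from it, is given here. The function name `pca_factor_estimates` and the simulated data are hypothetical, and the number of factors `r` is assumed known.

```python
import numpy as np


def pca_factor_estimates(Y, r):
    """Y is a T x p matrix of log returns; r is the assumed number of factors."""
    T, p = Y.shape
    Yc = Y - Y.mean(axis=0)                # subtract the sample mean
    S = Yc.T @ Yc / T                      # sample covariance with 1/T scaling
    eigval, eigvec = np.linalg.eigh(S)     # eigh returns ascending eigenvalues
    order = np.argsort(eigval)[::-1]       # reorder to descending, as in (3.1)
    lam, Q = eigval[order], eigvec[:, order]
    V_hat = np.sqrt(p) * Q[:, :r]          # loading estimator, V_hat'V_hat = p I_r
    Sigma_f_hat = np.diag(lam[:r] / p)     # mean factor volatility estimator
    F_hat = Yc @ V_hat / p                 # row t is the factor estimate at time t
    return lam, Q, V_hat, Sigma_f_hat, F_hat


# Simulated example (illustrative only): two factors, thirty assets.
rng = np.random.default_rng(0)
T, p, r = 200, 30, 2
V_true = rng.normal(size=(p, r))
F_true = rng.normal(size=(T, r))
Y = F_true @ V_true.T + 0.1 * rng.normal(size=(T, p))

lam, Q, V_hat, Sigma_f_hat, F_hat = pca_factor_estimates(Y, r)
```

By construction, the sample second moment of the estimated factors equals `Sigma_f_hat`, which is diagonal, in line with the identification condition (2.5).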
Then the factor loading matrix estimator $\hat V$ is $\sqrt{p}\,(\hat q_1, \ldots, \hat q_r)$ and the mean factor volatility matrix estimator $\hat\Sigma_f$ is $p^{-1} \mathrm{Diag}\big( (\hat\lambda_1, \ldots, \hat\lambda_r) \big)$, where $\mathrm{Diag}(x)$ is a diagonal matrix whose diagonal entries are $x$. Note that $\hat V$ has the multiplier $\sqrt{p}$ so that $\hat V^\top \hat V = p I_r$. Then the latent factors can be obtained by

\[ \hat f_t = \frac{1}{p} \hat V^\top (y_t - \bar y). \]

To derive the convergence rate of this latent factor component estimator, we need the following technical assumptions.

Assumption 1.
(a) $E[f_{it}]$, $E[h_{it}(\theta)]$, $E\big[ (u_{it} u_{jt} - \Sigma_{u,ij}) \big]$, and $E\big[ \big( q^\top u_t / \sqrt{q^\top \Sigma_u q} \big) \big]$ are bounded by some constant $C$ for all $i, j$ and $q$ such that $\|q\| = 1$.
(b) The minimum eigen-gap $\delta_r = \min_{k \le r} \big| \lambda_k(\Sigma_f) - \lambda_{k+1}(\Sigma_f) \big|$ satisfies $\delta_r \ge C$ a.s.

Remark. Assumption 1(b) is the pervasive condition that is widely used in low-rank matrix inference (Ait-Sahalia and Xiu, 2017; Fan et al., 2013, 2018, 2019; Kim and Fan, 2019; Stock and Watson, 2002). The pervasive condition, together with the sparsity condition (2.7), helps to distinguish the latent factor from the idiosyncratic volatility. Additionally, since the common market factor affects all assets, its proportion of the total variation is significant. Mathematically, this implies that the corresponding eigenvalues have the $p$ order, so the pervasive condition is not restrictive.

The following theorem provides the convergence rates of the latent factor component estimators $\hat V$ and $\hat f_t$.

Theorem 3.1. Under the model in Section 2.3, suppose Assumption 1 holds. Then we have

\[ \min_{O} \big\| \hat V - V O \big\| = O_p\left( \frac{p}{T} + \frac{s_p}{p} \right), \tag{3.2} \]

\[ \big\| \hat f_t - f_t \big\| = O_p\left( \frac{1}{\sqrt{T}} + \sqrt{\frac{s_p}{p}} \right), \tag{3.3} \]

where $O$ is a diagonal sign matrix whose diagonal entries are $\pm 1$.

Remark. Eigenvectors have the sign problem; for example, $-v_i$ and $v_i$ are not distinguishable, where $v_i$ is the $i$-th column vector of $V$. So we put the sign matrix $O$ in (3.2) to identify the eigenmatrix.

Remark. Theorem 3.1 shows that the convergence rate of $\hat f_t$ is $1/\sqrt{T} + \sqrt{s_p/p}$. The first term $1/\sqrt{T}$ is the usual optimal convergence rate for estimating the mean conditional volatility matrix $\Sigma$. The second term $\sqrt{s_p/p}$ is the cost to identify the latent factor.

When the factor is not observable and the number of assets is finite, it is impossible to estimate the latent factors. In contrast, Theorem 3.1 shows the blessing of financial big data in the latent factor estimation. Specifically, every asset contains the common market information. Thus, as the number of assets increases, the information for the latent factors also increases. Therefore, a larger sample of stocks exhibits a clearer latent signal, and we can estimate the latent factors consistently with the convergence rate $\sqrt{s_p/p}$.

In this section, we propose a model parameter estimation procedure. Specifically, we adopt the following quasi-maximum likelihood estimation (QMLE) method:

\[ \min_{\theta \in \Theta} \sum_{t=1}^{T} \sum_{i=1}^{r} \left( \log h_{it}(\theta) + h_{it}^{-1}(\theta) f_{it}^2 \right), \tag{3.4} \]

where $\Theta$ is a compact parameter space. Then the well-developed asymptotic theorems for the QMLE provide the consistency (Bollerslev and Wooldridge, 1992; Comte and Lieberman, 2003; Lee and Hansen, 1994). However, the factors are not observable, so to evaluate the quasi-likelihood function, we use the non-parametric factor estimator $\hat f_t$.
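To make the plug-in idea concrete, the sketch below evaluates the objective in (3.4) on an array of factor estimates, using the GARCH recursion from (2.6). The function names, the parameter layout (column-stacked `vec`), and the choice to start the recursion at the unconditional level $(I_r - A - B)^{-1}\omega$ are assumptions of this example; positivity of the volatilities is not enforced here and would be the caller's responsibility.

```python
import numpy as np


def unpack_theta(theta, r):
    """Split theta = (omega, vec(A), vec(B)) with column-stacked vec."""
    theta = np.asarray(theta, dtype=float)
    omega = theta[:r]
    A = theta[r:r + r * r].reshape((r, r), order="F")
    B = theta[r + r * r:].reshape((r, r), order="F")
    return omega, A, B


def quasi_likelihood(theta, F_hat):
    """Sum over t and i of log h_it + f_it^2 / h_it, with F_hat of shape (T, r)."""
    F_hat = np.asarray(F_hat, dtype=float)
    T, r = F_hat.shape
    omega, A, B = unpack_theta(theta, r)
    F2 = F_hat ** 2                                   # element-wise squared factors
    h = np.linalg.solve(np.eye(r) - A - B, omega)     # start at unconditional level
    total = 0.0
    for t in range(T):
        total += np.sum(np.log(h) + F2[t] / h)        # contribution of period t
        h = omega + A @ F2[t] + B @ h                 # volatility for period t + 1
    return total


# Tiny hand-checkable case with one factor: omega = 0.1, A = 0.1, B = 0.8.
F_demo = np.array([[1.0], [0.0]])
value = quasi_likelihood([0.1, 0.1, 0.8], F_demo)
```

In the demo, the starting volatility is $0.1/(1 - 0.1 - 0.8) = 1$, the first period contributes $\log 1 + 1/1 = 1$, the updated volatility is again $1$, and the second period contributes $0$, so the objective is approximately $1$. Minimizing this function over a compact parameter set with a numerical optimizer would give a plug-in estimate of $\theta$.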
For example, the conditional co-volatility $h_t(\theta)$ is estimated by

\[ \hat h_t(\theta) = \omega + A \hat f_{t-1}^2 + B \hat h_{t-1}(\theta), \tag{3.5} \]

and we use $(I_r - A - B)^{-1} \omega$ as the initial value of the recursion. Note that this initial value is the unconditional volatility of the factor, and the effect of the initial value is negligible (see Lemma A.2). Then we calculate the quasi-likelihood function with the conditional co-volatility estimator $\hat h_t(\theta)$ and obtain the maximum quasi-likelihood estimator $\hat\theta$ as follows:

\[ \hat\theta = \arg\min_{\theta \in \Theta} \sum_{t=1}^{T} \sum_{i=1}^{r} \left( \log \hat h_{it}(\theta) + \hat h_{it}^{-1}(\theta) \hat f_{it}^2 \right). \tag{3.6} \]

To investigate the asymptotic properties of the QMLE $\hat\theta$, we need the following technical conditions.

Assumption 2.
(a) The parameter space $\Theta$ is a compact set such that every element $\theta \in \Theta$ is positive; $\sup_{\theta \in \Theta} E[h_{it}(\theta)]$ is bounded for all $i = 1, \ldots, r$ and $t = 1, \ldots, T$; the eigenvalues of $B$ and $A$ are positive and $\|B\| < 1$; and $\theta_0$ is an interior point of $\Theta$.
(b) The $f_t$'s are non-degenerate random variables.

The theorem below gives the convergence rate of the QMLE $\hat\theta$.

Theorem 3.2. Under the model in Section 2.3, suppose Assumptions 1 and 2 hold. Then we have

\[ \big\| \hat\theta - \theta_0 \big\|_{\max} = O_p\left( \frac{1}{\sqrt{T}} + \sqrt{\frac{s_p}{p}} \right). \tag{3.7} \]

Remark. Theorem 3.2 shows that the QMLE $\hat\theta$ has the convergence rate $1/\sqrt{T} + \sqrt{s_p/p}$. The term $\sqrt{s_p/p}$ originates from the latent factor estimation in (3.3), which is the cost to identify the latent factor. Thus, when the factors are observable, the convergence rate will be $1/\sqrt{T}$.

One of our main objectives is to predict the future volatility. With the QMLE $\hat\theta$, we can estimate the conditional volatility as follows:

\[ \hat\Sigma_{f,t+1} = \mathrm{Diag}\big( \hat h_{t+1}(\hat\theta) \big). \]

Then Theorems 3.1–3.2 immediately show the consistency of $\hat\Sigma_{f,t+1}$.

Corollary 3.1. Under the assumptions in Theorem 3.2, we have

\[ \big\| \hat\Sigma_{f,t+1} - \Sigma_{f,t+1} \big\|_F = O_p\left( \frac{1}{\sqrt{T}} + \sqrt{\frac{s_p}{p}} \right). \]

In the previous section, we found the blessing of dimensionality in the latent factor estimation. However, when it comes to estimating large volatility matrices, we still suffer from the curse of dimensionality. For example, sample volatility matrix estimators are inconsistent when both the number of assets and the sample size go to infinity (Bickel and Levina, 2008; Cai and Liu, 2011; Marčenko and Pastur, 1967). To overcome the curse of dimensionality, we impose the sparse structure (2.7) on the idiosyncratic volatility matrix. As discussed in Section 2.3, in the stock market, the co-movement of stocks can be explained by the common factor, and the remaining idiosyncratic components are weakly correlated. Thus, the sparsity condition is realistic. To estimate the sparse idiosyncratic volatility matrix, we employ the POET procedure (Fan et al., 2013) as follows. First, we obtain the input idiosyncratic volatility matrix estimator by using the non-pervasive eigen-components as follows:

\[ \hat\Sigma_u = \sum_{i=r+1}^{p} \hat\lambda_i \hat q_i \hat q_i^\top, \]

where the eigenvalues $\hat\lambda_i$'s and eigenvectors $\hat q_i$'s are defined in (3.1). Then we apply the thresholding method to the input idiosyncratic volatility matrix estimator as follows:

\[ [\mathcal{T}(\hat\Sigma_u)]_{ij} = \begin{cases} \hat\Sigma_{u,ii}, & \text{if } i = j, \\ s_{ij}(\hat\Sigma_{u,ij})\, \mathbf{1}\big( |\hat\Sigma_{u,ij}| \ge \tau_T \sqrt{\hat\Sigma_{u,ii} \hat\Sigma_{u,jj}} \big), & \text{if } i \ne j, \end{cases} \tag{4.1} \]

where $\mathbf{1}(\cdot)$ is an indicator function and $s_{ij}(\cdot)$ is a shrinkage function satisfying $|s_{ij}(x) - x| \le \tau_T \sqrt{\hat\Sigma_{u,ii} \hat\Sigma_{u,jj}}$. Examples of the shrinkage function $s_{ij}(x)$ are the soft thresholding function $s_{ij}(x) = x - \mathrm{sign}(x)\, \tau_T \sqrt{\hat\Sigma_{u,ii} \hat\Sigma_{u,jj}}$ and the hard thresholding function $s_{ij}(x) = x$.
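The thresholding rule (4.1) with the soft and hard shrinkage functions can be sketched as follows. The function name `threshold_idiosyncratic`, the `kind` switch, and the small example matrix are illustrative assumptions; the threshold value passed as `tau` is arbitrary here.

```python
import numpy as np


def threshold_idiosyncratic(Sigma_u_hat, tau, kind="soft"):
    """Apply entry-wise thresholding to off-diagonal entries, as in (4.1)."""
    Sigma_u_hat = np.asarray(Sigma_u_hat, dtype=float)
    d = np.sqrt(np.diag(Sigma_u_hat))
    level = tau * np.outer(d, d)                  # tau * sqrt(S_ii * S_jj)
    keep = np.abs(Sigma_u_hat) >= level           # indicator in (4.1)
    if kind == "soft":
        shrunk = Sigma_u_hat - np.sign(Sigma_u_hat) * level
    elif kind == "hard":
        shrunk = Sigma_u_hat
    else:
        raise ValueError("kind must be 'soft' or 'hard'")
    out = np.where(keep, shrunk, 0.0)             # weak entries are set to zero
    np.fill_diagonal(out, np.diag(Sigma_u_hat))   # diagonal entries are kept as-is
    return out


# Example with unit variances, so entries can be read as correlations.
S_demo = np.array([[1.00, 0.50, 0.05],
                   [0.50, 1.00, 0.20],
                   [0.05, 0.20, 1.00]])
hard = threshold_idiosyncratic(S_demo, tau=0.1, kind="hard")
soft = threshold_idiosyncratic(S_demo, tau=0.1, kind="soft")
```

With `tau = 0.1`, the entry 0.05 falls below the threshold and is zeroed under both rules; hard thresholding leaves 0.5 and 0.2 unchanged, while soft thresholding shrinks them to 0.4 and 0.1.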
The thresholding level $\tau_T$ will be given in Theorem 4.1. The working principle of the thresholding method is that the co-volatility is set to zero if the estimated correlation is weak. This makes the estimated idiosyncratic volatility matrix sparse, so that it satisfies the sparsity condition.

With the estimated factor volatility $\hat\Sigma_{f,t+1}$ and idiosyncratic volatility matrix $\mathcal{T}(\hat\Sigma_u)$, we estimate the conditional volatility matrix as follows:

\[ \hat\Sigma_{t+1} = \hat V \hat\Sigma_{f,t+1} \hat V^\top + \mathcal{T}(\hat\Sigma_u). \tag{4.2} \]

We call the conditional volatility matrix estimator $\hat\Sigma_{t+1}$ the P-GARCH estimator. To investigate the asymptotic property of (4.2), we require the following technical conditions.

Assumption 3.
(a) The sample covariance estimator satisfies the following concentration inequality: for any given $a > 0$,

\[ P\left\{ \max_{ij} \big| \hat\Sigma_{ij} - \Sigma_{ij} \big| \ge C_a \sqrt{\frac{\log p}{T}} \right\} \le p^{-a}, \]

where $C_a$ is a constant depending only on $a$.
(b) There is a constant $C$ such that

\[ \frac{1}{r} \max_{i \le p} \sum_{j=1}^{r} V_{ij}^2 \le C. \]

(c) The smallest eigenvalue of $\Sigma_u$ stays away from $0$, and $|\Sigma_{u,ij}| \le C$ for all $i$, $j$.
(d) $p = o(T)$.

Remark. Assumption 3(a) is the sub-Gaussian condition. When investigating high-dimensional inference, the sub-Gaussian condition is essential. Additionally, when the log return $y_t$ satisfies some sub-Gaussian property, we can obtain Assumption 3(a).

Remark. Assumption 3(b) is called the incoherence condition, which is widely assumed in analyzing low-rank matrices (Candès and Recht, 2009; Fan et al., 2017). The basic intuition is that the factor loading matrix $V$ is not sparse. That is, the factor affects almost all the stock returns. Thus, under the factor model, the incoherence condition is acceptable.

The following theorem shows the asymptotic behavior of the P-GARCH estimator.

Theorem 4.1. Under the model in Section 2.3, suppose that Assumptions 1–3 hold. Take the thresholding level as $\tau_T = C_\tau \big( \sqrt{\log p / T} + \sqrt{s_p / p} \big)$ for some positive constant $C_\tau$. Then we have

\[ \big\| \mathcal{T}(\hat\Sigma_u) - \Sigma_u \big\|_{\max} = O_p(\tau_T), \qquad \big\| \mathcal{T}(\hat\Sigma_u) - \Sigma_u \big\| = O_p\big( s_p \tau_T^{1-q} \big), \]

\[ \big\| \hat\Sigma_{t+1} - \Sigma_{t+1} \big\|_{\max} = O_p(\tau_T), \qquad \big\| \hat\Sigma_{t+1} - \Sigma_{t+1} \big\|_{\Sigma_{t+1}} = O_p\left( \frac{\sqrt{p}}{T} + s_p \tau_T^{1-q} \right), \]

where the relative Frobenius norm is $\|G_1\|_{G_2} = p^{-1/2} \big\| G_2^{-1/2} G_1 G_2^{-1/2} \big\|_F$ for any given $p \times p$ matrices $G_1$ and $G_2$.

Remark. Theorem 4.1 shows that the P-GARCH is a consistent estimator as long as $p = o(T)$.

With the realistic low-rank plus sparse structure, we can enjoy the blessing of dimensionality for estimating the factor volatility matrix. Moreover, using the regularization method, as shown in Theorem 4.1, we can overcome the curse of dimensionality.

In this section, we discuss how to measure the VaR value with the P-GARCH estimator $\hat\Sigma_{t+1}$ in (4.2). Using the plug-in method, we estimate the one-step ahead VaR as follows:

\[ \widehat{\mathrm{VaR}}_{\alpha,t+1} = -w^\top \bar y - c_\alpha \sqrt{ w^\top \hat\Sigma_{t+1} w }, \tag{4.3} \]

where $c_\alpha$ is an $\alpha$-quantile and $\bar y$ is the sample mean vector. To evaluate the VaR value, we need to determine the $\alpha$-quantile value. To do this, we assume that the standardized portfolio log returns are i.i.d. Then when the standardized portfolio log returns follow the standard normal distribution, $c_\alpha$ is the $\alpha$-quantile of the standard normal distribution, $z_\alpha$. When the standardized portfolio log returns follow the multivariate t-distribution with $\nu$ degrees of freedom, then $c_\alpha = t_{\nu,\alpha} \sqrt{(\nu - 2)/\nu}$, where $t_{\nu,\alpha}$ is the $\alpha$-quantile of the t-distribution with $\nu$ degrees of freedom (Glasserman et al., 2002).
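As a brief check of the scaling factor in the t-distribution case, assuming $\nu > 2$ so that the variance exists: if $X$ follows the t-distribution with $\nu$ degrees of freedom, then $\mathrm{Var}(X) = \nu/(\nu - 2)$, so the standardized variable $Z = X \sqrt{(\nu - 2)/\nu}$ has unit variance. Its $\alpha$-quantile then follows from

\[ \alpha = P\{ Z \le c_\alpha \} = P\left\{ X \le c_\alpha \sqrt{\frac{\nu}{\nu - 2}} \right\} \quad \Longrightarrow \quad c_\alpha \sqrt{\frac{\nu}{\nu - 2}} = t_{\nu,\alpha} \quad \Longrightarrow \quad c_\alpha = t_{\nu,\alpha} \sqrt{\frac{\nu - 2}{\nu}}. \]

As $\nu \to \infty$ the factor $\sqrt{(\nu - 2)/\nu}$ tends to one, and $c_\alpha$ approaches the standard normal quantile $z_\alpha$.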
We call them the parametric $\sigma$-based VaR estimator. The performance of the parametric $\sigma$-based VaR estimator depends on the distribution assumption. On the other hand, to obtain distribution-robust VaR estimators, we use the non-parametric sample quantile method, which we call the non-parametric $\sigma$-based VaR estimator. For example, $c_\alpha$ is set to be the $\lceil \alpha T \rceil$-th smallest value of $\{ w^{\top}(y_t - \bar{y}) / (w^{\top} \widehat{\Sigma}_t w)^{1/2} \}_{t=1}^{T}$, where $\lceil \cdot \rceil$ is the ceiling function. To derive the convergence rate of the non-parametric $\sigma$-based VaR estimator, we require the following technical conditions.

Assumption 4.

(a) The estimated standardized return $\widehat{x}_t = w^{\top}(y_t - \bar{y}) / (w^{\top} \widehat{\Sigma}_t w)^{1/2}$ satisfies the concentration inequality: for any given $a > 0$,
$$\max_{t \le T} P\left\{ |\widehat{x}_t - x_t| \ge C_a \sqrt{\log T} \left( \sqrt{\frac{\log p}{T}} + \sqrt{\frac{s_p}{p}} \right) \right\} \le T^{-a},$$
where $x_t = w^{\top}(y_t - \mu) / (w^{\top} \Sigma_t w)^{1/2}$ and $C_a$ is a constant depending only on $a$.

(b) The cumulative distribution function of $x_t$ and its inverse function are continuous.

(c) The variance of the portfolio return $r_t$ is bounded and strictly positive.

Remark. The sub-Gaussian condition in Assumption 4(a) holds under some sub-Gaussian condition for $y_t$. Assumption 4(b) is usually imposed to study quantile estimation (Chen and Tang, 2005).

The following theorem shows the convergence rates of the VaR estimators.

Theorem 4.2.

Under the model in Section 2.3, suppose that Assumptions 1–4 hold. Then for an arbitrary portfolio weight $w$ with the gross exposure constraint $\| w \|_1 \le C$, the parametric $\sigma$-based VaR estimator has

$$\left| \widehat{\mathrm{VaR}}_{\alpha,t+1} - \mathrm{VaR}_{\alpha,t+1} \right| = O_p\left( \sqrt{\frac{\log p}{T}} + \sqrt{\frac{s_p}{p}} \right).$$

Moreover, the non-parametric $\sigma$-based VaR estimator has

$$\left| \widehat{\mathrm{VaR}}_{\alpha,t+1} - \mathrm{VaR}_{\alpha,t+1} \right| = O_p\left( \sqrt{\log T} \left( \sqrt{\frac{\log p}{T}} + \sqrt{\frac{s_p}{p}} \right) \right).$$

In this paper, we focus on investigating the effect of financial big data on the VaR estimator for a small portfolio. When we do not incorporate financial big data, that is, when $p$ is finite, the absolute VaR error is not consistent for any of the parametric and non-parametric estimators. This is because the term $\sqrt{s_p / p}$ does not converge to zero, so the latent factor estimation error is dominant. Therefore, Theorem 4.2 supports the claim that incorporating financial big data leads to the blessing in the VaR forecast by capturing common factor dynamics.

Simulation study

In this section, we conducted Monte Carlo simulations to check the finite-sample performance of the proposed P-GARCH and the corresponding VaR model. The data generating process is analogous to the model (2.3) and (2.4). The number of factors was chosen to be $r = 3$; the mean of $y_t$ was set to zero, $\mu = 0$; and the factor loading matrix $V$ was randomly sampled from the first $r$ right singular vectors of a random matrix whose elements are i.i.d. Unif(0, …). The factors $f_t$ were generated by the multivariate normal distribution with the conditional expected volatility $\Sigma_{f,t}$ as follows:

$$\Sigma_{f,t} = \mathrm{Diag}(h_t(\theta)) \quad \text{and} \quad h_t(\theta) = \omega + A f_{t-1}^{2} + B h_{t-1}(\theta),$$

where the initial value of $h_t(\theta)$ is $(I_r - A - B)^{-1} \omega$. [The numerical entries of the $3 \times 1$ vector $\omega$ and the $3 \times 3$ matrices $A$ and $B$ are not recoverable from the source.]

The idiosyncratic risk $u_t$ was generated from the multivariate normal distribution with the following sparse co-volatility:

$$\Sigma_{u,ij} = c_0 \times \rho^{|i-j|}, \tag{5.1}$$

where $c_0$ and $\rho$ stand for two decimal constants whose digits are not recoverable from the source.

We varied $p$ from 20 to 500 and $T$ from 500 to 10,000, and employed the QMLE procedure in Section 3.2 to estimate the GARCH parameters. We repeated the whole procedure 500 times and calculated the mean absolute error based on the 500 simulations.

Table 1 reports the mean absolute error (MAE) of the QMLE estimate $\widehat{\theta}$ for various $p$ and $T$. To save space, we only documented the estimation results for 9 of the 21 parameters; the rest of the parameters show similar behavior. Table 1 shows that the MAEs decrease as $T$ or $p$ increases. This result supports the theoretical findings in Theorem 3.2.

Table 1: Mean absolute error (MAE) of the QMLE estimate $\widehat{\theta}$ with 500 replications. To save space, only the first 3 elements of $\widehat{A}$ and $\widehat{B}$ are reported. The other results have a similar pattern. Entries are MAE × … (the scale factor is not recoverable from the source).

| p | T | ω (1st) | ω (2nd) | ω (3rd) | A (1st) | A (2nd) | A (3rd) | B (1st) | B (2nd) | B (3rd) |
|---|---|---|---|---|---|---|---|---|---|---|
| 20 | 500 | 0.138 | 0.088 | 0.068 | 5.706 | 9.483 | 12.637 | 14.279 | 18.768 | 21.801 |
| 20 | 2000 | 0.060 | 0.047 | 0.042 | 2.816 | 4.343 | 6.830 | 11.338 | 13.448 | 15.169 |
| 20 | 4000 | 0.046 | 0.040 | 0.034 | 2.033 | 3.359 | 5.621 | 9.738 | 10.703 | 14.345 |
| 20 | 6000 | 0.039 | 0.032 | 0.031 | 1.736 | 2.887 | 5.436 | 8.871 | 9.450 | 12.538 |
| 20 | 10000 | 0.030 | 0.031 | 0.029 | 1.333 | 2.825 | 4.947 | 8.279 | 8.912 | 12.923 |
| 100 | 500 | 0.124 | 0.079 | 0.060 | 5.816 | 8.651 | 12.264 | 13.739 | 16.993 | 19.134 |
| 100 | 2000 | 0.056 | 0.040 | 0.028 | 2.594 | 4.306 | 5.933 | 10.340 | 12.012 | 13.912 |
| 100 | 4000 | 0.040 | 0.027 | 0.020 | 1.846 | 2.925 | 3.978 | 9.869 | 10.098 | 12.730 |
| 100 | 6000 | 0.029 | 0.021 | 0.015 | 1.550 | 2.263 | 3.147 | 8.539 | 8.335 | 11.924 |
| 100 | 10000 | 0.023 | 0.018 | 0.012 | 1.047 | 1.874 | 2.606 | 7.993 | 7.731 | 10.932 |
| 500 | 500 | 0.118 | 0.079 | 0.055 | 5.788 | 8.539 | 11.931 | 14.059 | 16.716 | 18.482 |
| 500 | 2000 | 0.058 | 0.039 | 0.027 | 2.734 | 4.208 | 5.619 | 10.161 | 11.007 | 13.436 |
| 500 | 4000 | 0.039 | 0.027 | 0.019 | 1.786 | 2.916 | 3.948 | 8.975 | 9.451 | 12.525 |
| 500 | 6000 | 0.031 | 0.023 | 0.015 | 1.546 | 2.448 | 3.571 | 8.615 | 8.533 | 11.790 |
| 500 | 10000 | 0.024 | 0.018 | 0.012 | 1.057 | 1.764 | 2.456 | 7.932 | 7.477 | 10.843 |

With the estimated parameter $\widehat{\theta}$, we validated the one-step ahead volatility prediction $\widehat{\Sigma}_{t+1}$ with several matrix norms. We conducted the thresholding procedure for the idiosyncratic volatility in (4.1) with the tuning parameters $C_\tau = 1$ and $s_p = 1$. Figure 1 draws the prediction errors of $\widehat{\Sigma}_{t+1} - \Sigma_{t+1}$ with the Frobenius, spectral, max, and relative Frobenius norms. For the Frobenius and spectral norms, the errors are large at $p = 500$. This is because the Frobenius and spectral norms depend on the dimensionality $p$. However, for the relative Frobenius norm, the error decreases as the dimension $p$ increases. From this result, we can find the blessing of dimensionality. The max norm does not show any significant difference for different $p$. This may be because the large volatility matrix has a greater chance to have a large max norm error, even though the element-wise error decreases. In all cases, the error decreases as the sample size $T$ increases.
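To make the four error measures concrete, here is a small Python sketch that computes them for a forecast and a true volatility matrix. The function name is hypothetical, and the relative Frobenius norm is taken to be the usual $p^{-1/2} \| \Sigma^{-1/2} (\widehat{\Sigma} - \Sigma) \Sigma^{-1/2} \|_F$, which is an assumption about the exponents in the definition accompanying Theorem 4.1.

```python
import numpy as np


def volatility_error_norms(sigma_hat, sigma):
    """Return Frobenius, spectral, max, and relative Frobenius errors.

    sigma is assumed symmetric positive definite so that its inverse
    square root exists.
    """
    diff = sigma_hat - sigma
    p = sigma.shape[0]

    # Inverse square root of sigma via its eigendecomposition.
    eigvals, eigvecs = np.linalg.eigh(sigma)
    inv_sqrt = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T

    return {
        "frobenius": np.linalg.norm(diff, "fro"),
        "spectral": np.linalg.norm(diff, 2),
        "max": np.abs(diff).max(),
        "relative_frobenius":
            np.linalg.norm(inv_sqrt @ diff @ inv_sqrt, "fro") / np.sqrt(p),
    }


# Toy check (hypothetical numbers): the truth is 2*I and the forecast is 2.2*I.
p = 4
errors = volatility_error_norms(2.2 * np.eye(p), 2.0 * np.eye(p))
```

The normalization by $\sqrt{p}$ and by $\Sigma^{-1/2}$ is what makes the relative Frobenius error comparable across dimensions, whereas the plain Frobenius error of a fixed per-element error grows with $p$.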
These results support the …

[Figure 1: Average predicted volatility errors of $\widehat{\Sigma}_{t+1} - \Sigma_{t+1}$ against $T$ under the Frobenius, spectral, max, and relative Frobenius norms with $p = 20, 100, 500$. Panels: Frobenius Norm, Spectral Norm, Max Norm, Relative Frobenius Norm; horizontal axis: $T$.]

… $s = 5$ and the total number of assets as $p = 500$. To capture the market dynamics using only the assets in the portfolio, we considered the CCC model (Bollerslev, 1990) and BEKK with diagonal constraint and variance targeting (BEKK) (Engle and Kroner, 1995; Pedersen and Rahbek, 2014). Additionally, in the empirical study, we often impose the static or slowly time-varying covariance assumption, and under this condition, we adopted POET (Fan et al., 2013) and historical sample covariance (Hist-Vol). Figure 2 plots the average predicted volatility errors of the portfolio volatility matrix $\widehat{\Sigma}_{s,t+1} - \Sigma_{s,t+1}$ against the sample size $T$ under the Frobenius, spectral, max, and relative Frobenius norms. Figure 2 illustrates that the P-GARCH model is superior to the other multivariate volatility matrix estimates. This may be because, using only small portfolio data, they cannot explain the market dynamics fully.

[Figure 2: Average predicted volatility errors of the portfolio volatility matrix $\widehat{\Sigma}_{s,t+1} - \Sigma_{s,t+1}$ against $T$ under the Frobenius, spectral, max, and relative Frobenius norms with $p = 500$ and portfolio size 5. Legend: Hist-Vol, POET, BEKK, CCC, P-GARCH.]

[Figure 3: MAE of the 1%-level VaR forecasts for $p = 500$ with 500 replications. From the left, the plot shows the MAE of VaR with the parametric methods under the normal and t-distribution ($\nu = 6$) and the non-parametric method with the sample quantile, with the portfolio sizes 1, 5, and 20. Legend: P-GARCH, CCC, BEKK, port-GARCH, POET, Hist-Vol; horizontal axis: $T$.]

Finally, we compared the VaR forecasting performance with the small portfolio risk analysis methods. We set $\alpha$ as 1% and fixed $p = 500$. The portfolio size varied from 1 to 20. We also calculated the small portfolio matrix estimators $\widehat{\Sigma}_{s,t+1}$ based on the P-GARCH, CCC, BEKK, POET, and Hist-Vol for the volatility forecast. We additionally added the portfolio univariate GARCH model (Bollerslev, 1986) for the comparison. Specifically, we applied the usual GARCH(1,1) model to the portfolio return.
For the $\alpha$-quantile, we considered the normal and Student-t distribution with 6 degrees of freedom ($\nu = 6$) for the parametric method, whereas we used the non-parametric method with the $\lceil \alpha T \rceil$-th smallest value of the standardized portfolio log returns, as in Section 4.2. The portfolio was set to be equally weighted. The true VaR is $z_\alpha (w^{\top} \Sigma_{t+1} w)^{1/2}$, where $z_\alpha$ is the $\alpha$-quantile of the standard normal distribution. Figure 3 draws the MAEs of the 1%-level VaR forecasts for $p = 500$ with 500 replications. From Figure 3, we find that the P-GARCH outperforms the other models for all $\sigma$-based methods and portfolio sizes.

In this section, we examined the one-step ahead VaR forecasting performance with the empirical data. The data comprise 18 years of CRSP daily percentage log returns of S&P 500 constituents from January 1, 2000, to December 31, 2017 (4,523 days). The log returns are calculated from the percentage return of the last sale or closing bid/ask price including dividend. We selected every firm that was ever a constituent of the S&P 500 during this period and filtered out the stocks that have any missing data, leaving 492 firms ($p = 492$).

[Figure 4: The scree plot of the sample covariance matrix of 492 stocks. Horizontal axis: Component Number; vertical axis: Eigenvalue.]

To employ the proposed P-GARCH model, the number of factors $r$ is required. To determine it, we drew the scree plot in Figure 4. Figure 4 shows that $r$ is at most 3, so we considered 1, 2, and 3 as the possible values of $r$. This also matches previous empirical studies (Chan et al., 1998; Fama and French, 1992, 1993). For the thresholding step, we used the global industry classification standard (GICS) sector (Fan et al., 2016). That is, we maintained within-sector volatilities but set the others to zero. The equally weighted portfolios of size 5 and 20 are randomly sampled from the 492 stocks with 500 repetitions, and the single-asset portfolio contains only one stock from the 492 assets. With these portfolios, we predicted VaR with significance level $\alpha$ = 10%, 5%, 2%, and 1% using the parametric and non-parametric methods, as discussed in Section 4.2. For the model fitting, we employed a rolling window scheme with the window size $T = 252$ (1 year). The parameter is updated every 10 days, whereas the VaR forecasting is done every day with the updated log return data. The total number of forecasts $N$ is 4,270. The last estimated parameters of P-GARCH($r = 3$), $\widehat{\theta} = (\theta, \mathrm{vec}(A), \mathrm{vec}(B))$, equal a negative power of ten (the exponent is not recoverable from the source) times (54.33, 0.00, 0.00, 0.00, 79.23, 143.68, 0.00, 0.00, 0.00, 20.81, 0.00, 8.61, 500.66, 705.18, 14.39, 360.23, 88.21, 0.48, 0.26, 773.81, 16.58). Figure 5 depicts one sample path of the estimated one-step ahead 1%-level VaR under the t-distribution with portfolio size 5. It shows that the predicted VaR of the P-GARCH model can capture the dynamics of extreme loss.

To test whether the predicted VaR is correct, we used the unconditional coverage test (Kupiec, 1995) and the conditional coverage test (Christoffersen, 1998). We defined the hit rate as the number of hits $N_\alpha$ divided by the total number of samples $N$. Then the unconditional coverage test is a likelihood ratio test of whether the hit rate is sufficiently close to $\alpha$.
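The unconditional coverage idea can be sketched in a few lines of Python. This is a minimal illustration of the standard Kupiec likelihood ratio, not the authors' code: the function names and the hit counts are hypothetical (only $N$ = 4,270 comes from the text), and the p-value uses the closed form of the chi-square survival function with one degree of freedom.

```python
import math


def _xlogy(x, y):
    """Return x * log(y) with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def kupiec_unconditional_coverage(n_total, n_hits, alpha):
    """Likelihood ratio statistic and asymptotic p-value (chi-square, 1 d.o.f.).

    Compares the binomial likelihood at the nominal level alpha with the
    likelihood at the observed hit rate n_hits / n_total.
    """
    n_miss = n_total - n_hits
    hit_rate = n_hits / n_total
    loglik_null = _xlogy(n_miss, 1.0 - alpha) + _xlogy(n_hits, alpha)
    loglik_alt = _xlogy(n_miss, 1.0 - hit_rate) + _xlogy(n_hits, hit_rate)
    lr_uc = max(0.0, -2.0 * loglik_null + 2.0 * loglik_alt)
    # For one degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(lr_uc / 2.0))
    return lr_uc, p_value


# A hit rate exactly equal to alpha gives a zero statistic.
lr_exact, p_exact = kupiec_unconditional_coverage(1000, 10, 0.01)

# Hypothetical example: 60 hits in N = 4270 forecasts at the 1% level.
lr_many, p_many = kupiec_unconditional_coverage(4270, 60, 0.01)
```

In the second call the observed hit rate (about 1.4%) is far enough from 1% that the test rejects correct coverage at the 5% level.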
Under the null hypothesis, the hit rate should be an unbiased estimator of the significance level $\alpha$ (Jorion, 2000). Then the test statistic is defined as follows:

$$LR_{uc} = -2 \left\{ (N - N_\alpha) \log(1 - \alpha) + N_\alpha \log \alpha \right\} + 2 \left\{ (N - N_\alpha) \log\left( 1 - \frac{N_\alpha}{N} \right) + N_\alpha \log \frac{N_\alpha}{N} \right\}.$$

The conditional coverage test checks whether the hits occur consecutively or not, and its test statistic is defined as follows:

$$LR_{cc} = -2 \left\{ (N - N_\alpha) \log(1 - \alpha) + N_\alpha \log \alpha \right\} + 2 \log \left\{ \left( \frac{N_{00}}{N_0} \right)^{N_{00}} \left( \frac{N_{01}}{N_0} \right)^{N_{01}} \left( \frac{N_{10}}{N_1} \right)^{N_{10}} \left( \frac{N_{11}}{N_1} \right)^{N_{11}} \right\},$$

where $N_{00}$, $N_{01}$, $N_{10}$, $N_{11}$ are conditional frequencies defined in Table 2. Here $N_{ij}$ counts the days on which state $i$ at time $t-1$ is followed by state $j$ at time $t$, with 1 denoting a hit and 0 denoting no hit, and $N_i = N_{i0} + N_{i1}$. Specifically, $N_{11}$ is the number of hits when there was a hit right before, whereas $N_{01}$ is the number of hits when there was no hit right before. $N_{10}$ and $N_{00}$ are the numbers of the opposite cases. Hence, the summation $N_{00} + N_{01} + N_{10} + N_{11}$ is equal to $N$. Kupiec (1995) and Christoffersen (1998) showed that under the null, $LR_{uc}$ and $LR_{cc}$ follow the chi-square distribution with degrees of freedom 1 and 2, respectively.

[Figure 5: The plots of daily log return and predicted one-step ahead VaR of the P-GARCH model. Horizontal axis: Time; vertical axis: log-return (%).]

To obtain more robust results, we also considered the dynamic quantile test (Kuester et al., 2006). It generalizes the conditional coverage test by considering the relation between the current hit and multiple lagged information. For example, we denote by the dummy variable $H_t$ an indicator of the current hit. Then we can have a regression with lag $L$ as follows:

$$H_t = \beta_0 + \sum_{i=1}^{L} \beta_i H_{t-i} + \varepsilon_t,$$

and with the vector notation, we have

$$H = \beta_0 \mathbf{1} + X \beta + u,$$

where $\mathbf{1}$ is a vector of ones and $X$ is a matrix of lagged variables. Under the null hypothesis, $\beta_0$ should be equal to $\alpha$, and $\beta$ should be zero. Moreover, Engle and Manganelli (2004) showed that the test statistic (DQ) converges in distribution to the chi-square as follows:

$$DQ = \frac{\widehat{\beta}^{\top} X^{\top} X \widehat{\beta}}{\alpha (1 - \alpha)} \xrightarrow{d} \chi^2_{L+2},$$

where $\chi^2_{L+2}$ is the chi-square distribution with $L + 2$ degrees of freedom. For the dynamic quantile test, when $X$ consists of the lagged hits $H_{t-1}, \ldots, H_{t-L}$, we denote the test by $DQ_{\mathrm{Hit}}$. When $X$ consists of $H_{t-1}, \ldots, H_{t-L}$ and $\widehat{\mathrm{VaR}}_{\alpha,t}$, we denote the test by $DQ_{\mathrm{VaR}}$.

Table 2: The conditional frequency table of the VaR hit events.

| Time $t$ \ Time $t-1$ | $r_{t-1} < \widehat{\mathrm{VaR}}_{\alpha,t-1}$ | $r_{t-1} \ge \widehat{\mathrm{VaR}}_{\alpha,t-1}$ |
|---|---|---|
| $r_t < \widehat{\mathrm{VaR}}_{\alpha,t}$ | $N_{11}$ | $N_{01}$ |
| $r_t \ge \widehat{\mathrm{VaR}}_{\alpha,t}$ | $N_{10}$ | $N_{00}$ |
| Total | $N_1$ | $N_0$ |

The grand total is $N$.

We compared the out-of-sample VaR forecast performances with the small portfolio risk analysis methods considered in Section 5. Specifically, we used the P-GARCH, CCC, BEKK, portfolio GARCH, Hist-Vol, and POET for the volatility prediction and the standard normal quantile, t-quantile with 6 degrees of freedom, and sample quantile for the $\alpha$-quantile. Table 3 reports the averaged hit rates ($N_\alpha / N$) and the p-values of the $LR_{uc}$, $LR_{cc}$, $DQ_{\mathrm{Hit}}$, and $DQ_{\mathrm{VaR}}$ for the portfolio sizes 1, 5, 20 and $\alpha$ = 1% with the normal distribution, Student-t distribution, and sample quantile. Figure 6 draws box plots of $LR_{cc}$ p-values for the portfolio sizes 1, 5, and 20 and $\alpha$ = 1% with the normal distribution, t-distribution, and sample quantile. From Table 3 and Figure 6, we find that when comparing the dynamic models and static models, the dynamic models such as the P-GARCH, BEKK, CCC, and portfolio GARCH show better performance. This indicates that the financial market is not static and the GARCH-type models can account for the market dynamics. When comparing the dynamic models, the P-GARCH has the highest p-values for most of the VaR tests. From this empirical result, we can conjecture that the P-GARCH accounts for the market dynamics by incorporating financial big data, and this leads to better performance in VaR forecasting for the relatively small portfolio. Finally, when comparing the quantile methods, the t-distribution shows the best performance. Moreover, the normal distribution has the smallest p-values for all the portfolio sizes. This pattern coincides with the stylized fact that the log return follows a conditional heavy-tailed distribution (Cont, 2001).

[Table 3: Average hit rate ($N_\alpha / N$) and the p-values of the 1%-level VaR test statistics under the normal, t-distribution ($\nu = 6$), and sample quantile for the portfolio sizes 1, 5, and 20. Columns, repeated for each portfolio size: $N_\alpha / N$, $LR_{uc}$, $LR_{cc}$, $DQ_{\mathrm{Hit}}$, $DQ_{\mathrm{VaR}}$. Rows, repeated for each quantile method: P-GARCH ($r$ = 1, 2, 3), CCC, BEKK, port-GARCH, POET ($r$ = 1, 2, 3), Hist-Vol. The numerical entries are not recoverable from the source.]

[Figure 6: The box plots of p-values for $LR_{cc}$ with $\alpha$ = 1%. Each column shows three different VaR estimation results with the normal, t-distribution ($\nu = 6$), and sample quantile. Each row shows four different VaR estimates for the portfolio sizes 1, 5, 20. Models shown: P-GARCH ($r$ = 1, 2, 3), POET ($r$ = 1, 2, 3), CCC, Hist-Vol, BEKK, port-GARCH.]

In this paper, we studied the blessing of dimensionality in risk analysis by incorporating financial big data and understanding how to overcome the curse of dimensionality. Specifically, under the latent factor model, to account for the common market dynamics, we proposed the dynamic factor model, which takes the well-known GARCH form. For the factor volatility, we showed that a large number of assets helps to estimate the factor volatility more accurately. The intuition is that every asset contains common market information, so as the number of assets increases, we have more information about the latent factor. That is, we can enjoy the blessing of dimensionality in estimating the latent common factor. This fact indicates that even if we analyze small portfolio risk, incorporating financial big data can be helpful. However, when handling big data, we must inevitably face the curse of dimensionality. We overcame the curse of dimensionality by adopting the POET method (Fan et al., 2013). Then we applied the proposed P-GARCH model to measuring VaR for the relatively small portfolio. The numerical studies showed that P-GARCH can capture the market dynamics better than the small portfolio dynamic risk models, which consider only the assets in the portfolio. From these results, we can conjecture that by incorporating financial big data, we can better account for the market dynamics.

In this paper, we only consider the factor dynamics, but several studies show that the idiosyncratic volatilities are also affected by the market common factors (Barigozzi and Hallin, 2016; Connor et al., 2006; Herskovic et al., 2016; Rangel and Engle, 2012). Thus, it is interesting but difficult to develop idiosyncratic dynamic models. We leave this for future study.

References

Acerbi, C. and Tasche, D. (2002). Expected shortfall: a natural coherent alternative to value at risk. Economic Notes, 31(2):379–388.

Ahmadi-Javid, A. (2012). Entropic value-at-risk: A new coherent risk measure. Journal of Optimization Theory and Applications, 155(3):1105–1123.

Ait-Sahalia, Y. and Xiu, D. (2017). Using principal component analysis to estimate a high dimensional factor model with high-frequency data. Journal of Econometrics, 201(2):384–399.

Amemiya, T. (1985). Advanced Econometrics. Harvard University Press.

Bai, J. (2003). Inferential theory for factor models of large dimensions. Econometrica, 71(1):135–171.

Bai, J. and Li, K. (2012). Statistical analysis of factor models of high dimension. The Annals of Statistics, 40(1):436–465.

Bai, J. and Ng, S. (2013). Principal components estimation and identification of static factors. Journal of Econometrics, 176(1):18–29.

Bali, T. G., Cakici, N., Yan, X., and Zhang, Z. (2005). Does idiosyncratic risk really matter? The Journal of Finance, 60(2):905–929.

Barigozzi, M. and Hallin, M. (2016). Generalized dynamic factor models and volatilities: recovering the market volatility shocks. The Econometrics Journal, 19(1).

Bauwens, L., Laurent, S., and Rombouts, J. V. (2006). Multivariate GARCH models: a survey. Journal of Applied Econometrics, 21(1):79–109.

Bensaid, B., Lesne, J.-P., Pages, H., and Scheinkman, J. (1992). Derivative asset pricing with transaction costs. Mathematical Finance, 2(2):63–86.

Bickel, P. J. and Levina, E. (2008). Covariance regularization by thresholding. The Annals of Statistics, 36(6):2577–2604.

Billio, M. and Pelizzon, L. (2000). Value-at-risk: a multivariate switching regime approach. Journal of Empirical Finance, 7(5):531–554.

Boivin, J. and Ng, S. (2006). Are more data always better for factor analysis? Journal of Econometrics, 132(1):169–194.

Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31(3):307–327.

Bollerslev, T. (1990). Modelling the coherence in short-run nominal exchange rates: a multivariate generalized ARCH model. Review of Economics and Statistics, 72(3):498–505.

Bollerslev, T. and Wooldridge, J. M. (1992). Quasi-maximum likelihood estimation and inference in dynamic models with time-varying covariances. Econometric Reviews, 11(2):143–172.

Brooks, C. and Persand, G. (2003). Volatility forecasting for risk management. Journal of Forecasting, 22(1):1–22.

Cai, T. and Liu, W. (2011). Adaptive thresholding for sparse covariance matrix estimation. Journal of the American Statistical Association, 106(494):672–684.

Cajueiro, D. O. and Tabak, B. M. (2004). The Hurst exponent over time: testing the assertion that emerging markets are becoming more efficient. Physica A: Statistical Mechanics and its Applications, 336(3-4):521–537.

Candès, E. J. and Recht, B. (2009). Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6):717.

Carhart, M. M. (1997). On persistence in mutual fund performance. The Journal of Finance, 52(1):57–82.

Chan, L. K., Karceski, J., and Lakonishok, J. (1998). The risk and return from factors. Journal of Financial and Quantitative Analysis, 33(2):159–188.

Chen, S. X. and Tang, C. Y. (2005). Nonparametric inference of value-at-risk for dependent financial returns. Journal of Financial Econometrics, 3(2):227–255.

Christoffersen, P. and Langlois, H. (2013). The joint dynamics of equity market factors. Journal of Financial and Quantitative Analysis, 48(5):1371–1404.

Christoffersen, P. F. (1998). Evaluating interval forecasts. International Economic Review, pages 841–862.

Comte, F. and Lieberman, O. (2003). Asymptotic theory for multivariate GARCH processes. Journal of Multivariate Analysis, 84(1):61–84.

Connor, G., Korajczyk, R. A., and Linton, O. (2006). The common and specific components of dynamic volatility. Journal of Econometrics, 132(1):231–255.

Cont, R. (2001). Empirical properties of asset returns: stylized facts and statistical issues.

Donoho, D. L. (2000). High-dimensional data analysis: The curses and blessings of dimensionality. AMS Math Challenges Lecture, 1(2000):32.

Engle, R. (2002). Dynamic conditional correlation: A simple class of multivariate generalized autoregressive conditional heteroskedasticity models. Journal of Business & Economic Statistics, 20(3):339–350.

Engle, R. F. (2016). Dynamic conditional beta. Journal of Financial Econometrics, 14(4):643–667.

Engle, R. F. and Kroner, K. F. (1995). Multivariate simultaneous generalized ARCH. Econometric Theory, 11(1):122–150.

Engle, R. F., Ledoit, O., and Wolf, M. (2017). Large dynamic covariance matrices. Journal of Business & Economic Statistics, pages 1–13.

Engle, R. F. and Manganelli, S. (2004). CAViaR: Conditional autoregressive value at risk by regression quantiles. Journal of Business & Economic Statistics, 22(4):367–381.

Engle, R. F., Shephard, N., and Sheppard, K. (2008). Fitting vast dimensional time-varying covariance models. Technical Report FIN-08-009.

Fama, E. F. (1998). Market efficiency, long-term returns, and behavioral finance. Journal of Financial Economics, 49(3):283–306.

Fama, E. F. and French, K. R. (1992). The cross-section of expected stock returns. The Journal of Finance, 47(2):427–465.

Fama, E. F. and French, K. R. (1993). Common risk factors in the returns on stocks and bonds. Journal of Financial Economics, 33(1):3–56.

Fama, E. F. and French, K. R. (2015). A five-factor asset pricing model. Journal of Financial Economics, 116(1):1–22.

Fan, J., Furger, A., and Xiu, D. (2016). Incorporating global industrial classification standard into portfolio allocation: A simple factor-based large covariance matrix estimator with high-frequency data. Journal of Business & Economic Statistics, 34(4):489–503.

Fan, J. and Gu, J. (2003). Semiparametric estimation of value at risk. The Econometrics Journal, 6(2):261–290.

Fan, J. and Kim, D. (2018). Robust high-dimensional volatility matrix estimation for high-frequency factor model. Journal of the American Statistical Association, 113(523):1268–1283.

Fan, J., Liao, Y., and Mincheva, M. (2013). Large covariance estimation by thresholding principal orthogonal complements. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 75(4):603–680.

Fan, J., Liu, H., and Wang, W. (2018). Large covariance estimation through elliptical factor models. The Annals of Statistics, 46(4):1383.

Fan, J., Wang, W., and Zhong, Y. (2017). An eigenvector perturbation bound and its application to robust covariance estimation. The Journal of Machine Learning Research, 18(1):7608–7649.

Fan, J., Wang, W., and Zhong, Y. (2019). Robust covariance estimation for approximate factor models. Journal of Econometrics, 208(1):5–22.

Fantazzini, D. (2008). Dynamic copula modelling for value at risk. Frontiers in Finance and Economics, 5(2):72–108.

Giot, P. and Laurent, S. (2004). Modelling daily value-at-risk using realized volatility and ARCH type models. Journal of Empirical Finance, 11(3):379–398.

Glasserman, P., Heidelberger, P., and Shahabuddin, P. (2002). Portfolio value-at-risk with heavy-tailed risk factors. Mathematical Finance, 12(3):239–269.

Hendricks, D. (1996). Evaluation of value-at-risk models using historical data. Economic Policy Review, 2(1).

Herskovic, B., Kelly, B., Lustig, H., and Van Nieuwerburgh, S. (2016). The common factor in idiosyncratic volatility: Quantitative asset pricing implications. Journal of Financial Economics, 119(2):249–283.

Huisman, R., Koedijk, K. G., and Pownall, R. A. (1998). VaR-x: Fat tails in financial risk management. Journal of Risk, 1(1):47–61.

Hull, J. and White, A. (1998). Incorporating volatility updating into the historical simulation method for value-at-risk. Journal of Risk, 1(1):5–19.

Jorion, P. (1996). Risk²: Measuring the risk in value at risk. Financial Analysts Journal, 52(6):47–56.

Jorion, P. (2000). Value at Risk: The New Benchmark for Managing Financial Risk. McGraw-Hill Professional Publishing.

Kalnina, I. and Tewou, K. (2015). Cross-sectional dependence in idiosyncratic volatility.

Kawakatsu, H. (2006). Matrix exponential GARCH. Journal of Econometrics, 134(1):95–128.

Kim, D. and Fan, J. (2019). Factor GARCH-Itô models for high-frequency data with application to large volatility matrix prediction. Journal of Econometrics, 208(2):395–417.

Kim, D., Liu, Y., and Wang, Y. (2018). Large volatility matrix estimation with factor-based diffusion model for high-frequency financial data. Bernoulli, 24(4B):3657–3682.

Kuester, K., Mittnik, S., and Paolella, M. S. (2006). Value-at-risk prediction: A comparison of alternative strategies. Journal of Financial Econometrics, 4(1):53–89.

Kupiec, P. (1995). Techniques for verifying the accuracy of risk measurement models. The Journal of Derivatives, 3(2).

Lee, S.-W. and Hansen, B. E. (1994). Asymptotic theory for the GARCH(1,1) quasi-maximum likelihood estimator. Econometric Theory, 10(1):29–52.

Li, Q., Cheng, G., Fan, J., and Wang, Y. (2018). Embracing the blessing of dimensionality in factor models. Journal of the American Statistical Association, 113(521):380–389.

Longin, F. and Solnik, B. (2001). Extreme correlation of international equity markets. The Journal of Finance, 56(2):649–676.

Malkiel, B. G. (2003). The efficient market hypothesis and its critics. Journal of Economic Perspectives, 17(1):59–82.

Malkiel, B. G. and Xu, Y. (1997). Risk and return revisited. Journal of Portfolio Management, 23(3):9.

Mandelbrot, B. (1963). The variation of certain speculative prices. The Journal of Business, 36(4):394–419.

Marčenko, V. A. and Pastur, L. A. (1967). Distribution of eigenvalues for some sets of random matrices. Mathematics of the USSR-Sbornik, 1(4):457.

McNeil, A. J. and Frey, R. (2000). Estimation of tail-related risk measures for heteroscedastic financial time series: an extreme value approach. Journal of Empirical Finance, 7(3-4):271–300.

Narayan, P. K. and Smyth, R. (2004). Is South Korea's stock market efficient? Applied Economics Letters, 11(11):707–710.

Natarajan, K., Pachamanova, D., and Sim, M. (2008). Incorporating asymmetric distributional information in robust value-at-risk optimization. Management Science, 54(3):573–585.

Newey, W. K. (1991). Uniform convergence in probability and stochastic equicontinuity. Econometrica: Journal of the Econometric Society, pages 1161–1167.

Pakel, C., Shephard, N., Sheppard, K., and Engle, R. F. (2017). Fitting vast dimensional time-varying covariance models.

Patton, A. J., Ziegel, J. F., and Chen, R. (2019). Dynamic semiparametric models for expected shortfall (and value-at-risk). Journal of Econometrics.

Pedersen, R. S. and Rahbek, A. (2014). Multivariate variance targeting in the BEKK–GARCH model. The Econometrics Journal, 17(1):24–55.

Rangel, J. G. and Engle, R. F. (2012). The factor–spline–GARCH model for high and low frequency correlations. Journal of Business & Economic Statistics, 30(1):109–124.

Sadorsky, P. (2005). Stochastic volatility forecasting and risk management. Applied Financial Economics, 15(2):121–135.

Shiller, R. J. (1995). Aggregate income risks and hedging mechanisms. The Quarterly Review of Economics and Finance, 35(2):119–152.

Stock, J. H. and Watson, M. W. (2002). Forecasting using principal components from a large number of predictors. Journal of the American Statistical Association, 97(460):1167–1179.

Van der Weide, R. (2002). GO-GARCH: a multivariate generalized orthogonal GARCH model. Journal of Applied Econometrics, 17(5):549–564.

Wu, P.-T. and Shieh, S.-J. (2007). Value-at-risk analysis for long-term interest rate futures: Fat-tail and long memory in return innovations. Journal of Empirical Finance, 14(2):248–259.

Zoia, M. G., Biffi, P., and Nicolussi, F. (2018). Value at risk and expected shortfall based on Gram-Charlier-like expansions. Journal of Banking & Finance, 93:92–104.

Proofs

We first introduce some notation. For a symmetric matrix $A$, $\lambda_k(A)$ is the $k$th largest eigenvalue of $A$, $\lambda_{\max}(A)$ is the largest eigenvalue, and $\lambda_{\min}(A)$ is the smallest eigenvalue. $\|\cdot\|_\infty$ is the matrix $\ell_\infty$ norm. Denote by $A \circ B$ the Hadamard (element-wise) product and by $x^2 = x \circ x$ the element-wise square of a vector $x$. $\mathrm{tr}(A)$ and $\mathrm{vec}(A)$ are the trace and vectorization of $A$, respectively. Denote by $\mathbf{1}_E$ the indicator function of the event $E$ and by $e_i$ the $i$th standard basis vector. To simplify the notation, we write $\partial$ for partial derivatives: $\partial_x y(x) = \partial y(x)/\partial x$; $\partial_x \mathbf{y}(x) = \partial \mathbf{y}(x)/\partial x = [\partial y_1/\partial x, \ldots, \partial y_{\dim(\mathbf{y})}/\partial x]^\top$; $\partial_{\mathbf{x}} y(\mathbf{x}) = [\partial y/\partial x_1, \ldots, \partial y/\partial x_{\dim(\mathbf{x})}]$; $\partial_{\mathbf{x}} \mathbf{y}(\mathbf{x}) = [\partial \mathbf{y}/\partial x_1, \ldots, \partial \mathbf{y}/\partial x_{\dim(\mathbf{x})}]$, where the functions $y(\cdot)$ and $\mathbf{y}(\cdot)$ are differentiable. $C$ is a generic constant that may differ from equation to equation.

A.1 Proof of Theorem 3.1

Lemma A.1. Under the assumptions of Theorem 3.1, we have
\[
E\big\|\widehat{\Sigma} - \Sigma\big\|_F^4 = O\Big(\frac{p^4}{T^2}\Big).
\]

Proof of Lemma A.1.

We have
\[
\begin{aligned}
\widehat{\Sigma} - \Sigma
&= \frac{1}{T}\sum_{t=1}^{T}\big( (y_t - \bar y)(y_t - \bar y)^\top - \Sigma_t \big) \\
&= \frac{1}{T}\sum_{t=1}^{T}\Big( V\big(f_t f_t^\top - \mathrm{diag}(h_{0,t})\big)V^\top + V f_t u_t^\top + u_t f_t^\top V^\top + u_t u_t^\top - \Sigma_u \Big) - (\bar y - \mu)(\bar y - \mu)^\top,
\end{aligned}
\]
where $\bar y = (1/T)\sum_{t=1}^{T} y_t$ and $h_{0,t} = h_t(\theta_0)$. Each element of the sum is a martingale difference, so the Burkholder-Davis-Gundy inequality yields the statement.

Proof of Theorem 3.1. First consider (3.2). By the Davis-Kahan sine theorem, we have
\[
E\big\|\widehat V - VO\big\|^4
= p^2\, E\Big\|\tfrac{1}{\sqrt p}\widehat V - \tfrac{1}{\sqrt p}VO\Big\|^4
\le p^2\, E\bigg[\frac{\|\widehat\Sigma - \Sigma + \Sigma_u\|^4}{p^4\delta_r^4}\bigg]
\le C\Big(\frac{p^2}{T^2} + \frac{s_p^4}{p^2}\Big), \tag{A.1}
\]
where the last inequality is due to Lemma A.1 and the fact that $\|\Sigma_u\| \le \|\Sigma_u\|_\infty \le C s_p$.

Consider (3.3). Without loss of generality, we assume that $O$ is the identity matrix. Algebraic manipulations show
\[
\begin{aligned}
E\big\|\hat f_t^2 - f_t^2\big\|^2
&= E\Big\| \Big(\tfrac{1}{p}\widehat V^\top (y_t - \bar y)\Big)^2 - \Big(\tfrac{1}{p} V^\top (y_t - \mu - u_t)\Big)^2 \Big\|^2 \\
&= \frac{1}{p^4} E\Big\| (\widehat V^\top y_t)^2 - 2(\widehat V^\top y_t \circ \widehat V^\top \bar y) + (\widehat V^\top \bar y)^2 - (V^\top y_t)^2 + 2p f_t \circ V^\top(\mu + u_t) + \big(V^\top(\mu + u_t)\big)^2 \Big\|^2 \\
&= \frac{1}{p^4} E\Big\| (\widehat V - V)^\top y_t \circ (\widehat V + V)^\top y_t + 2p f_t \circ V^\top u_t + (V^\top u_t)^2 \\
&\qquad\quad - 2(\widehat V^\top y_t \circ \widehat V^\top \bar y) + (\widehat V^\top \bar y)^2 + 2p f_t \circ V^\top \mu + (V^\top \mu)^2 + 2(V^\top u_t \circ V^\top \mu) \Big\|^2 \\
&\le \frac{C}{p^4}\Big( E\big\|(\widehat V - V)^\top y_t \circ (\widehat V + V)^\top y_t\big\|^2 + p^2 E\big\|f_t \circ V^\top u_t\big\|^2 + E\big\|(V^\top u_t)^2\big\|^2 \\
&\qquad\quad + E\big\| -2(\widehat V^\top y_t \circ \widehat V^\top \bar y) + (\widehat V^\top \bar y)^2 + 2(V^\top y_t \circ V^\top \mu) - (V^\top \mu)^2 \big\|^2 \Big) \\
&= \frac{C}{p^4}\big( \mathrm{(I)} + \mathrm{(II)} + \mathrm{(III)} + \mathrm{(IV)} \big).
\end{aligned}
\]
For (I), we have
\[
\begin{aligned}
\mathrm{(I)} &= E\big\|(\widehat V - V)^\top y_t \circ (\widehat V + V)^\top y_t\big\|^2
\le E\Big[\big\|(\widehat V - V)^\top y_t\big\|^2 \big\|(\widehat V + V)^\top y_t\big\|^2\Big] \\
&\le E\Big[\|\widehat V - V\|^2 \|\widehat V + V\|^2 \|y_t\|^4\Big]
\le 4p\, E\Big[\|\widehat V - V\|^2 \|y_t\|^4\Big] \\
&\le 4p \sqrt{E\|\widehat V - V\|^4}\, \sqrt{E\|y_t\|^8}
\le C\Big(\frac{p^4}{T} + p^2 s_p^2\Big),
\end{aligned}
\]
where the fourth inequality is due to Hölder's inequality and the last inequality is due to (A.1). For (II), we have
\[
\mathrm{(II)} \le p^2\, E\|f_t\|^2\, E\big\|V^\top u_t\big\|^2 \le C p^3 s_p,
\]
where the last inequality is by the fact that $E\|V^\top u_t\|^2 = \mathrm{tr}(V^\top \Sigma_u V) = O(p s_p)$. For (III), we have
\[
\mathrm{(III)} = E\bigg[\sum_{i=1}^{r} (v_i^\top u_t)^4\bigg]
= E\bigg[\sum_{i=1}^{r} \frac{(v_i^\top u_t)^4}{(v_i^\top \Sigma_u v_i)^2}\,(v_i^\top \Sigma_u v_i)^2\bigg]
\le C p^2 \|\Sigma_u\|^2\, E\bigg[\sum_{i=1}^{r} \frac{(v_i^\top u_t)^4}{(v_i^\top \Sigma_u v_i)^2}\bigg]
\le C s_p^2 p^2,
\]
where $v_i$ is the $i$th column of $V$.
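As a numerical aside, the PCA quantities bounded in this proof can be simulated. The sketch below is a minimal illustration, not the authors' code: it assumes the normalization $V^\top V = p I_r$ implied by $\hat f_t = \frac{1}{p}\widehat V^\top (y_t - \bar y)$, and it uses i.i.d. Gaussian factors and noise with $\Sigma_u = I_p$ instead of the GARCH factor dynamics of the paper. The function name `pca_factors` and all simulation settings are hypothetical.

```python
import numpy as np

def pca_factors(y, r):
    """PCA estimates (V_hat, f_hat) from a T x p data matrix y.

    V_hat is sqrt(p) times the top-r eigenvectors of the sample covariance,
    so V_hat' V_hat = p I_r, and f_hat_t = V_hat' (y_t - ybar) / p.
    """
    T, p = y.shape
    yc = y - y.mean(axis=0)                    # y_t - ybar
    sigma_hat = yc.T @ yc / T                  # sample covariance (p x p)
    _, eigvec = np.linalg.eigh(sigma_hat)      # eigenvalues in ascending order
    v_hat = np.sqrt(p) * eigvec[:, ::-1][:, :r]
    f_hat = yc @ v_hat / p                     # row t holds f_hat_t'
    return v_hat, f_hat

# Simulated data; every setting below is an illustrative assumption.
rng = np.random.default_rng(0)
T, p, r = 500, 200, 2
q, _ = np.linalg.qr(rng.standard_normal((p, r)))
V = np.sqrt(p) * q                             # loadings with V'V = p I_r
f = rng.standard_normal((T, r)) * np.array([2.0, 1.0])  # i.i.d. factors, sd 2 and 1
u = rng.standard_normal((T, p))                # idiosyncratic noise, Sigma_u = I_p
mu = rng.standard_normal(p)
y = mu + f @ V.T + u

v_hat, f_hat = pca_factors(y, r)
# Squaring removes the sign indeterminacy of each estimated factor.
mse_sq = np.mean((f_hat**2 - f**2) ** 2)
corr = [np.corrcoef(f_hat[:, i] ** 2, f[:, i] ** 2)[0, 1] for i in range(r)]
print(mse_sq, corr)
```

Comparing squared factors mirrors the target $\hat f_t^2 - f_t^2$ of (3.3). The distinct factor scales keep the two leading eigenvalues separated, so the rotation $O$ is close to a signed identity in this design.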
Similarly, we can show
\[
\mathrm{(IV)} \le 2E\big\| -2(\widehat V^\top y_t \circ \widehat V^\top \bar y) + 2(V^\top \mu \circ V^\top y_t) \big\|^2 + 2E\big\| (\widehat V^\top \bar y)^2 - (V^\top \mu)^2 \big\|^2 \le C\Big(\frac{p^4}{T} + p^2 s_p^2\Big).
\]
Therefore, $E\|\hat f_t^2 - f_t^2\|^2 = O\big(1/T + s_p/p + s_p^2/p^2\big)$.

A.2 Proof of Theorem 3.2

Define
\[
\begin{aligned}
L_{0,T}(\theta) &= -\frac{1}{2T}\sum_{t=1}^{T}\sum_{i=1}^{r}\big(\log h_{it}(\theta) + h_{it}^{-1}(\theta)\, h_{0,it}\big) = \frac{1}{T}\sum_{t=1}^{T} l_{0,t}(\theta), \\
L_{T}(\theta) &= -\frac{1}{2T}\sum_{t=1}^{T}\sum_{i=1}^{r}\big(\log h_{it}(\theta) + h_{it}^{-1}(\theta)\, f_{it}^2\big) = \frac{1}{T}\sum_{t=1}^{T} l_{t}(\theta), \\
\widehat L_{T}(\theta) &= -\frac{1}{2T}\sum_{t=1}^{T}\sum_{i=1}^{r}\big(\log \hat h_{it}(\theta) + \hat h_{it}^{-1}(\theta)\, \hat f_{it}^2\big) = \frac{1}{T}\sum_{t=1}^{T} \hat l_{t}(\theta),
\end{aligned}
\]
where $h_{0,t} = h_t(\theta_0)$ and $\hat h_t(\theta) = \omega + A \hat f_{t-1}^2 + B \hat h_{t-1}(\theta)$.

Lemma A.2.

Under the assumptions of Theorem 3.2, let $h_t$ be a function of $\theta$ and $h_1$,
\[
h_t(\theta, h_1) = B^{t-1} h_1 + (I_r - B)^{-1}\big(I_r - B^{t-1}\big)\omega + \sum_{k=0}^{t-2} B^k A f_{t-1-k}^2 .
\]
Then for some true initial value $h_{0,1}$,
\[
\|h_t(\theta, h_1) - h_t(\theta, h_{0,1})\| = O_p\big(\|B\|^{t-1}\big), \qquad
|L_T(\theta, h_1) - L_T(\theta, h_{0,1})| = O_p\Big(\frac{1}{T}\Big).
\]

Proof of Lemma A.2.

By sub-multiplicativity, we have
\[
\|h_t(\theta, h_1) - h_t(\theta, h_{0,1})\| = \big\|B^{t-1} h_1 - B^{t-1} h_{0,1}\big\| \le \|B\|^{t-1}\|h_1 - h_{0,1}\| = C\|B\|^{t-1}.
\]
Thus,
\[
|L_T(\theta, h_1) - L_T(\theta, h_{0,1})| = O_p\Big(\frac{1}{T}\Big).
\]

Lemma A.2 shows that the effect of the initial value is negligible. Thus, for mathematical convenience, we assume that $h_1(\theta) = (I_r - B)^{-1}\omega + \sum_{k=0}^{\infty} B^k A f_{-k}^2$. Then we have $h_t(\theta) = (I_r - B)^{-1}\omega + \sum_{k=0}^{\infty} B^k A f_{t-1-k}^2$. To obtain the consistency of $\hat\theta$, it is sufficient to show that $\widehat L_T(\theta) \xrightarrow{p} L_{0,T}(\theta)$ uniformly in $\theta$ (Lemma A.3) and that $L_{0,T}(\theta)$ attains a unique global maximum at $\theta_0$ (Amemiya, 1985).

Lemma A.3 (Uniform Convergence). Under the assumptions of Theorem 3.2, we have $\widehat L_T(\theta) \xrightarrow{p} L_{0,T}(\theta)$ uniformly in $\theta$.

Proof of Lemma A.3.

We have
\[
|\widehat L_T(\theta) - L_{0,T}(\theta)| \le |\widehat L_T(\theta) - L_T(\theta)| + |L_T(\theta) - L_{0,T}(\theta)|.
\]
For simplicity, we omit the parameter $\theta$. Consider $|\widehat L_T(\theta) - L_T(\theta)|$. We have
\[
\big|\widehat L_T(\theta) - L_T(\theta)\big|
\le \frac{1}{2T}\sum_{t=1}^{T}\bigg( \Big|\sum_{i=1}^{r} \log \hat h_{it} - \log h_{it}\Big| + \big|\hat f_t^2 - f_t^2\big|^\top \hat h_t^{-1} + f_t^{2\top}\big|\hat h_t^{-1} - h_t^{-1}\big| \bigg)
= \frac{1}{2T}\sum_{t=1}^{T}\big( \mathrm{(I)} + \mathrm{(II)} + \mathrm{(III)} \big),
\]
where $\hat h_t^{-1}$ is the element-wise inverse of $\hat h_t$. Recall $h_t(\theta) = (I_r - B)^{-1}\omega + \sum_{k=0}^{\infty} B^k A f_{t-1-k}^2$. Then by the fact that $\log(1 + x) \le x$ for all $x > -1$, we have
\[
\sup_\theta \mathrm{(I)} = \sup_\theta \bigg|\sum_{i=1}^{r} \log\Big(\frac{\hat h_{it}}{h_{it}}\Big)\bigg|
\le \sup_\theta \sum_{i=1}^{r} \bigg|\frac{\Delta \hat h_{it}}{h_{it}}\bigg|
\le \frac{1}{\min_{i \le r} \omega_{\min,i}}\, \mathbf{1}^\top \sup_\theta \bigg|\sum_{k=0}^{\infty} B^k A \big(\Delta \hat f_{t-1-k}^2\big)\bigg| = o_p(1),
\]
where $\Delta \hat h_t = \hat h_t - h_t$, $\Delta \hat f_t^2 = \hat f_t^2 - f_t^2$, and the last equality is due to Theorem 3.1. Similarly, we can show that (II) and (III) converge uniformly to zero. Therefore, we have
\[
\sup_\theta |\widehat L_T(\theta) - L_T(\theta)| = o_p(1).
\]
Now we consider $L_T(\theta) - L_{0,T}(\theta)$. By Theorem 2.1 in Newey (1991), uniform convergence is equivalent to pointwise convergence together with stochastic equicontinuity, where stochastic equicontinuity is satisfied by the Lipschitz condition $\big|\partial_\theta\big(L_T(\theta) - L_{0,T}(\theta)\big)\big| \le O_p(1)$. Thus, it is enough to show

1. $L_T(\theta) \xrightarrow{p} L_{0,T}(\theta)$ pointwise in $\theta$;
2. $E\big[\sup_\theta \big|\partial_\theta\big(L_T(\theta) - L_{0,T}(\theta)\big)\big|\big] < \infty$ for all $\theta \in \Theta$.

Since $f_t^2 - h_{0,t}$ is a martingale difference, we can show
\[
E\big[(L_T(\theta) - L_{0,T}(\theta))^2\big] = o(1).
\]
Now, we show the Lipschitz condition. We have
\[
\begin{aligned}
E\Big[\sup_\theta \big|\partial_\theta\big(L_T(\theta) - L_{0,T}(\theta)\big)\big|\Big]
&= E\bigg[\sup_\theta \bigg|\frac{1}{2T}\sum_{t=1}^{T}\sum_{i=1}^{r} h_{it}^{-2}\big(f_{it}^2 - h_{0,it}\big)\,\partial_\theta h_{it}\bigg|\bigg] \\
&\le \frac{C}{T}\sum_{t=1}^{T}\sum_{i=1}^{r} E\Big[\sup_\theta (\partial_\theta h_{it})\big(f_{it}^2 + h_{0,it}\big)\Big]
= \frac{C}{T}\sum_{t=1}^{T}\sum_{i=1}^{r} E\Big[2\sup_\theta (\partial_\theta h_{it})\, h_{0,it}\Big] \\
&\le \frac{C}{T}\sum_{t=1}^{T}\sum_{i=1}^{r} \Big(E\big[h_{0,it}^2\big]\Big)^{1/2}\Big(E\Big[\sup_\theta (\partial_\theta h_{it})^2\Big]\Big)^{1/2} < \infty,
\end{aligned}
\]
where the first inequality is due to $\partial_\theta h_{it} > 0$ for all $\theta \in \Theta$, and the second inequality is due to Hölder's inequality.

Lemma A.4 (Uniqueness of $\theta_0$). Under the assumptions of Theorem 3.2, $\theta^* = \arg\max_\theta L_{0,T}(\theta)$ is unique almost surely, and $\theta^* = \theta_0$. Moreover,
\[
\hat\theta \xrightarrow{p} \theta_0. \tag{A.2}
\]

Proof of Lemma A.4.

Since $\log x + t/x$ has a unique minimizer at $x = t$, $h_{it}(\theta_0)$ is the unique maximizer of $l_{0,t}(h_{it})$ for all $t$. If $\theta_0$ is not the unique parameter that yields $h_{it}(\theta_0)$, then there exists $\theta^* \ne \theta_0$ such that $h_t(\theta^*) = h_t(\theta_0)$. Then $\{h_t(\theta^*) - h_t(\theta_0)\}_{t \le T} = \{0\}_{t \le T}$, and
\[
\big( h_1(\theta^*) - h_1(\theta_0), \ldots, h_T(\theta^*) - h_T(\theta_0) \big)
= \big( \omega^* - \omega_0 \;\; A^* - A_0 \;\; B^* - B_0 \big)
\begin{pmatrix}
1 & 1 & \cdots & 1 \\
f_0^2 & f_1^2 & \cdots & f_{T-1}^2 \\
h_0(\theta_0) & h_1(\theta_0) & \cdots & h_{T-1}(\theta_0)
\end{pmatrix}
= \big( \omega^* - \omega_0 \;\; A^* - A_0 \;\; B^* - B_0 \big) M = 0,
\]
where $0$ is a zero matrix. By Assumption 2(b), $MM^\top$ is invertible a.s., which implies
\[
\big( \omega^* - \omega_0 \;\; A^* - A_0 \;\; B^* - B_0 \big) = 0.
\]
Therefore, $\theta^* = \theta_0$ a.s., which is a contradiction. With the uniqueness result, by Lemma A.3 and Theorem 4.1.2 in Amemiya (1985), we can show (A.2).

Lemma A.5. Under the assumptions of Theorem 3.2, for all fixed $c \ge 1$, $l \le r$, and each scalar parameter $\theta$ (an element of the parameter vector),
\[
E\bigg[\bigg|\sup_\theta \frac{\partial_\theta \hat h_{lt}}{\hat h_{lt}}\bigg|^c\bigg] < \infty, \qquad
E\bigg[\bigg|\sup_\theta \frac{\partial_\theta^2 \hat h_{lt}}{\hat h_{lt}}\bigg|^c\bigg] < \infty, \qquad
E\bigg[\bigg|\sup_\theta \frac{\partial_\theta^3 \hat h_{lt}}{\hat h_{lt}}\bigg|^c\bigg] < \infty.
\]

Proof of Lemma A.5.

Notice that
\[
\partial_\theta \hat h_t = \partial_\theta \varpi + \sum_{k=0}^{\infty} B^k (\partial_\theta A)\, \hat f_{t-1-k}^2 + \sum_{k=1}^{\infty}\bigg(\sum_{\xi=0}^{k-1} B^{\xi} (\partial_\theta B) B^{k-1-\xi}\bigg) A \hat f_{t-1-k}^2 \ge 0,
\]
where $\varpi = (I_r - B)^{-1}\omega$. For the case of $\theta = \varpi_i$, $E\big[\big|\sup_\theta \partial_\theta \hat h_{lt}/\hat h_{lt}\big|^c\big] < \infty$ is trivial. For the case of $\theta = A_{ij}$,
\[
\frac{\partial_\theta \hat h_{lt}}{\hat h_{lt}}
\le \frac{e_l^\top \sum_{k=0}^{\infty} B^k (\partial_\theta A)\, \hat f_{t-1-k}^2}{e_l^\top \sum_{k=0}^{\infty} B^k A \hat f_{t-1-k}^2}
\le \frac{e_l^\top \sum_{k=0}^{\infty} B^k e_i e_j^\top \hat f_{t-1-k}^2}{e_l^\top \sum_{k=0}^{\infty} B^k e_i A_{ij} e_j^\top \hat f_{t-1-k}^2}
= \frac{1}{A_{ij}} \le \frac{1}{A_{\min,ij}},
\]
where $e_i$ is the $i$th standard basis vector. Thus, $E\big[\big|\sup_\theta \partial_\theta \hat h_{lt}/\hat h_{lt}\big|^c\big] < \infty$. For the case of $\theta = B_{ij}$, write $D_k = \sum_{\xi=0}^{k-1} B^{\xi} e_i e_j^\top B^{k-1-\xi}$. Then
\[
\begin{aligned}
\frac{\partial_\theta \hat h_{lt}}{\hat h_{lt}}
&= \frac{e_l^\top \sum_{k=1}^{\infty}\big(\sum_{\xi=0}^{k-1} B^{\xi} (\partial_\theta B) B^{k-1-\xi}\big) A \hat f_{t-1-k}^2}{\varpi_l + e_l^\top \sum_{k=0}^{\infty} B^k A \hat f_{t-1-k}^2}
\le \sum_{k=1}^{\infty} \frac{e_l^\top D_k A \hat f_{t-1-k}^2}{\varpi_l + B_{ij}\, e_l^\top D_k A \hat f_{t-1-k}^2} \\
&= \frac{1}{B_{ij}} \sum_{k=1}^{\infty} \frac{(B_{ij}/\varpi_l)\, e_l^\top D_k A \hat f_{t-1-k}^2}{1 + (B_{ij}/\varpi_l)\, e_l^\top D_k A \hat f_{t-1-k}^2}
\le \frac{1}{B_{ij}} \frac{B_{ij}^{\alpha}}{\varpi_l^{\alpha}} \sum_{k=1}^{\infty} \Big[ e_l^\top D_k A \hat f_{t-1-k}^2 \Big]^{\alpha}
\le \frac{1}{B_{ij}\varpi_l^{\alpha}} \sum_{k=0}^{\infty} \Big[ e_l^\top B^k A \hat f_{t-1-k}^2 \Big]^{\alpha},
\end{aligned}
\]
where the second inequality is due to $x/(1 + x) \le x^{\alpha}$ for all $\alpha \in [0, 1]$ and $x \ge 0$. Taking $\alpha = 1/c$,
\[
\bigg( E\bigg[\sup_\theta \bigg|\frac{\partial_\theta \hat h_{lt}}{\hat h_{lt}}\bigg|^c\bigg] \bigg)^{1/c}
\le C \sum_{k=0}^{\infty} \|B_{\max}\|^{k/c} \Big( E\big[\|\hat f_{t-1-k}^2\|\big] \Big)^{1/c} < \infty,
\]
where the first inequality is due to Minkowski's inequality, and the last inequality follows from $E\big[|f_{it}|^2\big] \le C$ and $E\|\hat f_t^2 - f_t^2\| = o(1)$. Similarly, we can show the bounds for the higher-order derivatives.

Lemma A.6.

Under the assumptions of Theorem 3.2, let $E\|\Delta \hat f_t^2\| = O(\beta_T)$. Then we have
\[
\big|\partial_\theta \widehat L_T(\theta_0) - \partial_\theta L_{0,T}(\theta_0)\big| = O_p\Big(\beta_T + \frac{1}{\sqrt{T}}\Big).
\]

Proof of Lemma A.6.

Denote $\hat h_{0,t} = \hat h_t(\theta_0)$ and $\partial_\theta \hat h_{0,t} = \partial_\theta \hat h_t(\theta_0)$. Since $E\|\Delta \hat f_t^2\| = O(\beta_T)$, we have

1. $E\big\|\hat h_{0,t} - h_{0,t}\big\| = O(\beta_T)$;
2. $E\big\|\hat h_{0,t}^{-1} - h_{0,t}^{-1}\big\| = O(\beta_T)$;
3. $E\big\|\partial_\theta \hat h_{0,t} - \partial_\theta h_{0,t}\big\| = O(\beta_T)$;
4. $E\big[f_t^{2\top}\big(\hat h_{0,t}^{-2} - h_{0,t}^{-2}\big) \circ \partial_\theta \hat h_{0,t}\big] = O(\beta_T)$,

where the last equation is by the fact that, with $\epsilon_{lt}^2 = f_{lt}^2 / h_{0,lt}$,
\[
\begin{aligned}
E\Big[f_t^{2\top}\big(\hat h_{0,t}^{-2} - h_{0,t}^{-2}\big) \circ \partial_\theta \hat h_{0,t}\Big]
&= E\bigg[\sum_{l=1}^{r} \bigg(\frac{h_{0,lt}^2 - \hat h_{0,lt}^2}{h_{0,lt}^2\, \hat h_{0,lt}^2}\bigg) f_{lt}^2\, \partial_\theta \hat h_{0,lt}\bigg]
= E\bigg[\sum_{l=1}^{r} \bigg(\frac{h_{0,lt}^2 - \hat h_{0,lt}^2}{h_{0,lt}\, \hat h_{0,lt}}\bigg) \epsilon_{lt}^2\, \frac{\partial_\theta \hat h_{0,lt}}{\hat h_{0,lt}}\bigg] \\
&= E\bigg[\sum_{l=1}^{r} \bigg(\frac{h_{0,lt} - \hat h_{0,lt}}{\hat h_{0,lt}}\bigg) \epsilon_{lt}^2\, \frac{\partial_\theta \hat h_{0,lt}}{\hat h_{0,lt}} + \bigg(\frac{h_{0,lt} - \hat h_{0,lt}}{h_{0,lt}}\bigg) \epsilon_{lt}^2\, \frac{\partial_\theta \hat h_{0,lt}}{\hat h_{0,lt}}\bigg] \\
&\le C\, E\bigg[\sum_{l=1}^{r} \big|h_{0,lt} - \hat h_{0,lt}\big|\, \epsilon_{lt}^2\, \frac{\partial_\theta \hat h_{0,lt}}{\hat h_{0,lt}}\bigg]
\le C \sum_{l=1}^{r} \Big(E\Big[\big(h_{0,lt} - \hat h_{0,lt}\big)^2\Big]\Big)^{1/2} \bigg(E\bigg[\bigg(\frac{\partial_\theta \hat h_{0,lt}}{\hat h_{0,lt}}\bigg)^2\bigg]\bigg)^{1/2} = O(\beta_T),
\end{aligned}
\]
where the last equality is due to Lemma A.5. Thus, we have
\[
\begin{aligned}
\partial_\theta \widehat L_T(\theta_0) - \partial_\theta L_T(\theta_0)
&= -\frac{1}{2T}\sum_{t=1}^{T} \Big\{ (\hat h_{0,t} - \hat f_t^2)^\top (\hat h_{0,t}^{-2} \circ \partial_\theta \hat h_{0,t}) - (h_{0,t} - f_t^2)^\top (h_{0,t}^{-2} \circ \partial_\theta h_{0,t}) \Big\} \\
&= \frac{1}{2T}\sum_{t=1}^{T} \Big\{ (\hat f_t^2 - f_t^2)^\top (\hat h_{0,t}^{-2} \circ \partial_\theta \hat h_{0,t}) + f_t^{2\top} (\hat h_{0,t}^{-2} - h_{0,t}^{-2}) \circ \partial_\theta \hat h_{0,t} \\
&\qquad\qquad + f_t^{2\top} h_{0,t}^{-2} \circ (\partial_\theta \hat h_{0,t} - \partial_\theta h_{0,t}) - (\hat h_{0,t}^{-1} - h_{0,t}^{-1})^\top \partial_\theta \hat h_{0,t} - h_{0,t}^{-1\top} (\partial_\theta \hat h_{0,t} - \partial_\theta h_{0,t}) \Big\}
\end{aligned}
\]
and
\[
E\big|\partial_\theta \widehat L_T(\theta_0) - \partial_\theta L_T(\theta_0)\big| \le \frac{C}{T}\sum_{t=1}^{T} \{\beta_T + 4\beta_T + \beta_T + \beta_T + \beta_T\} = O(\beta_T). \tag{A.3}
\]
Moreover, we have
\[
\partial_\theta L_T(\theta_0) - \partial_\theta L_{0,T}(\theta_0) = \frac{1}{2T}\sum_{t=1}^{T} (f_t^2 - h_{0,t})^\top (h_{0,t}^{-2} \circ \partial_\theta h_{0,t}) = \frac{1}{2T}\sum_{t=1}^{T} (\varepsilon_t^2 - \mathbf{1})^\top (h_{0,t}^{-1} \circ \partial_\theta h_{0,t})
\]
and
\[
E\big[(\partial_\theta L_T(\theta_0) - \partial_\theta L_{0,T}(\theta_0))^2\big]
= \frac{1}{4T^2}\sum_{t=1}^{T} \mathrm{tr}\Big( \big(E[\varepsilon_t^2 \varepsilon_t^{2\top}] - \mathbf{1}\mathbf{1}^\top\big)\, E\big[(h_{0,t}^{-1} \circ \partial_\theta h_{0,t})(h_{0,t}^{-1} \circ \partial_\theta h_{0,t})^\top\big] \Big) = O\Big(\frac{1}{T}\Big), \tag{A.4}
\]
where the last equality is due to Lemma A.5. By combining (A.3) and (A.4), we complete the proof.

Lemma A.7.

Under the assumptions of Theorem 3.2, we have
\[
\partial_\theta^2 \widehat L_T(\theta^*) \xrightarrow{p} \partial_\theta^2 L_{0,T}(\theta_0),
\]
and $-\partial_\theta^2 L_{0,T}(\theta_0)$ is a positive definite matrix.

Proof of Lemma A.7.

We have
\[
\partial_\theta^2 \widehat L_T(\theta^*) - \partial_\theta^2 L_{0,T}(\theta_0)
= \big(\partial_\theta^2 \widehat L_T(\theta^*) - \partial_\theta^2 \widehat L_T(\theta_0)\big) + \big(\partial_\theta^2 \widehat L_T(\theta_0) - \partial_\theta^2 L_{0,T}(\theta_0)\big)
= \mathrm{(I)} + \mathrm{(II)}.
\]
By the mean value theorem and (A.2), we have $\mathrm{(I)} = \partial_\theta^3 \widehat L_T(\theta^{**})(\theta^* - \theta_0)$ and $|\theta^* - \theta_0| \le |\hat\theta - \theta_0| \xrightarrow{p} 0$, respectively. Thus, it is enough to show $\partial_\theta^3 \widehat L_T(\theta^{**}) = O_p(1)$. Denote $\partial_{ijk} = \partial_{\theta_i}\partial_{\theta_j}\partial_{\theta_k}$. Then for all $i, j, k \le \dim(\theta)$,
\[
\begin{aligned}
\partial_{ijk} \widehat L_T(\theta)
= -\frac{1}{2T}\sum_{t=1}^{T} \Big\{ & \big(\hat h_t - \hat f_t^2\big)^\top \hat h_t^{\circ -2} \circ \partial_{ijk} \hat h_t
+ \big(2\hat f_t^2 - \hat h_t\big)^\top \hat h_t^{-3} \circ \big(\partial_i \hat h_t \circ \partial_{jk} \hat h_t + \partial_j \hat h_t \circ \partial_{ki} \hat h_t + \partial_k \hat h_t \circ \partial_{ij} \hat h_t\big) \\
& + \big(2\hat h_t - 6\hat f_t^2\big)^\top \hat h_t^{-4} \circ \partial_i \hat h_t \circ \partial_j \hat h_t \circ \partial_k \hat h_t \Big\}.
\end{aligned}
\]
By $E\big[\hat f_t^2 \hat f_t^{2\top}\big] < \infty$ and Lemma A.5, we can show $E\big|\partial_{ijk} \widehat L_T(\theta^{**})\big| < \infty$. Thus, $\mathrm{(I)} \xrightarrow{p} 0$. Similar to the proof of Lemma A.6, we can show
\[
\begin{aligned}
\mathrm{(II)} &= \partial_{ij} \widehat L_T(\theta_0) - \partial_{ij} L_{0,T}(\theta_0) \\
&= \partial_{ij} \widehat L_T(\theta_0) - \partial_{ij} L_T(\theta_0) + \partial_{ij} L_T(\theta_0) - \partial_{ij} L_{0,T}(\theta_0) \\
&= O_p(\beta_T) + O_p\Big(\frac{1}{\sqrt{T}}\Big).
\end{aligned}
\]
Moreover, since $h_t^{-2} > 0$ and the $f_t$'s are non-degenerate, we can show
\[
-\partial_\theta^2 L_{0,T}(\theta_0) = \frac{1}{2T}\sum_{t=1}^{T}\sum_{i=1}^{r} h_{0,it}^{-2}\big(\partial_\theta h_{0,it}\, \partial_\theta h_{0,it}^\top\big) \succ 0.
\]

Proof of Theorem 3.2. By the mean value theorem, there is $\theta^*$ between $\hat\theta$ and $\theta_0$ such that
\[
\hat\theta - \theta_0 = \partial_\theta^2 \widehat L_T(\theta^*)^{-1}\big(\partial_\theta \widehat L_T(\hat\theta) - \partial_\theta \widehat L_T(\theta_0)\big)
= \partial_\theta^2 \widehat L_T(\theta^*)^{-1}\big(\partial_\theta L_{0,T}(\theta_0) - \partial_\theta \widehat L_T(\theta_0)\big),
\]
where the last equality is by the fact that $\partial_\theta \widehat L_T(\hat\theta) = \partial_\theta L_{0,T}(\theta_0) = 0$. Then, by Lemmas A.6 and A.7, we have
\[
\hat\theta - \theta_0 = O_p\Big(\beta_T + \frac{1}{\sqrt{T}}\Big),
\]
and thus the result of Theorem 3.1 completes the proof.

A.3 Proof of Theorem 4.1

Lemma A.8. Under the assumptions of Theorem 4.1, we have
\[
\big\|\widehat V \widehat\Sigma_{f,t+1} \widehat V^\top - V \Sigma_{f,t+1} V^\top\big\|_{\Sigma_{t+1}} = O_p\Big(\frac{1}{\sqrt{T}} + \frac{\sqrt{p}}{T} + \frac{s_p^2}{p\sqrt{p}} + \frac{s_p}{p}\Big). \tag{A.5}
\]

Proof of Lemma A.8.

Without loss of generality, we assume that $O$ is the identity matrix. Similar to the proof of Theorem 4.2 in Kim et al. (2018), we have
\[
\begin{aligned}
\big\|\widehat V \widehat\Sigma_{f,t+1} \widehat V^\top - V \Sigma_{f,t+1} V^\top\big\|_{\Sigma_{t+1}}
&\le \Big\|V \Sigma_{f,t+1}^{1/2}\big(\Sigma_{f,t+1}^{-1/2} \widehat\Sigma_{f,t+1} \Sigma_{f,t+1}^{-1/2} - I_r\big)\Sigma_{f,t+1}^{1/2} V^\top\Big\|_{\Sigma_{t+1}} \\
&\quad + \big\|(\widehat V - V) \widehat\Sigma_{f,t+1} (\widehat V - V)^\top\big\|_{\Sigma_{t+1}} + 2\big\|V \widehat\Sigma_{f,t+1} (\widehat V - V)^\top\big\|_{\Sigma_{t+1}} \\
&= \mathrm{(I)} + \mathrm{(II)} + \mathrm{(III)}.
\end{aligned}
\]
First consider (I). By the matrix inversion lemma, we have
\[
(V \Sigma_{f,t+1} V^\top + \Sigma_u)^{-1} = \Sigma_u^{-1} - \Sigma_u^{-1} V \Sigma_{f,t+1}^{1/2}\big(I_r + \Sigma_{f,t+1}^{1/2} V^\top \Sigma_u^{-1} V \Sigma_{f,t+1}^{1/2}\big)^{-1} \Sigma_{f,t+1}^{1/2} V^\top \Sigma_u^{-1},
\]
and, denoting $X = \Sigma_{f,t+1}^{1/2} V^\top \Sigma_u^{-1} V \Sigma_{f,t+1}^{1/2}$,
\[
\big\|\Sigma_{f,t+1}^{1/2} V^\top \Sigma_{t+1}^{-1} V \Sigma_{f,t+1}^{1/2}\big\| = \big\|X - X(I_r + X)^{-1}X\big\| = \big\|X(I_r + X)^{-1}\big\| = \big\|I_r - (I_r + X)^{-1}\big\|.
\]
Then, since $\|AB\|_F \le \|A\|\,\|B\|_F$ and $\big\|\Sigma_{f,t+1}^{1/2} V^\top \Sigma_{t+1}^{-1/2}\big\| \le \sqrt{2}$, we have
\[
\begin{aligned}
\mathrm{(I)} &= \frac{1}{\sqrt{p}}\Big\|\Sigma_{t+1}^{-1/2} V \Sigma_{f,t+1}^{1/2}\big(\Sigma_{f,t+1}^{-1/2} \widehat\Sigma_{f,t+1} \Sigma_{f,t+1}^{-1/2} - I_r\big)\Sigma_{f,t+1}^{1/2} V^\top \Sigma_{t+1}^{-1/2}\Big\|_F
\le \frac{2}{\sqrt{p}}\Big\|\Sigma_{f,t+1}^{-1/2} \widehat\Sigma_{f,t+1} \Sigma_{f,t+1}^{-1/2} - I_r\Big\|_F \\
&= \frac{2}{\sqrt{p}}\bigg(\sum_{i=1}^{r} \frac{\big|\hat h_{it+1}(\hat\theta) - h_{0,it+1}\big|^2}{h_{0,it+1}^2}\bigg)^{1/2}
= O_p\Big(\frac{1}{\sqrt{pT}} + \frac{\sqrt{s_p}}{p}\Big),
\end{aligned}
\]
where the last equality is by the fact that
\[
\hat h_{it+1}(\hat\theta) - h_{0,it+1} = \hat h_{it+1}(\hat\theta) - \hat h_{it+1}(\theta_0) + \hat h_{it+1}(\theta_0) - h_{0,it+1} = O_p\Big(\frac{1}{\sqrt{T}} + \sqrt{\frac{s_p}{p}}\Big).
\]
For (II) and (III), similarly, by Theorem 3.1, we can show
\[
\mathrm{(II)} = \frac{1}{\sqrt{p}}\Big\|\Sigma_{t+1}^{-1/2} (\widehat V - V) \widehat\Sigma_{f,t+1} (\widehat V - V)^\top \Sigma_{t+1}^{-1/2}\Big\|_F
\le \frac{1}{\sqrt{p}}\big\|\Sigma_{t+1}^{-1}\big\|\, \big\|\widehat\Sigma_{f,t+1}\big\|_F\, \big\|\widehat V - V\big\|^2
= O_p\Big(\frac{\sqrt{p}}{T} + \frac{s_p^2}{p\sqrt{p}}\Big)
\]
and
\[
\begin{aligned}
\mathrm{(III)} &= \frac{2}{\sqrt{p}}\Big\|\Sigma_{t+1}^{-1/2} V \widehat\Sigma_{f,t+1} (\widehat V - V)^\top \Sigma_{t+1}^{-1/2}\Big\|_F
\le \frac{2\sqrt{2}}{\sqrt{p}}\Big\|\Sigma_{f,t+1}^{-1/2} \widehat\Sigma_{f,t+1} (\widehat V - V)^\top \Sigma_{t+1}^{-1/2}\Big\|_F \\
&\le \frac{C}{\sqrt{p}}\Big\|\Sigma_{f,t+1}^{-1/2} \widehat\Sigma_{f,t+1}\Big\|_F \big\|\widehat V - V\big\|
= O_p\Big(\frac{1}{\sqrt{T}} + \frac{s_p}{p}\Big).
\end{aligned}
\]

Lemma A.9.

Under the assumptions of Theorem 4.1, suppose that $\|\widehat\Sigma_u - \Sigma_u\|_{\max} = O_p(\gamma_T)$, and the thresholding level satisfies the condition $\tau_T = C\gamma_T$ such that $|\widehat\Sigma_{u,ij} - \Sigma_{u,ij}| < \tau_T(\widehat\Sigma_{u,ii} \widehat\Sigma_{u,jj})^{1/2}/2$. Then we have
\[
\big\|\mathcal{T}(\widehat\Sigma_u) - \Sigma_u\big\| = O_p\big(s_p \gamma_T^{1-q}\big).
\]

Proof of Lemma A.9.

Let $\tau_{ij} = \tau_T(\widehat\Sigma_{u,ii} \widehat\Sigma_{u,jj})^{1/2}$. Similar to the proofs of Section 3 in Fan and Kim (2018), under the event $|\widehat\Sigma_{u,ij} - \Sigma_{u,ij}| < \tau_{ij}/2$, we have
\[
\begin{aligned}
\big\|\mathcal{T}(\widehat\Sigma_u) - \Sigma_u\big\|
&\le \big\|\mathcal{T}(\widehat\Sigma_u) - \Sigma_u\big\|_\infty \\
&= \max_i \sum_{j=1}^{p} \Big| s_{ij}(\widehat\Sigma_{u,ij}) \mathbf{1}\{|\widehat\Sigma_{u,ij}| \ge \tau_{ij}\} - \Sigma_{u,ij} \mathbf{1}\{|\Sigma_{u,ij}| \ge \tau_{ij}\} - \Sigma_{u,ij} \mathbf{1}\{|\Sigma_{u,ij}| < \tau_{ij}\} \Big| \\
&\le \max_i \sum_{j=1}^{p} \Big\{ \big|s_{ij}(\widehat\Sigma_{u,ij}) - \Sigma_{u,ij}\big| \mathbf{1}\{|\widehat\Sigma_{u,ij}| \ge \tau_{ij}\} + |\Sigma_{u,ij}| \big|\mathbf{1}\{|\widehat\Sigma_{u,ij}| \ge \tau_{ij}\} - \mathbf{1}\{|\Sigma_{u,ij}| \ge \tau_{ij}\}\big| + |\Sigma_{u,ij}| \mathbf{1}\{|\Sigma_{u,ij}| < \tau_{ij}\} \Big\} \\
&\le \max_i \sum_{j=1}^{p} \Big\{ \tfrac{3}{2}\tau_{ij} \mathbf{1}\{|\Sigma_{u,ij}| \ge \tau_{ij}/2\} + |\Sigma_{u,ij}| \mathbf{1}\{|\Sigma_{u,ij}| \le \tfrac{3}{2}\tau_{ij}\} + |\Sigma_{u,ij}|^q \tau_{ij}^{1-q} \Big\} \\
&\le C \max_i \sum_{j=1}^{p} |\Sigma_{u,ij}|^q \tau_{ij}^{1-q} = O\big(s_p \gamma_T^{1-q}\big),
\end{aligned}
\]
where the last equality is due to the sparsity condition.

Proof of Theorem 4.1. By Assumption 3(a), we have
\[
\big\|\widehat\Sigma - \Sigma\big\|_{\max} = O_p\bigg(\sqrt{\frac{\log p}{T}}\bigg). \tag{A.6}
\]
We denote by $\hat v_i$ and $v_i$ the $i$th column vectors of $\widehat V$ and $V$, respectively. Then by Theorem 2.1 in Fan et al. (2017),
\[
\begin{aligned}
\sum_{i=1}^{r} \big\|\hat v_i \hat v_i^\top - v_i v_i^\top\big\|_{\max}
&\le \sum_{i=1}^{r} \Big\{ \big\|(\hat v_i - v_i)\hat v_i^\top\big\|_{\max} + \big\|v_i(\hat v_i - v_i)^\top\big\|_{\max} \Big\} \\
&\le \sum_{i=1}^{r} \Big\{ \|\hat v_i - v_i\|_\infty \|\hat v_i - v_i + v_i\|_\infty + \|\hat v_i - v_i\|_\infty \|v_i\|_\infty \Big\} \\
&\le \sum_{i=1}^{r} \Big\{ \|\hat v_i - v_i\|_\infty^2 + 2\|\hat v_i - v_i\|_\infty \|v_i\|_\infty \Big\} \\
&\le C\bigg( \frac{\|\widehat\Sigma - \Sigma + \Sigma_u\|_\infty^2}{p^2\delta_r^2} + \frac{\|\widehat\Sigma - \Sigma + \Sigma_u\|_\infty}{p\,\delta_r} \bigg)
= O_p\bigg(\sqrt{\frac{\log p}{T}} + \frac{s_p}{p}\bigg),
\end{aligned}
\]
where the last equality is due to $\|A\|_\infty \le p\|A\|_{\max}$. Thus,
\[
\begin{aligned}
\big\|\widehat V \widehat\Sigma_{f,t+1} \widehat V^\top - V \Sigma_{f,t+1} V^\top\big\|_{\max}
&\le \sum_{i=1}^{r} \Big\{ \big\|\hat h_{it+1}\big(\hat v_i \hat v_i^\top - v_i v_i^\top\big)\big\|_{\max} + \big\|(\hat h_{it+1} - h_{0,it+1}) v_i v_i^\top\big\|_{\max} \Big\} \\
&\le \big\|\hat h_{t+1} - h_{0,t+1} + h_{0,t+1}\big\|_\infty \sum_{i=1}^{r} \big\|\hat v_i \hat v_i^\top - v_i v_i^\top\big\|_{\max} + \big\|\hat h_{t+1} - h_{0,t+1}\big\|_\infty \sum_{i=1}^{r} \big\|v_i v_i^\top\big\|_{\max} \\
&\le O_p\bigg(\sqrt{\frac{\log p}{T}} + \frac{s_p}{p}\bigg) + O_p\bigg(\frac{1}{\sqrt{T}} + \sqrt{\frac{s_p}{p}}\bigg).
\end{aligned} \tag{A.7}
\]
By combining (A.6) and (A.7), we have
\[
\big\|\widehat\Sigma_u - \Sigma_u\big\|_{\max} \le \big\|\widehat\Sigma - \Sigma\big\|_{\max} + \big\|\widehat V \widehat\Sigma_{f,t+1} \widehat V^\top - V \Sigma_{f,t+1} V^\top\big\|_{\max} = O_p\bigg(\sqrt{\frac{\log p}{T}} + \sqrt{\frac{s_p}{p}}\bigg), \tag{A.8}
\]
which leads to $\gamma_T = C\big(\sqrt{\log p/T} + \sqrt{s_p/p}\big)$ and $\|\widehat\Sigma_{t+1} - \Sigma_{t+1}\|_{\max} = O_p(\gamma_T)$. Combining Lemmas A.8 and A.9, and (A.8), we have
\[
\big\|\widehat\Sigma_{t+1} - \Sigma_{t+1}\big\|_{\Sigma_{t+1}} \le \big\|\widehat V \widehat\Sigma_{f,t+1} \widehat V^\top - V \Sigma_{f,t+1} V^\top\big\|_{\Sigma_{t+1}} + \big\|\mathcal{T}(\widehat\Sigma_u) - \Sigma_u\big\|_{\Sigma_{t+1}} = O_p\Big(\frac{\sqrt{p}}{T} + s_p \gamma_T^{1-q}\Big),
\]
where the last equality is by the fact that
\[
\big\|\mathcal{T}(\widehat\Sigma_u) - \Sigma_u\big\|_{\Sigma_{t+1}} \le \frac{1}{\sqrt{p}}\big\|\mathcal{T}(\widehat\Sigma_u) - \Sigma_u\big\|\, \big\|\Sigma_{t+1}^{-1}\big\|_F \le \big\|\mathcal{T}(\widehat\Sigma_u) - \Sigma_u\big\|\, \big\|\Sigma_{t+1}^{-1}\big\| = O_p\big(s_p \gamma_T^{1-q}\big).
\]

A.4 Proof of Theorem 4.2

Proof of Theorem 4.2. Consider the parametric VaR estimator case. By the definition of VaR in (2.2), we have
\[
\big|\widehat{\mathrm{VaR}}_{\alpha,t+1} - \mathrm{VaR}_{\alpha,t+1}\big| \le C\Big| c_\alpha \sqrt{w^\top \widehat\Sigma_{t+1} w} - c_\alpha \sqrt{w^\top \Sigma_{t+1} w} \Big| + C\big|w^\top(\bar y - \mu)\big|.
\]
Since $\|\bar y - \mu\|_\infty^2 = O_p(\log p/T)$ and $\|\bar y - \mu\|^2 = O_p(p/T)$, we have
\[
\big(w^\top(\bar y - \mu)\big)^2 = O_p\Big(\min\Big\{\frac{\log p}{T}, \frac{\|w\|^2 p}{T}\Big\}\Big). \tag{A.9}
\]
Consider $\big| c_\alpha \sqrt{w^\top \widehat\Sigma_{t+1} w} - c_\alpha \sqrt{w^\top \Sigma_{t+1} w} \big|$. We have
\[
\Big| c_\alpha \sqrt{w^\top \widehat\Sigma_{t+1} w} - c_\alpha \sqrt{w^\top \Sigma_{t+1} w} \Big|
\le C\big| w^\top\big(\widehat\Sigma_{t+1} - \Sigma_{t+1}\big) w \big|
\le C \sum_{i,j=1}^{p} |w_i w_j|\, \big\|\widehat\Sigma_{t+1} - \Sigma_{t+1}\big\|_{\max}
= O_p\bigg(\sqrt{\frac{\log p}{T}} + \sqrt{\frac{s_p}{p}}\bigg), \tag{A.10}
\]
where the last equality is due to Theorem 4.1 and $\|w\|_1 \le C$. Combining (A.9) and (A.10), we have
\[
\big|\widehat{\mathrm{VaR}}_{\alpha,t+1} - \mathrm{VaR}_{\alpha,t+1}\big| = O_p\bigg(\sqrt{\frac{\log p}{T}} + \sqrt{\frac{s_p}{p}}\bigg).
\]
Consider the non-parametric $\sigma$-based VaR estimator case. Define
\[
\widehat F_T(x) = \frac{1}{T}\sum_{t=1}^{T} \mathbf{1}\{\hat x_t \le x\}, \qquad F_T(x) = \frac{1}{T}\sum_{t=1}^{T} \mathbf{1}\{x_t \le x\},
\]
where $x$ is an $\alpha$-quantile value, and let $F(x)$ be the cumulative distribution function of $x_t$. Then the expected value of the absolute difference between $\widehat F_T(x)$ and $F_T(x)$ satisfies
\[
E\big|\widehat F_T(x) - F_T(x)\big| \le \frac{1}{T}\sum_{t=1}^{T} \big( P\{\hat x_t \le x,\, x_t > x\} + P\{\hat x_t > x,\, x_t \le x\} \big).
\]
Let $\vartheta_T = \sqrt{\log p/T} + \sqrt{s_p/p}$. Then, since Assumption 4(a) implies $|\hat x_t - x_t| = O_p(\vartheta_T \sqrt{\log T})$, we have, for large enough $C$,
\[
\begin{aligned}
P\{\hat x_t \le x,\, x_t > x\}
&\le P\{x_t \le x + |\hat x_t - x_t|,\, x_t > x\} = P\{x < x_t \le x + |\hat x_t - x_t|\} \\
&\le P\{x < x_t \le x + C\vartheta_T \sqrt{\log T}\} + P\{|\hat x_t - x_t| > C\vartheta_T \sqrt{\log T}\} \\
&\le F(x + C\vartheta_T \sqrt{\log T}) - F(x) + \frac{1}{\sqrt{T}} = O\Big(\vartheta_T \sqrt{\log T} + \frac{1}{\sqrt{T}}\Big),
\end{aligned}
\]
where the last equality is due to Assumption 4(b). Similarly, the bound for $P\{\hat x_t > x,\, x_t \le x\}$ can be found. Then $\big|\widehat F_T(x) - F_T(x)\big| = O_p\big(\vartheta_T \sqrt{\log T} + 1/\sqrt{T}\big)$. By the Dvoretzky-Kiefer-Wolfowitz inequality, we have $\big|F_T(x) - F(x)\big| = O_p\big(1/\sqrt{T}\big)$. Thus,
\[
\big|\widehat F_T(x) - F(x)\big| \le \big|\widehat F_T(x) - F_T(x)\big| + \big|F_T(x) - F(x)\big| = O_p\big(\vartheta_T \sqrt{\log T} + 1/\sqrt{T}\big).
\]
Define $F^{-1}(y) = \inf\{x : F(x) > y\}$. Then the $\lceil \alpha T \rceil$-th smallest value of $\{\hat x_t\}_{t=1}^{T}$ is bounded as follows:
\[
\widehat F_T^{-1}(\alpha) = \inf\{x : \widehat F_T(x) > \alpha\} \le \inf\big\{x : F(x) > \alpha + \big|\widehat F_T(x) - F(x)\big|\big\} \le F^{-1}\big(\alpha + \big|\widehat F_T(x) - F(x)\big|\big)
\]
and
\[
\widehat F_T^{-1}(\alpha) \ge \inf\big\{x : F(x) > \alpha - \big|\widehat F_T(x) - F(x)\big|\big\} \ge F^{-1}\big(\alpha - \big|\widehat F_T(x) - F(x)\big|\big).
\]
Thus,
\[
\big|\widehat F_T^{-1}(\alpha) - F^{-1}(\alpha)\big| \le C\big|\widehat F_T(x) - F(x)\big|. \tag{A.11}
\]
We also have
\[
\begin{aligned}
w^\top \widehat\Sigma_{t+1} w &= w^\top \Sigma_{t+1} w + w^\top(\widehat\Sigma_{t+1} - \Sigma_{t+1}) w \\
&= w^\top V \Sigma_{f,t+1} V^\top w + w^\top \Sigma_u w + o_p(1) \\
&\le \max_{i \le r} h_{it+1} \big\|V^\top w\big\|^2 + C + o_p(1) = O_p(1),
\end{aligned} \tag{A.12}
\]
where the inequality is due to the gross exposure condition and $|\Sigma_{u,ij}| \le C$, and the last equality is due to $\|V^\top w\| = \|\sum_{i=1}^{p} w_i \tilde v_i\| \le \sum_{i=1}^{p} |w_i| \|\tilde v_i\| \le C$, where $\tilde v_i$ is the $i$th row vector of $V$. Combining the results from (A.9) to (A.12), we have
\[
\begin{aligned}
\big|\widehat{\mathrm{VaR}}_{\alpha,t+1} - \mathrm{VaR}_{\alpha,t+1}\big|
&\le C\sqrt{w^\top \widehat\Sigma_{t+1} w}\, \big|\widehat F_T^{-1}(\alpha) - F^{-1}(\alpha)\big| + C\big|w^\top(\bar y - \mu)\big| + C F^{-1}(\alpha) \Big|\sqrt{w^\top \widehat\Sigma_{t+1} w} - \sqrt{w^\top \Sigma_{t+1} w}\Big| \\
&= O_p\big(\vartheta_T \sqrt{\log T}\big).
\end{aligned}
\]
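The two estimators compared in this proof can be sketched numerically. The example below is illustrative only. Definition (2.2) is not reproduced in this appendix, so the sign convention $\mathrm{VaR} = -(w^\top \mu + c_\alpha \sqrt{w^\top \Sigma w})$, the standard normal choice of $c_\alpha$ in the parametric case, and the helper names `empirical_quantile` and `var_forecast` are all assumptions. The quantile helper implements $F^{-1}(y) = \inf\{x : F(x) > y\}$ directly for an empirical distribution.

```python
import math
from statistics import NormalDist

import numpy as np

def empirical_quantile(x, alpha):
    """Return inf{z : F_T(z) > alpha} for the empirical CDF F_T of the sample x."""
    xs = np.sort(np.asarray(x, dtype=float))
    # Smallest k with k / T > alpha, capped at the sample size.
    k = min(int(math.floor(alpha * len(xs))) + 1, len(xs))
    return float(xs[k - 1])

def var_forecast(w, mu, sigma, c_alpha):
    """VaR under the ASSUMED convention VaR = -(w'mu + c_alpha * sqrt(w' Sigma w))."""
    return -(float(w @ mu) + c_alpha * math.sqrt(float(w @ sigma @ w)))

alpha = 0.05
w = np.array([0.5, 0.5])      # portfolio weights (illustrative)
mu = np.zeros(2)              # mean vector (illustrative)
sigma = np.eye(2)             # covariance forecast (illustrative)

# Parametric case: c_alpha taken from a standard normal (an assumption).
c_normal = NormalDist().inv_cdf(alpha)
var_param = var_forecast(w, mu, sigma, c_normal)

# Non-parametric case: c_alpha replaced by the empirical alpha-quantile
# of simulated standardized returns standing in for x_hat_t.
rng = np.random.default_rng(1)
x_hat = rng.standard_normal(2000)
var_nonparam = var_forecast(w, mu, sigma, empirical_quantile(x_hat, alpha))
print(var_param, var_nonparam)
```

The helper selects the $(\lfloor \alpha T \rfloor + 1)$-th order statistic, which is what the infimum definition gives for a step function. The two forecasts differ only through $c_\alpha$, matching the decomposition in the final bound, where the quantile error and the volatility error enter as separate terms.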