A Robust Statistics Approach to Minimum Variance Portfolio Optimization
Liusha Yang∗, Romain Couillet†, Matthew R. McKay∗

Abstract—We study the design of portfolios under a minimum risk criterion. The performance of the optimized portfolio relies on the accuracy of the estimated covariance matrix of the portfolio asset returns. For large portfolios, the number of available market returns is often of similar order to the number of assets, so that the sample covariance matrix performs poorly as a covariance estimator. Additionally, financial market data often contain outliers which, if not correctly handled, may further corrupt the covariance estimation. We address these shortcomings by studying the performance of a hybrid covariance matrix estimator based on Tyler's robust M-estimator and on Ledoit-Wolf's shrinkage estimator, while assuming samples with heavy-tailed distribution. Employing recent results from random matrix theory, we develop a consistent estimator of (a scaled version of) the realized portfolio risk, which is minimized by optimizing online the shrinkage intensity. Our portfolio optimization method is shown via simulations to outperform existing methods, both for synthetic and real market data.
I. INTRODUCTION
The theory of portfolio optimization is generally associated with the classical mean-variance optimization framework of Markowitz [1]. The pitfalls of the mean-variance analysis are mainly related to its sensitivity to the estimation error of the means and covariance matrix of the asset returns. It is nonetheless argued that estimates of the covariance matrix are more accurate than those of the expected returns [2, 3]. Thus, many studies concentrate on improving the performance of the global minimum variance portfolio (GMVP), which provides the lowest possible portfolio risk and involves only the covariance matrix estimate.

The frequently used covariance estimator is the well-known sample covariance matrix (SCM). However, covariance estimates for portfolio optimization commonly involve few historical observations of sometimes up to a thousand assets. In such a case, the number of independent samples $n$ may be small compared to the covariance matrix dimension $N$, which suggests a poor performance of the SCM. The impact of the estimation error on the out-of-sample performance of the GMVP based on the SCM has already been analyzed in [4-7].

∗ L. Yang and M. R. McKay are with the Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong (email: [email protected]; [email protected]). † R. Couillet is with the Laboratoire de Signaux et Systèmes (L2S, UMR8506), CNRS-CentraleSupélec-Université Paris-Sud, 3 rue Joliot-Curie, 91192 Gif-sur-Yvette, France (email: [email protected]). Yang and McKay's work is supported by the Hong Kong Research Grants Council under grant number 16206914. Couillet's work is supported by the ERC MORE EC-120133.
In the finance literature, several approaches have been proposed to get around the problem of the scarcity of samples. One approach is to impose some factor structure on the estimator of the covariance matrix [8, 9], which reduces the number of parameters to be estimated. A second approach is to use as a covariance matrix estimator a weighted average of the sample covariance matrix and another estimator, such as the 1-factor covariance matrix or the identity matrix [10, 11]. A third approach is a nonlinear shrinkage estimation approach [12], which modifies each eigenvalue of the SCM under the framework of Markowitz's portfolio selection. A fourth approach comprises eigenvalue clipping methods [13-15], whose underlying idea is to 'clean' the SCM by filtering noisy eigenvalues claimed to convey little valuable information. This approach has also been employed recently in proposing novel vaccine design strategies for infectious diseases [16, 17], and its theoretical foundations have been examined in [18]. A fifth method employs a bootstrap-corrected estimator for the optimal return and its asset allocation, which reduces the error of over-prediction of the in-sample return by bootstrapping [6]. In contrast to all of these methods (which aim to improve the covariance matrix estimate), alternative methods have also been proposed which directly impose various constraints on the portfolio weights, such as a no-shortsale constraint [3], an $L_1$-norm constraint or an $L_2$-norm constraint [19, 20]. By bounding the portfolio-weight vector directly, it is demonstrated that the estimation error can be reduced, particularly when the portfolio size is large [19].

In addition to the problem of sample deficiency, it is often the case that the return observations exhibit impulsiveness and local loss of stationarity [21], which is not addressed by the methods mentioned above and leads to performance degradation. The field of robust estimation [22-25] intends to deal with this problem.
However, classical robust covariance estimators generally require $n \gg N$ and do not perform well (or are not even defined) when $n \simeq N$, making them unsuitable for many modern applications. Recent works [26-32] based on random matrix theory have therefore considered robust estimation in the $n \simeq N$ regime. Two hybrid robust shrinkage covariance matrix estimates have been proposed in parallel in [29, 30] and in [31], respectively, both of which are built upon Tyler's robust M-estimator [23] and Ledoit-Wolf's shrinkage approach [11]. In [32], the authors show, by means of random matrix theory, that in the large $n, N$ regime and under the assumption of elliptical vector observations, the estimators in [29, 30] and [31] perform essentially the same and can be analyzed thanks to their asymptotic closeness to well-known random matrix models. Therefore, in this paper, we concentrate on the estimator studied in [29, 30], which we denote by $\hat{C}_{ST}$ (ST standing for shrinkage Tyler). Namely, for independent samples $x_1, \dots, x_n \in \mathbb{R}^N$ with zero mean, $\hat{C}_{ST}$ is the unique solution to the fixed-point equation

$$\hat{C}_{ST}(\rho) = (1-\rho)\,\frac{1}{n}\sum_{t=1}^{n} \frac{x_t x_t^T}{\frac{1}{N} x_t^T \hat{C}_{ST}^{-1}(\rho)\, x_t} + \rho I_N$$

for any $\rho \in (\max\{0, 1 - n/N\}, 1]$. It should be noted that the shrinkage structure even allows $n < N$.

This paper designs a novel minimum variance portfolio optimization strategy based on $\hat{C}_{ST}$ with a risk-minimizing (instead of Frobenius norm minimizing [32]) shrinkage parameter $\rho$. We first characterize the out-of-sample risk of the minimum variance portfolio with plug-in $\hat{C}_{ST}$ for all $\rho$ within a specified range. This is done by analyzing the uniform convergence of the achieved realized risk over $\rho$ in the double limit regime, where $N, n \to \infty$ with $c_N = N/n \to c \in (0, \infty)$. We subsequently provide a consistent estimator of the realized portfolio risk (or, more precisely, a scaled version of it) that is defined only in terms of the observed returns.
Based on this, we obtain a risk-optimized ST covariance estimator by optimizing online over $\rho$, and thus our optimized portfolio. The proposed portfolio selection is shown to achieve superior performance over the competing methods in [11, 31-33] in minimizing the realized portfolio risk under the GMVP framework for impulsive data. The outperformance of our portfolio optimization strategy compared to other methods is demonstrated through Monte Carlo simulations with elliptically distributed samples, as well as with real data of historical (daily) stock returns from Hong Kong's Hang Seng Index (HSI).

Notations:
Boldface upper case letters denote matrices, boldface lower case letters denote column vectors, and standard lower case letters denote scalars. $(\cdot)^T$ denotes transpose. $I_N$ denotes the $N \times N$ identity matrix and $\mathbf{1}_N$ denotes the $N$-dimensional vector with all entries equal to one. $\mathrm{tr}[\cdot]$ denotes the matrix trace operator. $\mathbb{R}$ and $\mathbb{C}$ denote the real and complex fields of dimension specified by a superscript. $\|\cdot\|$ denotes the Euclidean norm for vectors and the spectral norm for matrices. The Dirac measure at point $x$ is denoted by $\delta_x$. The ordered eigenvalues of a symmetric matrix $X$ of size $N \times N$ are denoted by $\lambda_1(X) \le \dots \le \lambda_N(X)$, and the cardinality of a set $\mathcal{C} \subset \mathbb{R}$ is denoted by $|\mathcal{C}|$. Letting $U, V$ be symmetric $N \times N$ matrices, we write $U \succeq V$ if $U - V$ is positive semidefinite.

II. DATA MODEL AND PROBLEM FORMULATION
We consider a time series comprising $x_1, \dots, x_n \in \mathbb{R}^N$ logarithmic returns of $N$ financial assets. We assume the $x_t$ to be independent and identically distributed (i.i.d.) with

$$x_t = \mu + \sqrt{\tau_t}\, C_N^{1/2} y_t, \quad t = 1, 2, \dots, n, \qquad (1)$$

where $\mu \in \mathbb{R}^N$ is the mean vector of the asset returns, $\tau_t$ is a real, positive random variable, $C_N \in \mathbb{R}^{N \times N}$ is positive definite, and $y_t \in \mathbb{R}^N$ is a zero mean unitarily invariant random vector with norm $\|y_t\|^2 = N$, independent of the $\tau_i$'s. It is assumed that $\mu$ and $C_N$ are time-invariant over the observation period. Denote $z_t = C_N^{1/2} y_t$. The model (1) for $x_t$ embraces in particular the class of elliptical distributions, including the multivariate normal distribution, the exponential distribution and the multivariate Student-t distribution as special cases. This model for $x_t$ leads to tractable and adoptable design solutions and is a commonly used approximation of the impulsive nature of financial data [10].

Let $h \in \mathbb{R}^N$ denote the portfolio selection, i.e., the vector of asset holdings in units of currency normalized by the total outstanding wealth, satisfying $h^T \mathbf{1}_N = 1$. In this paper, short-selling is allowed, and thus the portfolio weights may be negative. Then the portfolio variance (or risk) over the investment period of interest is defined as $\sigma^2(h) = \mathbb{E}[|h^T x_t|^2] = h^T C_N h$ [1]. Accordingly, the GMVP selection problem can be formulated as the following quadratic optimization problem with a linear constraint:

$$\min_h \sigma^2(h) \quad \text{s.t.} \quad h^T \mathbf{1}_N = 1.$$

This has the well-known solution

$$h_{GMVP} = \frac{C_N^{-1} \mathbf{1}_N}{\mathbf{1}_N^T C_N^{-1} \mathbf{1}_N}$$

and the corresponding portfolio risk is

$$\sigma^2(h_{GMVP}) = \frac{1}{\mathbf{1}_N^T C_N^{-1} \mathbf{1}_N}. \qquad (2)$$

Here, (2) represents the theoretical minimum portfolio risk bound, attained upon knowing the covariance matrix $C_N$ exactly. In practice, $C_N$ is unknown, and instead we form an estimate, denoted by $\hat{C}_N$. Thus, the GMVP selection based on the plug-in estimator $\hat{C}_N$ is given by

$$\hat{h}_{GMVP} = \frac{\hat{C}_N^{-1} \mathbf{1}_N}{\mathbf{1}_N^T \hat{C}_N^{-1} \mathbf{1}_N}.$$
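The closed-form solution above is straightforward to evaluate numerically. The following minimal sketch (our own illustration in plain NumPy; the function names are not from the paper) forms $h_{GMVP}$ from a known covariance matrix and evaluates the risk $h^T C_N h$:

```python
import numpy as np

def gmvp_weights(C):
    """GMVP solution h = C^{-1} 1_N / (1_N^T C^{-1} 1_N)."""
    ones = np.ones(C.shape[0])
    w = np.linalg.solve(C, ones)   # C^{-1} 1_N without forming the inverse
    return w / (ones @ w)

def portfolio_risk(h, C):
    """Portfolio variance sigma^2(h) = h^T C h."""
    return float(h @ C @ h)

# Toy 3-asset covariance matrix (illustrative values)
C = np.array([[0.040, 0.010, 0.000],
              [0.010, 0.090, 0.020],
              [0.000, 0.020, 0.160]])
h = gmvp_weights(C)
```

By construction, no other fully invested portfolio attains a lower variance under the same $C$, while individual weights may be negative (short positions).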
The quality of $\hat{h}_{GMVP}$, implemented based on the in-sample covariance prediction $\hat{C}_N$, can be measured by its achieved out-of-sample (or "realized") portfolio risk:

$$\sigma^2(\hat{h}_{GMVP}) = \frac{\mathbf{1}_N^T \hat{C}_N^{-1} C_N \hat{C}_N^{-1} \mathbf{1}_N}{\left(\mathbf{1}_N^T \hat{C}_N^{-1} \mathbf{1}_N\right)^2}.$$

The goal is to construct a good estimator $\hat{C}_N$, and consequently $\hat{h}_{GMVP}$, which minimizes this quantity.

Note that the naive uniform diversification rule, $h = \frac{1}{N}\mathbf{1}_N$, is equivalent to setting $\hat{C}_N = I_N$, and yields the realized portfolio risk $\frac{\mathbf{1}_N^T C_N \mathbf{1}_N}{N^2}$. Interestingly, this extremely simple strategy has been shown in [34] to outperform numerous optimized models and will serve as a benchmark in our work.

III. NOVEL COVARIANCE ESTIMATOR AND PORTFOLIO DESIGN FOR MINIMIZING RISK
A. Tyler’s robust M-estimator with linear shrinkage
Consider the ST covariance matrix estimate introduced in [29, 30], built upon both Tyler's M-estimate [23] and the Ledoit-Wolf shrinkage estimator [11]. This estimator accounts for the scarcity of samples, even allowing $N > n$, and exhibits robustness to outliers or impulsive samples, e.g., elliptically distributed data. It is defined as the unique solution to the following fixed-point equation for $\rho \in (\max\{0, 1 - n/N\}, 1]$:

$$\hat{C}_{ST}(\rho) = (1-\rho)\,\frac{1}{n}\sum_{t=1}^{n} \frac{\tilde{x}_t \tilde{x}_t^T}{\frac{1}{N}\tilde{x}_t^T \hat{C}_{ST}^{-1}(\rho)\,\tilde{x}_t} + \rho I_N \qquad (3)$$

where $\tilde{x}_t = x_t - \frac{1}{n}\sum_{i=1}^n x_i$. Since with probability one the $x_t$ are linearly independent, $\hat{C}_{ST}(\rho)$ is almost surely defined for each $N$ and $n$ [29, Theorem III.1]. The corresponding GMVP selection is

$$\hat{h}_{ST}(\rho) = \frac{\hat{C}_{ST}^{-1}(\rho)\,\mathbf{1}_N}{\mathbf{1}_N^T \hat{C}_{ST}^{-1}(\rho)\,\mathbf{1}_N}$$

with realized portfolio risk

$$\sigma^2(\hat{h}_{ST}(\rho)) = \frac{\mathbf{1}_N^T \hat{C}_{ST}^{-1}(\rho)\, C_N\, \hat{C}_{ST}^{-1}(\rho)\,\mathbf{1}_N}{\left(\mathbf{1}_N^T \hat{C}_{ST}^{-1}(\rho)\,\mathbf{1}_N\right)^2}. \qquad (4)$$

Our goal is to optimize $\rho$ online such that (4) is minimized. However, since (4) involves $C_N$, which is unobservable, this equation cannot be optimized directly. Also note that the naive approach of simply replacing $C_N$ with $\hat{C}_{ST}(\rho)$ in (4) would yield the so-called "in-sample risk", which underestimates the realized portfolio risk, leading to overly optimistic investment decisions [33]. We tackle this problem by obtaining a consistent estimator for a scaled version of the realized risk (4) as $n$ and $N$ go to infinity at the same rate. Contrary to classical asymptotic theory for time series analysis and mathematical statistics, which typically deals with the case of $N$ fixed and $n \to \infty$, a double-limiting condition is of more relevance for large portfolio problems, where $n$ is comparable to $N$. To this end, following [33], we first derive a deterministic asymptotic equivalent of (4) and then provide a consistent estimator based on this.

B. Deterministic equivalent of the realized portfolio risk
For our asymptotic analysis, we assume the following:
Assumption 1.
a. As $N, n \to \infty$, $N/n = c_N \to c \in (0, \infty)$.
b. The $\tau_t$, $t = 1, \dots, n$, are i.i.d., with $\tau_1, \dots, \tau_n \ge \xi$ a.s. for some $\xi > 0$ and $\mathbb{E}[\tau] < \infty$.¹
c. Denoting $0 < \lambda_1 \le \dots \le \lambda_N$ the ordered eigenvalues of $C_N$, as $N, n \to \infty$, $\nu_N \triangleq \frac{1}{N}\sum_{i=1}^N \delta_{\lambda_i}$ satisfies $\nu_N \to \nu$ weakly with $\nu \ne \delta_0$ almost everywhere. In addition, $\limsup_N \lambda_N < \infty$.

We also introduce some further definitions, which will arise in our asymptotic analysis. For $\rho \in (\max(0, 1 - c^{-1}), 1]$, define $\gamma$ as the unique positive solution to

$$1 = \int \frac{t}{\gamma\rho + (1-\rho)t}\,\nu(dt) \qquad (5)$$

and

$$\beta = \int \frac{c\,\gamma^2 t^2}{\left(\gamma\rho + (1-\rho)t\right)^2}\,\nu(dt).$$

¹ For technical reasons, made explicit in the appendix, we require the quantities $\tilde{z}_t = z_t - \frac{1}{n}\sum_{i=1}^n z_i \sqrt{\tau_i/\tau_t}$ to have controllable norms. This imposes the constraint $\tau_t \ge \xi > 0$, which might be possible to relax at the expense of increased mathematical complexity.

The following theorem presents our first key result: a deterministic characterization of the asymptotic realized portfolio risk achieved with $\hat{C}_{ST}(\rho)$.

Theorem 1.
Let Assumption 1 hold. For $\varepsilon \in (0, \min\{1, c^{-1}\})$, define $\mathcal{R}_\varepsilon = [\varepsilon + \max\{0, 1 - c^{-1}\}, 1]$. Then, as $N, n \to \infty$,

$$\sup_{\rho \in \mathcal{R}_\varepsilon} \left| \sigma^2(\hat{h}_{ST}(\rho)) - \bar{\sigma}^2(\rho) \right| \xrightarrow{a.s.} 0 \qquad (6)$$

where

$$\bar{\sigma}^2(\rho) = \frac{\gamma^2}{\gamma^2 - \beta(1-\rho)^2} \times \frac{\mathbf{1}_N^T \left(\frac{1-\rho}{\gamma} C_N + \rho I_N\right)^{-1} C_N \left(\frac{1-\rho}{\gamma} C_N + \rho I_N\right)^{-1} \mathbf{1}_N}{\left(\mathbf{1}_N^T \left(\frac{1-\rho}{\gamma} C_N + \rho I_N\right)^{-1} \mathbf{1}_N\right)^2}.$$

Proof:
See Appendix B.
Remark 1.
In Theorem 1, the set $\mathcal{R}_\varepsilon$ excludes the region $[0, \varepsilon + \max\{0, 1 - c^{-1}\})$. As we handle the uniformity of the convergence (6), the proof of Theorem 1 requires us to work on sequences $\{\rho_n\}_{n=1}^\infty$ of $\rho$. It is however difficult to handle the limit $|\sigma^2(\hat{h}_{ST}(\rho_n)) - \bar{\sigma}^2(\rho_n)|$ for a sequence $\{\rho_n\}_{n=1}^\infty$ with $\rho_n \to 0$. This follows from the same reasoning as that in [32] (see Equations (5) and (6) in Section 5.1 of [32], as well as Equation (12) in Appendix A, where $\rho_n \to \rho_0 > 0$ is necessary to ensure $m_+ < 1$). In the subsequent results, $\rho \in \mathcal{R}_\varepsilon$ is also required for the same reason.

Theorem 1 enables us to analyze the convergence of the realized portfolio risk in the regime of Assumption 1-a for $\hat{h}_{ST}(\rho)$. In order to calibrate the shrinkage parameter $\rho$ for optimum GMVP performance, only the available sample data, and certainly not the unknown $C_N$, can be used. This is the objective of the subsequent section.

C. Consistent estimation of scaled realized portfolio risk
Based on the observable data only, we can obtain an estimator of a scaled version of the realized portfolio risk, $\sigma^2(\hat{h}_{ST}(\rho))/\kappa$, where we define $\kappa \triangleq \int t\,\nu(dt)$. We begin with the following lemma, which provides a consistent estimator of $\gamma$, scaled by $1/\kappa$, denoted $\hat{\gamma}_{sc}$ ("sc" standing for "scaled").

Lemma 1.
Under the settings of Theorem 1, as
$N, n \to \infty$,

$$\sup_{\rho \in \mathcal{R}_\varepsilon} \left| \hat{\gamma}_{sc} - \gamma/\kappa \right| \xrightarrow{a.s.} 0 \qquad (7)$$

where

$$\hat{\gamma}_{sc} = \frac{1}{1 - (1-\rho)c_N} \cdot \frac{1}{n} \sum_{t=1}^n \frac{\tilde{x}_t^T \hat{C}_{ST}^{-1}(\rho)\,\tilde{x}_t}{\|\tilde{x}_t\|^2}.$$

Proof:
See Appendix C.

The following theorem provides a consistent estimator of $\sigma^2(\hat{h}_{ST}(\rho))$, scaled by $1/\kappa$, denoted $\hat{\sigma}^2(\rho)$; this is our second main result. It reads

$$\hat{\sigma}^2(\rho) = \frac{\hat{\gamma}_{sc}}{(1-\rho)\left(1 - (1-\rho)c_N\right)} \cdot \frac{\mathbf{1}_N^T \hat{C}_{ST}^{-1}(\rho)\left(\hat{C}_{ST}(\rho) - \rho I_N\right)\hat{C}_{ST}^{-1}(\rho)\,\mathbf{1}_N}{\left(\mathbf{1}_N^T \hat{C}_{ST}^{-1}(\rho)\,\mathbf{1}_N\right)^2}. \qquad (8)$$

Theorem 2.
Under the settings of Theorem 1, as
$N, n \to \infty$,

$$\sup_{\rho \in \mathcal{R}_\varepsilon} \left| \hat{\sigma}^2(\rho) - \frac{1}{\kappa}\,\sigma^2(\hat{h}_{ST}(\rho)) \right| \xrightarrow{a.s.} 0$$

where $\hat{\sigma}^2(\rho)$ is defined in (8).

Proof: See Appendix D.

Note that, since $\kappa$ is independent of $\rho$, the same $\rho$ minimizes both $\sigma^2(\hat{h}_{ST}(\rho))$ and $\sigma^2(\hat{h}_{ST}(\rho))/\kappa$. The following corollary of Theorem 2 is of fundamental importance: it demonstrates that choosing $\rho$ to minimize $\hat{\sigma}^2(\rho)$ is asymptotically equivalent to minimizing the unobservable $\sigma^2(\hat{h}_{ST}(\rho))$.

Corollary 1.
Denote $\rho^o$ and $\rho^*$ the minimizers of $\hat{\sigma}^2(\rho)$ and $\sigma^2(\hat{h}_{ST}(\rho))$ over $\mathcal{R}_\varepsilon$, respectively. Then, under the settings of Theorem 1 and Theorem 2, as $N, n \to \infty$,

$$\left| \sigma^2(\hat{h}_{ST}(\rho^o)) - \sigma^2(\hat{h}_{ST}(\rho^*)) \right| \xrightarrow{a.s.} 0.$$

Proof:
See Appendix E.

With this result, the GMVP optimization problem is now reduced to the minimization of $\hat{\sigma}^2(\rho)$, which can be done with a simple numerical search. To summarise, given $n$ past return observations of $N$ assets, our proposed algorithm to construct a portfolio with minimal risk can be described as follows:

Algorithm 1: Proposed algorithm for GMVP optimization
1) Compute the optimized shrinkage parameter via a numerical search:
$$\rho^o = \arg\min_{\rho \in [\varepsilon + \max\{0,\, 1 - c_N^{-1}\},\, 1]} \hat{\sigma}^2(\rho).$$
2) Form the risk-minimizing ST estimator $\hat{C}_{ST}^o$, the unique solution to
$$\hat{C}_{ST}^o = (1-\rho^o)\,\frac{1}{n}\sum_{t=1}^{n} \frac{\tilde{x}_t \tilde{x}_t^T}{\frac{1}{N}\tilde{x}_t^T (\hat{C}_{ST}^o)^{-1}\tilde{x}_t} + \rho^o I_N.$$
3) Construct the optimized portfolio
$$\hat{h}_{ST}^o = \frac{(\hat{C}_{ST}^o)^{-1}\,\mathbf{1}_N}{\mathbf{1}_N^T (\hat{C}_{ST}^o)^{-1}\,\mathbf{1}_N}.$$
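Algorithm 1 reduces to a one-dimensional search. The sketch below is our own illustrative implementation: the fixed-point solver settings, the grid resolution, the choice $\varepsilon = 0.05$, and the decision to stop the grid just below $\rho = 1$ (where (8) becomes a 0/0 limit) are all assumptions, and `scaled_risk` follows our reading of (8) and of $\hat{\gamma}_{sc}$ from Lemma 1.

```python
import numpy as np

def st_estimator(X, rho, max_iter=200, tol=1e-10):
    """Fixed-point iteration for the ST estimator (3) on re-centered data."""
    n, N = X.shape
    Xc = X - X.mean(axis=0)                 # tilde{x}_t = x_t - sample mean
    C = np.eye(N)                           # any positive definite start
    for _ in range(max_iter):
        q = np.einsum('ti,ij,tj->t', Xc, np.linalg.inv(C), Xc) / N
        C_next = (1 - rho) * (Xc.T / q) @ Xc / n + rho * np.eye(N)
        if np.linalg.norm(C_next - C) <= tol * np.linalg.norm(C):
            return C_next
        C = C_next
    return C

def scaled_risk(X, rho):
    """Estimate of the scaled realized risk, per our reading of (8)."""
    n, N = X.shape
    cN = N / n
    Xc = X - X.mean(axis=0)
    C = st_estimator(X, rho)
    Cinv = np.linalg.inv(C)
    quad = np.einsum('ti,ij,tj->t', Xc, Cinv, Xc)
    # gamma_sc estimator of Lemma 1: quadratic forms over squared norms
    gamma_sc = np.mean(quad / np.sum(Xc * Xc, axis=1)) / (1 - (1 - rho) * cN)
    v = Cinv @ np.ones(N)
    num = v @ (C - rho * np.eye(N)) @ v
    return gamma_sc / ((1 - rho) * (1 - (1 - rho) * cN)) * num / (np.ones(N) @ v) ** 2

def gmvp_st(X, eps=0.05, grid=11):
    """Steps 1-3 of Algorithm 1: grid-search rho, refit, build the portfolio."""
    n, N = X.shape
    rhos = np.linspace(eps + max(0.0, 1 - n / N), 0.999, grid)
    rho_o = min(rhos, key=lambda r: scaled_risk(X, r))
    C_o = st_estimator(X, rho_o)
    w = np.linalg.solve(C_o, np.ones(N))
    return w / w.sum(), rho_o
```

In practice the grid search can be refined around the coarse minimizer, since $\hat{\sigma}^2(\rho)$ is cheap to evaluate once the fixed point has converged.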
IMULATION RESULTS
We use both synthetic data and real market data to showthe performance of ˆ C o ST compared to the following competingmethods:1) ˆ C P , referred to as the Abramovich-Pascal estimate from[32];2) ˆ C C , referred to as the Chen estimate from [32]; 3) ˆ C C2 , the oracle estimator in [31], which has the samestructure as ˆ C C , but resorts to solving an approximateproblem of minimizing the Frobenius distance to find theoptimal shrinkage;4) ˆ C LW , the Ledoit-Wolf shrinkage estimator in [11];5) ˆ C R , the Rubio estimator proposed in [33], which has thesame structure as ˆ C LW , but with ρ calibrated based onthe GMVP framework, as in the present article. A. Synthetic data simulations
The synthetic data are generated i.i.d. from a multivariate Student-t distribution, where $\sqrt{\tau_t} = \sqrt{d/\chi_d^2}$, $d = 3$, and $\chi_d^2$ is a Chi-square random variable with $d$ degrees of freedom. We set $N = 200$. The mean vector $\mu$ can be set arbitrarily, since it is discarded by the empirical mean and thus has no impact on the covariance estimates. We assume the population covariance matrix $C_N$ is based on a one-factor return structure [35]: $C_N = \sigma^2 b b^T + \Sigma$, with the factor loadings $b \in \mathbb{R}^N$ evenly spread over a fixed interval, and the residual variance matrix $\Sigma \in \mathbb{R}^{N \times N}$ diagonal and proportional to the identity matrix, $\Sigma = \sigma_r^2 I$.

Fig. 1 illustrates the performance of the different estimation approaches in terms of the realized risk, averaged over the Monte Carlo simulations. The risk bound is computed by (2), the theoretical minimum portfolio risk. Compared to the other methods, our proposed estimator $\hat{C}_{ST}^o$ achieves the smallest realized risk for both $n \le N$ and $n > N$. We omit the realized risks achieved by $\hat{C}_N = I_N$, as they are uniformly more than five times as large as those achieved by the other methods.

It is interesting to compare the optimized $\rho$ of $\hat{C}_{ST}^o$ and $\hat{C}_P$. They are both solutions of (3), but with $\rho$ optimized under different metrics: minimizing the risk and minimizing the Frobenius distance, respectively. As shown in Fig. 2, the optimal shrinkage parameter varies under different metrics. Interestingly, optimizing $\rho$ under the risk function as opposed to the Frobenius distance leads to more aggressive shrinkage (regularization) towards the identity matrix, thus producing a portfolio allocation which is closer to the uniform allocation policy.

B. Real market data simulations
We now investigate the out-of-sample portfolio performance of the different estimators with real market data. We consider the stocks comprising the HSI. In particular, we use the dividend-adjusted daily closing prices downloaded from the Yahoo Finance database to obtain the continuously compounded (logarithmic) returns for the constituents of the HSI over $L = 736$ working days (excluding weekends and public holidays). As conventionally done in the financial literature, the out-of-sample evaluation is defined in terms of a rolling window
Fig. 1. The average realized portfolio risk of different covariance estimators in the GMVP framework using synthetic data ($N = 200$).
Fig. 2. The optimal shrinkage parameters of $\hat{C}_{ST}^o$ and $\hat{C}_P$ in the synthetic data simulation ($N = 200$).

method. At a particular day $t$, we use the previous $n$ days (i.e., from $t - n$ to $t - 1$) as the training window for covariance estimation and construct the portfolio selection $\hat{h}_{GMVP}$. We then use $\hat{h}_{GMVP}$ to compute the portfolio returns over the following days. Next, the window is shifted forward by the same number of days, and the portfolio returns for the next block of days are computed. This procedure is repeated until the end of the data. The realized risk is computed conventionally as the annualized sample standard deviation of the corresponding GMVP returns. In our tests, different training window lengths are considered.

Fig. 3 shows that the proposed $\hat{C}_{ST}^o$ achieves the smallest realized risk. It outperforms the other methods over the entire span of considered estimation windows. The realized risk achieved by $\hat{C}_N = I_N$ is again omitted here, because it is more than double that achieved by the competing methods. When the estimation window is too long, we observe that the performance starts to systematically degrade. This is presumably due to a lack of stationarity in the data over such long durations. This highlights an interesting phenomenon worthy of further consideration, but a detailed study falls beyond the scope of the current contribution.

When the estimation window length is $n = 300$, the lowest risk is achieved by $\hat{C}_{ST}^o$. Table I presents the risks obtained by the different covariance estimators at this estimation window length. We also test whether the pairwise differences between the portfolio variance achieved by $\hat{C}_{ST}^o$ and each benchmark strategy are statistically different from zero.
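The rolling-window protocol described above can be sketched as follows; the covariance estimator is left as a pluggable callable, and the 10-day shift is an illustrative choice of ours:

```python
import numpy as np

def rolling_gmvp_risk(R, estimator, window=300, shift=10):
    """Rolling-window out-of-sample GMVP evaluation (sketch).

    R         : (T, N) array of daily log-returns.
    estimator : callable mapping a (window, N) slice to a covariance estimate.
    Returns the annualized sample standard deviation of the out-of-sample
    GMVP returns.
    """
    T, N = R.shape
    oos = []
    for t in range(window, T - shift + 1, shift):
        C = estimator(R[t - window:t])          # train on the past `window` days
        w = np.linalg.solve(C, np.ones(N))
        w /= w.sum()
        oos.extend(R[t:t + shift] @ w)          # next `shift` days out-of-sample
    return np.sqrt(252) * np.std(oos, ddof=1)   # annualized (252 trading days)
```

Any of the competing estimators can be plugged in, e.g. a regularized sample covariance via `lambda W: np.cov(W, rowvar=False) + 1e-4 * np.eye(W.shape[1])`.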
Since standard hypothesis tests are not valid when returns have tails heavier than the normal distribution or are correlated across time, we follow the method described in [36] and [37] and employ a studentized version of the circular block bootstrap [38] to perform the test. The $p$-values are computed under the null hypothesis that the portfolio variance achieved by a particular benchmark covariance matrix estimator is equal to that achieved by $\hat{C}_{ST}^o$. We use a block length $b = 5$ and base our reported $p$-values on a large number of bootstrap iterations. We also compute the $p$-values when the block lengths are $b = 1$ and $b = 10$; the interpretation of the results does not change for $b = 1$, $b = 5$, or $b = 10$. This implies that the temporal correlations of the stock returns are weak and our i.i.d. assumption on the data is acceptable. In the row reporting the risks, statistically significant outperformance of $\hat{C}_{ST}^o$ over the other methods is denoted by asterisks: ** denotes significance at the 0.01 level ($p < 0.01$) and * denotes significance at the 0.05 level ($p < 0.05$). It can be seen from Table I that the outperformance of our proposed method is statistically significant, with $p < 0.05$ in all cases.

As a further comparison, to investigate the performance with finer temporal resolution than that in Fig. 3, we carry out a rolling-window analysis of the realized risks. Under the estimation window length of $n = 300$, we obtain $736 - 300 = 436$ out-of-sample portfolio returns. From the start of the data, we use a fixed window of the most recent out-of-sample portfolio returns to compute the (annualized) standard deviation of the GMVP. Shifting one day forward, we repeat this procedure until the end of the portfolio returns. For each covariance matrix estimator, this results in a time series of rolling risk measurements, which is displayed in Fig. 4.
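For illustration, a basic (non-studentized) circular block bootstrap for the variance difference can be sketched as below; this is a simplification of the studentized procedure of [36-38], shown only to convey the mechanics:

```python
import numpy as np

def cbb_variance_pvalue(r1, r2, b=5, B=1999, seed=0):
    """Two-sided circular block bootstrap p-value for equal variances (sketch)."""
    rng = np.random.default_rng(seed)
    T = len(r1)
    obs = np.var(r1, ddof=1) - np.var(r2, ddof=1)
    nblocks = -(-T // b)                        # ceil(T / b)
    hits = 0
    for _ in range(B):
        starts = rng.integers(0, T, nblocks)
        # circular blocks of length b, truncated to T indices, shared by both series
        idx = (starts[:, None] + np.arange(b)).ravel()[:T] % T
        boot = np.var(r1[idx], ddof=1) - np.var(r2[idx], ddof=1)
        if abs(boot - obs) >= abs(obs):         # re-centered null distribution
            hits += 1
    return (hits + 1) / (B + 1)
```

Using the same block indices for both return series preserves their pairing, which is essential since the two GMVP return series are computed on the same days.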
We find that, a large proportion of the time, $\hat{C}_{ST}^o$ achieves the lowest risk among all the alternative methods. In addition, during the period of high volatility, $\hat{C}_{ST}^o$ exhibits the greatest outperformance. This suggests that our proposed GMVP optimization strategy is robust to market fluctuations and even possibly to outliers.

V. CONCLUSIONS
We have proposed a novel minimum-variance portfolio optimization strategy based on a robust shrinkage covariance estimator with a shrinkage parameter calibrated to minimize the realized portfolio risk. Our strategy has been shown to be robust to finite-sampling effects as well as to the impulsive characteristics of the data. It has been demonstrated that our approach outperforms more standard techniques in terms of the realized portfolio risk, both for synthetic data and for real historical stock returns from Hong Kong's HSI. Although we base our analysis on the assumption of the absence of outliers, a recent study [39] has shown that the robust covariance estimator $\hat{C}_{ST}$ is resilient to arbitrary outliers by appropriately weighting good versus outlying data. This is somewhat confirmed by our real data tests and is worth investigating further.

TABLE I
REALIZED PORTFOLIO RISKS (ANNUALIZED STANDARD DEVIATIONS) AND THE CORRESPONDING p-VALUES UNDER DIFFERENT COVARIANCE MATRIX ESTIMATORS.

Dataset   Statistic       $\hat{C}_{ST}^o$   $\hat{C}_P$   $\hat{C}_C$   $\hat{C}_{C2}$   $\hat{C}_{LW}$   $\hat{C}_R$   $I_N$
HSI       Risk (n=300)    --                 --**          --*           --*              --**             --**          --**
          p-value         --                 0.009         0.028         0.041            0.001            0.001         --
Fig. 3. Realized portfolio risks achieved out-of-sample on the HSI real market data ($N = 45$) by a GMVP implemented using different covariance estimators, as a function of the training window length $n$.

Fig. 4. Annualized rolling-window standard deviations of the most recent out-of-sample log returns ($N = 45$, $n = 300$) for the GMVP based on different covariance matrix estimators.

Even though the GMVP is not an optimal portfolio in terms of the Sharpe ratio or return maximization at a given level of risk, many empirical studies [40, 41] have shown that an investment in the GMVP often yields better out-of-sample results than other mean-variance portfolios, because of the poor estimates of the means of the asset returns. Therefore, besides the robust estimation of the covariance matrix, it would be of interest to take into account the robust estimation of the means and further develop robust approaches to the various portfolio optimization strategies that involve both the estimates of the means and the covariance matrix of the asset returns, such as Sharpe ratio maximization or Markowitz's mean-variance portfolio optimization. These considerations are left to future work.

APPENDIX A
PRELIMINARY RESULTS
In this appendix we provide some preparatory lemmas that are essential for the proofs of the main theorems. From now on, for readability, we discard all unnecessary indices $\rho$ when no confusion is possible.

We start by rewriting $\hat{C}_{ST}$ in a more convenient form. Denoting

$$\tilde{z}_t = z_t - \frac{1}{n} Z_N \frac{\sqrt{\tau}}{\sqrt{\tau_t}}, \quad t = 1, 2, \dots, n,$$

with $\sqrt{\tau} = (\sqrt{\tau_1}, \dots, \sqrt{\tau_n})^T$ and $Z_N = [z_1, \dots, z_n]$, after some basic algebra we obtain

$$\hat{C}_{ST} = (1-\rho)\,\frac{1}{n}\sum_{t=1}^{n} \frac{\tilde{z}_t \tilde{z}_t^T}{\frac{1}{N}\tilde{z}_t^T \hat{C}_{ST}^{-1}\tilde{z}_t} + \rho I_N.$$

Denoting

$$\hat{C}_{(t)} \triangleq \hat{C}_{ST} - \frac{1-\rho}{n} \cdot \frac{\tilde{z}_t \tilde{z}_t^T}{\frac{1}{N}\tilde{z}_t^T \hat{C}_{ST}^{-1}\tilde{z}_t}$$

and using $(A + r\upsilon\upsilon^T)^{-1}\upsilon = A^{-1}\upsilon/(1 + r\upsilon^T A^{-1}\upsilon)$ for a positive definite matrix $A$, vector $\upsilon$ and scalar $r > 0$, we have

$$\frac{1}{N}\tilde{z}_t^T \hat{C}_{ST}^{-1}\tilde{z}_t = \frac{\frac{1}{N}\tilde{z}_t^T \hat{C}_{(t)}^{-1}\tilde{z}_t}{1 + (1-\rho)c_N\,\frac{\frac{1}{N}\tilde{z}_t^T \hat{C}_{(t)}^{-1}\tilde{z}_t}{\frac{1}{N}\tilde{z}_t^T \hat{C}_{ST}^{-1}\tilde{z}_t}}$$

so that

$$\frac{1}{N}\tilde{z}_t^T \hat{C}_{ST}^{-1}\tilde{z}_t = \left(1 - (1-\rho)c_N\right) \frac{1}{N}\tilde{z}_t^T \hat{C}_{(t)}^{-1}\tilde{z}_t \qquad (9)$$

and we can rewrite $\hat{C}_{ST}$ as

$$\hat{C}_{ST} = \frac{1-\rho}{1 - (1-\rho)c_N} \cdot \frac{1}{n}\sum_{t=1}^{n} \frac{\tilde{z}_t \tilde{z}_t^T}{\frac{1}{N}\tilde{z}_t^T \hat{C}_{(t)}^{-1}\tilde{z}_t} + \rho I_N.$$

For $t \in \{1, \dots, n\}$, denote $\hat{d}_t(\rho) \triangleq \frac{1}{N}\tilde{z}_t^T \hat{C}_{(t)}^{-1}\tilde{z}_t$. The following lemma gives a deterministic approximation of $\hat{d}_t(\rho)$, which later helps to show that, up to scaling, $\hat{C}_{ST}$ is somewhat similar to $\frac{1}{n}\sum_{t=1}^n z_t z_t^T$, which is not observable.

Lemma 2.
Under the settings of Theorem 1, as
$N, n \to \infty$,

$$\sup_{\rho \in \mathcal{R}_\varepsilon} \max_{1 \le t \le n} \left| \hat{d}_t(\rho) - \gamma(\rho) \right| \xrightarrow{a.s.} 0.$$

Proof: This is proved via a contradiction argument, which follows along lines similar to the proof in [32]. The main difference lies in that we re-center the sample data by subtracting the sample mean, while the samples are assumed to be zero mean in [32]. By subtracting the sample mean, the re-centered data are correlated and some $\sqrt{\tau_t}$ terms still remain in $\hat{C}_{ST}$, which introduces new technical difficulties.

Assuming (by relabelling) that $\hat{d}_1(\rho) \le \dots \le \hat{d}_n(\rho)$, we first prove that for any fixed $\ell > 0$, $\hat{d}_n(\rho)$ is bounded above by $\gamma(\rho) + \ell$ for all large $n$, uniformly on $\rho \in \mathcal{R}_\varepsilon$. Since $U \succeq V \Rightarrow V^{-1} \succeq U^{-1}$ for positive definite matrices $U$ and $V$, we obtain

$$\hat{d}_n(\rho) = \frac{1}{N}\tilde{z}_n^T \left( \frac{1-\rho}{(1-(1-\rho)c_N)\,n} \sum_{t=1}^{n-1} \frac{\tilde{z}_t \tilde{z}_t^T}{\hat{d}_t(\rho)} + \rho I_N \right)^{-1} \tilde{z}_n \le \frac{1}{N}\tilde{z}_n^T \left( \frac{1-\rho}{(1-(1-\rho)c_N)\,n} \sum_{t=1}^{n-1} \frac{\tilde{z}_t \tilde{z}_t^T}{\hat{d}_n(\rho)} + \rho I_N \right)^{-1} \tilde{z}_n.$$

Since $\tilde{z}_n \ne 0$ with probability one, this implies

$$1 \le \frac{1}{N}\tilde{z}_n^T \left( \frac{1-\rho}{(1-(1-\rho)c_N)\,n} \sum_{t=1}^{n-1} \tilde{z}_t \tilde{z}_t^T + \hat{d}_n(\rho)\,\rho I_N \right)^{-1} \tilde{z}_n. \qquad (10)$$

Assume that there exists a sequence $\{\rho_n\}_{n=1}^\infty$ over which $\hat{d}_n(\rho_n) > \gamma(\rho_n) + \ell$ infinitely often, for some fixed $\ell > 0$. Since $\{\rho_n\}_{n=1}^\infty$ is bounded, it has a limit point $\rho_0 \in \mathcal{R}_\varepsilon$. Let us restrict ourselves to such a subsequence on which $\rho_n \to \rho_0 > 0$ and $\hat{d}_n(\rho_n) > \gamma(\rho_n) + \ell$. On this subsequence, from (10), we have $\tilde{m}_{N,n} \ge 1$, where $\tilde{m}_{N,n} = \frac{1}{N}\tilde{z}_n^T \tilde{M}_{N,n} \tilde{z}_n$ and

$$\tilde{M}_{N,n} = \left( \frac{1-\rho_n}{(1-(1-\rho_n)c_N)\,n} \sum_{t=1}^{n-1} \tilde{z}_t \tilde{z}_t^T + (\gamma(\rho_n) + \ell)\,\rho_n I_N \right)^{-1}.$$

The quadratic form $\tilde{m}_{N,n}$ is amenable to large random matrix analysis. The first step is to remove the effect of the sample mean. Denote $m_{N,j} = \frac{1}{N} z_j^T M_{N,j} z_j$ and

$$M_{N,j} = \left( \frac{1-\rho_n}{(1-(1-\rho_n)c_N)\,n} \sum_{t \ne j} z_t z_t^T + (\gamma(\rho_n) + \ell)\,\rho_n I_N \right)^{-1}.$$

We have in particular:

Proposition 1.
As $N, n \to \infty$,

$$\max_{1 \le j \le n} \left| \tilde{m}_{N,j} - m_{N,j} \right| \xrightarrow{a.s.} 0. \qquad (11)$$

Proof:
See Appendix F.
Remark 2.
In Proposition 1, Assumption 1-b is necessary; that is, the $\tau_1, \dots, \tau_n$ are i.i.d. with $\tau_t \ge \xi$ a.s. for some $\xi > 0$ and $\mathbb{E}[\tau] < \infty$. It guarantees that, for $t = 1, \dots, n$, the norm of $\tilde{z}_t$ does not go off to infinity, recalling that $\tilde{z}_t = z_t - \frac{1}{n} Z_N \frac{\sqrt{\tau}}{\sqrt{\tau_t}}$.

By Proposition 1, we have $|\tilde{m}_{N,n} - m_{N,n}| \xrightarrow{a.s.} 0$. This allows us to follow the proof in [32], which deals with data with mean zero. To proceed, assume first $\rho_0 \ne 1$. From the proof of Theorem 1 in [32],

$$m_{N,n} \xrightarrow{a.s.} \frac{1-(1-\rho_0)c}{1-\rho_0}\, \delta\!\left( -\frac{(\gamma(\rho_0) + \ell)\,\rho_0\,(1-(1-\rho_0)c)}{1-\rho_0} \right) \triangleq m_+, \qquad (12)$$

where, for $x < 0$, $\delta(x)$ is the unique positive solution to

$$\delta(x) = \int \frac{t}{-x + \frac{t}{1 + c\,\delta(x)}}\,\nu(dt).$$

Together with $|\tilde{m}_{N,n} - m_{N,n}| \xrightarrow{a.s.} 0$, we have

$$\left| \tilde{m}_{N,n} - m_+ \right| \xrightarrow{a.s.} 0. \qquad (13)$$

It was demonstrated in [32] that $m_+ < 1$. But this is in contradiction with $\tilde{m}_{N,n} \ge 1$.

Now assume $\rho_0 = 1$. According to [32], $m_{N,n} \xrightarrow{a.s.} \frac{1}{1+\ell} < 1$. Then

$$\left| \tilde{m}_{N,n} - \frac{1}{1+\ell} \right| \xrightarrow{a.s.} 0,$$

but $\frac{1}{1+\ell} < 1$, again raising a contradiction with $\tilde{m}_{N,n} \ge 1$. Hence, for all large $n$, there is no converging subsequence of $\{\rho_n\}$ (and thus no sequence $\{\rho_n\}$) for which $\hat{d}_n(\rho_n) > \gamma(\rho_n) + \ell$ infinitely often. Therefore $\hat{d}_n(\rho) \le \gamma(\rho) + \ell$ for all large $n$ a.s., uniformly on $\rho \in \mathcal{R}_\varepsilon$.

The same reasoning holds for $\hat{d}_1(\rho)$, which can be proved greater than $\gamma(\rho) - \ell$ for all large $n$, uniformly on $\rho \in \mathcal{R}_\varepsilon$. Following the same arguments as in [32], since $\ell > 0$ is arbitrary, from the ordering of the $\hat{d}_t(\rho)$, we have proved that

$$\sup_{\rho \in \mathcal{R}_\varepsilon} \max_{1 \le t \le n} \left| \hat{d}_t(\rho) - \gamma(\rho) \right| \xrightarrow{a.s.} 0. \qquad \blacksquare$$

The following three lemmas, Lemmas 3, 4 and 5, show that functionals of Tyler's estimator asymptotically perform similarly to functionals of $\frac{1}{n}\sum_{t=1}^n z_t z_t^T$ or $\frac{1}{n}\sum_{t=1}^n y_t y_t^T$. They are used as an intermediate step in the development of the asymptotic deterministic equivalent of the risk function. Using existing results from [33], quoted as Lemma 6 in this paper, we can then obtain our main theorems.

For notational convenience, we denote $k = k(\rho) \triangleq \frac{1-\rho}{1-(1-\rho)c}$. Also recall that $\gamma$ is the unique positive solution to $1 = \int \frac{t}{\gamma\rho + (1-\rho)t}\,\nu(dt)$. Assuming $A_N \in \mathbb{R}^{N \times N}$ is a deterministic symmetric nonnegative definite matrix, for some $\eta > 0$, define

$$\mathcal{D} = \begin{cases} [0, \infty) & \text{if } \liminf_N \lambda_1(A_N) > 0, \\ [\eta, \infty) & \text{otherwise}, \end{cases}$$

and further define, for $\rho \in \mathcal{R}_\varepsilon$ and $w \in \mathcal{D}$,

$$\tilde{R}_N = \left( A_N + (1-\rho)\frac{1}{n}\sum_{t=1}^n \frac{\tilde{x}_t \tilde{x}_t^T}{\frac{1}{N}\tilde{x}_t^T \hat{C}_{ST}^{-1}\tilde{x}_t} + w I_N \right)^{-1}$$

$$\tilde{S}_N = \left( A_N + \frac{k}{\gamma}\cdot\frac{1}{n}\sum_{t=1}^n \tilde{z}_t \tilde{z}_t^T + w I_N \right)^{-1}$$

$$S_N = \left( A_N + \frac{k}{\gamma}\cdot\frac{1}{n}\sum_{t=1}^n z_t z_t^T + w I_N \right)^{-1}.$$

Then we introduce the following lemma.
Lemma 3.
Assume $a_N\in\mathbb{R}^N$ is a deterministic vector with $\limsup_N\|a_N\| < \infty$. Under the settings of Theorem 1, as $N,n\to\infty$,

$$\sup_{\rho\in\mathcal{R}_\varepsilon,\,w\in\mathcal{D}}\big|a_N^T\widetilde{R}_Na_N - a_N^TS_Na_N\big|\xrightarrow{a.s.}0. \qquad (14)$$

Proof:
Define

$$\hat{B}_N(\rho) = \frac{k}{\gamma(\rho)}\,\frac{1}{n}\sum_{t=1}^n\tilde{z}_t\tilde{z}_t^T, \qquad \hat{D}_N(\rho) = (1-\rho)\frac{1}{n}\sum_{t=1}^n\frac{\tilde{x}_t\tilde{x}_t^T}{\frac{1}{N}\tilde{x}_t^T\hat{C}_{ST}^{-1}\tilde{x}_t} \overset{(a)}{=} \frac{1-\rho}{1-(1-\rho)c_N}\,\frac{1}{n}\sum_{t=1}^n\frac{\tilde{z}_t\tilde{z}_t^T}{\hat{d}_t(\rho)},$$

where $(a)$ uses the identity (9). Denote

$$\Delta \triangleq a_N^T\widetilde{R}_Na_N - a_N^T\widetilde{S}_Na_N \overset{(b)}{=} a_N^T\widetilde{R}_N\big(\hat{B}_N(\rho)-\hat{D}_N(\rho)\big)\widetilde{S}_Na_N,$$

where $(b)$ uses the identity $U^{-1}-V^{-1} = U^{-1}(V-U)V^{-1}$ for invertible matrices $U$, $V$. We first prove that, as $N,n\to\infty$, $\sup_{\rho\in\mathcal{R}_\varepsilon,\,w\in\mathcal{D}}|\Delta|\xrightarrow{a.s.}0$. As $N,n\to\infty$, using the definition of $k$,

$$\sup_{\rho\in\mathcal{R}_\varepsilon}\big\|\hat{D}_N(\rho)-\hat{B}_N(\rho)\big\| \le \left\|\frac{1}{n}\sum_{t=1}^n\tilde{z}_t\tilde{z}_t^T\right\|\,\sup_{\rho\in\mathcal{R}_\varepsilon}\max_{1\le t\le n}\frac{1-\rho}{1-(1-\rho)c_N}\left|\frac{\hat{d}_t(\rho)-\gamma(\rho)}{\gamma(\rho)\hat{d}_t(\rho)}\right|. \qquad (15)$$

We will show that the RHS of (15) goes to $0$ a.s. Recalling Lemma 2, this follows upon showing that $\limsup_n\big\|\frac{1}{n}\sum_{t=1}^n\tilde{z}_t\tilde{z}_t^T\big\| < \infty$ a.s. To this end, recall that $\tilde{z}_t = z_t - \frac{1}{n}Z_N\frac{\sqrt{\boldsymbol{\tau}}}{\sqrt{\tau_t}}$. Then

$$\frac{1}{n}\sum_{t=1}^n\tilde{z}_t\tilde{z}_t^T = \frac{1}{n}\sum_{t=1}^n z_tz_t^T - \frac{1}{n}\sum_{t=1}^n z_t\left(\frac{1}{n}Z_N\frac{\sqrt{\boldsymbol{\tau}}}{\sqrt{\tau_t}}\right)^T - \frac{1}{n}\sum_{t=1}^n\left(\frac{1}{n}Z_N\frac{\sqrt{\boldsymbol{\tau}}}{\sqrt{\tau_t}}\right)z_t^T + \frac{1}{n}\sum_{t=1}^n\left(\frac{1}{n}Z_N\frac{\sqrt{\boldsymbol{\tau}}}{\sqrt{\tau_t}}\right)\left(\frac{1}{n}Z_N\frac{\sqrt{\boldsymbol{\tau}}}{\sqrt{\tau_t}}\right)^T. \qquad (16)$$

We will show that the spectral norm of each term on the RHS of (16) is bounded for all large $n$ a.s. First, from Assumption 1-c and [42], we have $\limsup_n\big\|\frac{1}{n}\sum_{t=1}^n z_tz_t^T\big\| < \infty$ a.s.
Next, for the second and the third terms on the RHS of (16),

$$\left\|\frac{1}{n}\sum_{t=1}^n z_t\left(\frac{1}{n}Z_N\frac{\sqrt{\boldsymbol{\tau}}}{\sqrt{\tau_t}}\right)^T\right\| = \left\|\frac{1}{n}\sum_{t=1}^n\left(\frac{1}{n}Z_N\frac{\sqrt{\boldsymbol{\tau}}}{\sqrt{\tau_t}}\right)z_t^T\right\| \le \|C_N\|\left\|\frac{1}{n}\sum_{t=1}^n\frac{y_t}{\sqrt{\tau_t}}\right\|\left\|\frac{1}{n}\sum_{t=1}^n\sqrt{\tau_t}\,y_t\right\|.$$

By the law of large numbers, as $N,n\to\infty$,

$$\left|\left(\frac{1}{n}\sum_{t=1}^n\sqrt{\tau_t}\,y_t\right)^T\left(\frac{1}{n}\sum_{t=1}^n\sqrt{\tau_t}\,y_t\right) - c\right| \xrightarrow{a.s.} 0.$$

According to Assumption 1-a and Assumption 1-c, we then see that $\limsup_n\big\|\frac{1}{n}\sum_{t=1}^n z_t\big(\frac{1}{n}Z_N\frac{\sqrt{\boldsymbol{\tau}}}{\sqrt{\tau_t}}\big)^T\big\| = \limsup_n\big\|\frac{1}{n}\sum_{t=1}^n\big(\frac{1}{n}Z_N\frac{\sqrt{\boldsymbol{\tau}}}{\sqrt{\tau_t}}\big)z_t^T\big\| \le c\,\|C_N\| < \infty$ a.s. For the fourth term, with Assumption 1-b, we have

$$\limsup_n\left\|\frac{1}{n}\sum_{t=1}^n\left(\frac{1}{n}Z_N\frac{\sqrt{\boldsymbol{\tau}}}{\sqrt{\tau_t}}\right)\left(\frac{1}{n}Z_N\frac{\sqrt{\boldsymbol{\tau}}}{\sqrt{\tau_t}}\right)^T\right\| \le \limsup_n\left\{\left\|\frac{1}{n}Z_N^TZ_N\right\|\left(\frac{1}{n}\sum_{t=1}^n\tau_t\right)\left(\frac{1}{n}\sum_{t=1}^n\frac{1}{\tau_t}\right)\right\} < \infty \text{ a.s.}$$

Therefore, $\limsup_n\big\|\frac{1}{n}\sum_{t=1}^n\tilde{z}_t\tilde{z}_t^T\big\| < \infty$ a.s. Together with Lemma 2, from (15), we have

$$\sup_{\rho\in\mathcal{R}_\varepsilon}\big\|\hat{B}_N(\rho)-\hat{D}_N(\rho)\big\| \xrightarrow{a.s.} 0. \qquad (17)$$

Note that $w\in\mathcal{D}$ ensures $\limsup_N\sup_{\rho\in\mathcal{R}_\varepsilon,\,w\in\mathcal{D}}\|\widetilde{R}_N\| < \infty$ and $\limsup_N\sup_{\rho\in\mathcal{R}_\varepsilon,\,w\in\mathcal{D}}\|\widetilde{S}_N\| < \infty$. Together with (17) and $\limsup_N\|a_N\| < \infty$, we have

$$\sup_{\rho\in\mathcal{R}_\varepsilon,\,w\in\mathcal{D}}|\Delta| \le \|a_N\|^2\sup_{\rho\in\mathcal{R}_\varepsilon,\,w\in\mathcal{D}}\|\widetilde{R}_N\|\,\sup_{\rho\in\mathcal{R}_\varepsilon,\,w\in\mathcal{D}}\|\widetilde{S}_N\|\,\sup_{\rho\in\mathcal{R}_\varepsilon}\left\|\frac{k}{\gamma}\frac{1}{n}\sum_{t=1}^n\tilde{z}_t\tilde{z}_t^T - (1-\rho)\frac{1}{n}\sum_{t=1}^n\frac{\tilde{x}_t\tilde{x}_t^T}{\frac{1}{N}\tilde{x}_t^T\hat{C}_{ST}^{-1}\tilde{x}_t}\right\| \xrightarrow{a.s.} 0.$$
Following the same reasoning as that of Proposition 1, we have $\sup_{\rho\in\mathcal{R}_\varepsilon,\,w\in\mathcal{D}}\big|a_N^T\widetilde{S}_Na_N - a_N^TS_Na_N\big|\xrightarrow{a.s.}0$. Together with $\sup_{\rho\in\mathcal{R}_\varepsilon,\,w\in\mathcal{D}}|\Delta|\xrightarrow{a.s.}0$, we obtain (14). $\blacksquare$

Define $W_N = \big(A_N + \frac{k}{\gamma}\frac{1}{n}\sum_{t=1}^n y_ty_t^T + wI_N\big)^{-1}$ and

$$\widetilde{W}_N = \left(A_N + (1-\rho)\frac{1}{n}\sum_{t=1}^n\frac{\tilde{y}_t\tilde{y}_t^T}{\frac{1}{N}\tilde{y}_t^TC_N^{1/2}\hat{C}_{ST}^{-1}C_N^{1/2}\tilde{y}_t} + wI_N\right)^{-1},$$

where $\tilde{y}_t = y_t - \frac{1}{n}\sum_{i=1}^n y_i$. We introduce the following lemma.

Lemma 4.
Under the settings of Lemma 3, as $N,n\to\infty$,

$$\sup_{\rho\in\mathcal{R}_\varepsilon,\,w\in\mathcal{D}}\big|a_N^T\widetilde{W}_Na_N - a_N^TW_Na_N\big|\xrightarrow{a.s.}0. \qquad (18)$$

Proof:
The derivation is similar to that of (14).
Lemma 5.
Under the settings of Lemma 3 and assuming $A_N = 0$, as $N,n\to\infty$,

$$\sup_{\rho\in\mathcal{R}_\varepsilon,\,w\in[\eta,\infty)}\left|a_N^T(1-\rho)\frac{1}{n}\sum_{i=1}^n\frac{\tilde{x}_i\tilde{x}_i^T}{\frac{1}{N}\tilde{x}_i^T\hat{C}_{ST}^{-1}\tilde{x}_i}\,\widetilde{R}_N^2\,a_N - a_N^T\frac{k}{\gamma}\frac{1}{n}\sum_{i=1}^n z_iz_i^T\,S_N^2\,a_N\right| \xrightarrow{a.s.} 0. \qquad (19)$$

Proof:
We first notice that

$$a_N^T(1-\rho)\frac{1}{n}\sum_{i=1}^n\frac{\tilde{x}_i\tilde{x}_i^T}{\frac{1}{N}\tilde{x}_i^T\hat{C}_{ST}^{-1}\tilde{x}_i}\,\widetilde{R}_N^2\,a_N = -\frac{d}{dw}\left[a_N^T(1-\rho)\frac{1}{n}\sum_{i=1}^n\frac{\tilde{x}_i\tilde{x}_i^T}{\frac{1}{N}\tilde{x}_i^T\hat{C}_{ST}^{-1}\tilde{x}_i}\,\widetilde{R}_N\,a_N\right] = -\frac{d}{dw}\big(a_N^Ta_N - w\,a_N^T\widetilde{R}_Na_N\big) = a_N^T\widetilde{R}_Na_N + w\,\frac{d}{dw}\big(a_N^T\widetilde{R}_Na_N\big).$$

Following similar steps, we also have

$$a_N^T\frac{k}{\gamma}\frac{1}{n}\sum_{i=1}^n z_iz_i^T\,S_N^2\,a_N = a_N^TS_Na_N + w\,\frac{d}{dw}\big(a_N^TS_Na_N\big).$$

The almost sure convergence (14) in Lemma 3, when extended to $w\in\mathbb{C}$, is uniform on any bounded region of $(\mathbb{C}\setminus\mathbb{R})\cup\mathcal{D}$, and the functionals of $w$ in (14) are analytic. Thus, by the Weierstrass convergence theorem [43], the following holds:

$$\sup_{\rho\in\mathcal{R}_\varepsilon,\,w\in\mathcal{D}}\left|\frac{d}{dw}\big(a_N^T\widetilde{R}_Na_N\big) - \frac{d}{dw}\big(a_N^TS_Na_N\big)\right| \xrightarrow{a.s.} 0.$$

Together with Lemma 3, we obtain (19). $\blacksquare$
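Lemma 6, quoted next from [33], reduces these deterministic equivalents to a scalar quantity $e_N(w)$ defined by a fixed-point equation. As a numerical aside (a sketch, using the fixed-point form as reconstructed in this appendix), the equation can be solved by direct iteration; for $C_N = I_N$ (so $\gamma = 1$), $A_N = 0$ and $w = \rho$, the limit $e_N(\rho) \to c\gamma$ used in Appendix B holds exactly at finite $N$, since $k = \frac{1-\rho}{1-(1-\rho)c}$ gives $\frac{k}{1+ck} = 1-\rho$, so $e = c$ solves the equation.

```python
import numpy as np

def solve_e(A, C, k, gamma, w, n, iters=1000, tol=1e-13):
    """Iterate the scalar fixed point of Lemma 6 (form assumed here):
    e <- (1/n) tr[ C (A + k/(gamma + e*k) * C + w*I)^{-1} ]."""
    N = C.shape[0]
    e = 1.0
    for _ in range(iters):
        T = np.linalg.inv(A + (k / (gamma + e * k)) * C + w * np.eye(N))
        e_new = np.trace(C @ T) / n
        if abs(e_new - e) < tol:
            break
        e = e_new
    return e

# Check against the closed-form solution e = c = N/n for C_N = I_N, gamma = 1:
N, n, rho = 50, 200, 0.5
c = N / n
k = (1 - rho) / (1 - (1 - rho) * c)
e = solve_e(np.zeros((N, N)), np.eye(N), k, 1.0, rho, n)
```

The iteration is a contraction here (the map has slope $c(1-\rho)^2 < 1$ at the fixed point), so plain iteration suffices without damping.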
Lemma 6. [33, Appendix I-B] Under the settings of Lemma 3, as $N,n\to\infty$,

$$\sup_{\rho\in\mathcal{R}_\varepsilon,\,w\in\mathcal{D}}\big|a_N^TS_Na_N - a_N^TT_Na_N\big| \xrightarrow{a.s.} 0, \qquad (20)$$

where $T_N = \big(A_N + \frac{k}{\gamma + e_N(w)k}\,C_N + wI_N\big)^{-1}$ and, for each $w\in\mathcal{D}$, $e_N(w)$ is the unique positive solution to the following equation:

$$e_N(w) = \frac{1}{n}\operatorname{tr}\left[C_N\left(A_N + \frac{k}{\gamma + e_N(w)k}\,C_N + wI_N\right)^{-1}\right].$$

Moreover, when $A_N = 0$, we have

$$\sup_{\rho\in\mathcal{R}_\varepsilon,\,w\in[\eta,\infty)}\left|a_N^T\frac{k}{\gamma}\frac{1}{n}\sum_{t=1}^n z_tz_t^T\,S_N^2\,a_N - \frac{\frac{k\gamma}{(\gamma+e_N(w)k)^2}\,a_N^TC_NT_N^2a_N}{1 - \frac{k^2}{(\gamma+e_N(w)k)^2}\,\frac{1}{n}\operatorname{tr}\big[C_NT_NC_NT_N\big]}\right| \xrightarrow{a.s.} 0. \qquad \blacksquare$$

APPENDIX B: PROOF OF THEOREM 1
$$N\sigma^2(\hat{h}_{ST}) = \frac{\frac{1}{N}\mathbf{1}_N^T\hat{C}_{ST}^{-1}C_N\hat{C}_{ST}^{-1}\mathbf{1}_N}{\left(\frac{1}{N}\mathbf{1}_N^T\hat{C}_{ST}^{-1}\mathbf{1}_N\right)^2}. \qquad (21)$$

For the denominator, Lemma 3 and Lemma 6 imply

$$\sup_{\rho\in\mathcal{R}_\varepsilon}\left|\frac{1}{N}\mathbf{1}_N^T\hat{C}_{ST}^{-1}\mathbf{1}_N - \frac{1}{N}\mathbf{1}_N^T\left(\frac{1-\rho}{\gamma}C_N + \rho I_N\right)^{-1}\mathbf{1}_N\right| \xrightarrow{a.s.} 0.$$

Note that in this case $A_N = 0$, $a_N = \frac{\mathbf{1}_N}{\sqrt{N}}$ and $w = \rho$, which leads to $|e_N(\rho) - c\gamma| \xrightarrow{a.s.} 0$ as $N,n\to\infty$. The derivation is based on Assumption 1-c and the definition of $\gamma$ in (5).

For the numerator, we rewrite it as

$$\frac{1}{N}\mathbf{1}_N^T\hat{C}_{ST}^{-1}C_N\hat{C}_{ST}^{-1}\mathbf{1}_N = \frac{1}{N}\mathbf{1}_N^TC_N^{-1/2}\left(C_N^{-1/2}\hat{C}_{ST}C_N^{-1/2}\right)^{-2}C_N^{-1/2}\mathbf{1}_N,$$

which, upon substituting the RHS of (3) for $\hat{C}_{ST}$ and setting $A_N = \rho C_N^{-1}$ in $\widetilde{W}_N$, yields

$$\frac{1}{N}\mathbf{1}_N^T\hat{C}_{ST}^{-1}C_N\hat{C}_{ST}^{-1}\mathbf{1}_N = \frac{1}{N}\mathbf{1}_N^TC_N^{-1/2}\left((1-\rho)\frac{1}{n}\sum_{t=1}^n\frac{C_N^{-1/2}\tilde{x}_t\tilde{x}_t^TC_N^{-1/2}}{\frac{1}{N}\tilde{x}_t^T\hat{C}_{ST}^{-1}\tilde{x}_t} + \rho C_N^{-1}\right)^{-2}C_N^{-1/2}\mathbf{1}_N = -\frac{d}{dw}\left[\frac{1}{N}\mathbf{1}_N^TC_N^{-1/2}\widetilde{W}_NC_N^{-1/2}\mathbf{1}_N\right]\bigg|_{w=0}.$$

Setting $A_N = \rho C_N^{-1}$ and $a_N = \frac{1}{\sqrt{N}}C_N^{-1/2}\mathbf{1}_N$ in (20), as well as $A_N = \rho C_N^{-1}$ in $W_N$, yields

$$\sup_{\rho\in\mathcal{R}_\varepsilon,\,w\in[0,\infty)}\left|\frac{1}{N}\mathbf{1}_N^TC_N^{-1/2}W_NC_N^{-1/2}\mathbf{1}_N - \frac{1}{N}\mathbf{1}_N^TC_N^{-1/2}J_NC_N^{-1/2}\mathbf{1}_N\right| \xrightarrow{a.s.} 0, \qquad (22)$$

where $J_N = \left(\rho C_N^{-1} + \left(\frac{k}{\gamma+\tilde{e}_N(w)k} + w\right)I_N\right)^{-1}$ and, for each $w\in[0,\infty)$, $\tilde{e}_N(w)$ is the unique positive solution to the following equation:

$$\tilde{e}_N(w) = \frac{1}{n}\operatorname{tr}\left[\left(\rho C_N^{-1} + \left(\frac{k}{\gamma+\tilde{e}_N(w)k} + w\right)I_N\right)^{-1}\right].$$

Lemma 4 and the convergence (22) imply

$$\sup_{\rho\in\mathcal{R}_\varepsilon,\,w\in[0,\infty)}\left|\frac{1}{N}\mathbf{1}_N^TC_N^{-1/2}\widetilde{W}_NC_N^{-1/2}\mathbf{1}_N - \frac{1}{N}\mathbf{1}_N^TC_N^{-1/2}J_NC_N^{-1/2}\mathbf{1}_N\right| \xrightarrow{a.s.} 0.$$
Following the same reasoning as for the proof of Lemma 5, the convergence of the derivatives at $w=0$ holds by the Weierstrass convergence theorem:

$$\sup_{\rho\in\mathcal{R}_\varepsilon}\left|\frac{d}{dw}\left(\frac{1}{N}\mathbf{1}_N^TC_N^{-1/2}\widetilde{W}_NC_N^{-1/2}\mathbf{1}_N\right)\bigg|_{w=0} - \frac{d}{dw}\left(\frac{1}{N}\mathbf{1}_N^TC_N^{-1/2}J_NC_N^{-1/2}\mathbf{1}_N\right)\bigg|_{w=0}\right| \xrightarrow{a.s.} 0.$$

With Eq. (23) below and $|\tilde{e}_N(0) - c\gamma| \xrightarrow{a.s.} 0$ as $N,n\to\infty$, we have

$$\sup_{\rho\in\mathcal{R}_\varepsilon}\left|\frac{1}{N}\mathbf{1}_N^T\hat{C}_{ST}^{-1}C_N\hat{C}_{ST}^{-1}\mathbf{1}_N - \frac{\gamma^2}{\gamma^2-\beta(1-\rho)^2}\,\frac{1}{N}\mathbf{1}_N^T\left(\frac{1-\rho}{\gamma}C_N+\rho I_N\right)^{-1}C_N\left(\frac{1-\rho}{\gamma}C_N+\rho I_N\right)^{-1}\mathbf{1}_N\right| \xrightarrow{a.s.} 0. \qquad (24)$$

Equipped with the asymptotic equivalences of the denominator and numerator of (21), we prove Theorem 1.

APPENDIX C: PROOF OF LEMMA

$$\hat{\gamma}_{sc} = \frac{1}{1-(1-\rho)c_N}\,\frac{1}{n}\sum_{t=1}^n\frac{\frac{1}{N}\tilde{x}_t^T\hat{C}_{ST}^{-1}(\rho)\tilde{x}_t}{\frac{1}{N}\|\tilde{x}_t\|^2} = \frac{1}{1-(1-\rho)c_N}\,\frac{1}{n}\sum_{t=1}^n\frac{\frac{1}{N}\tilde{z}_t^T\hat{C}_{ST}^{-1}(\rho)\tilde{z}_t}{\frac{1}{N}\|\tilde{z}_t\|^2}.$$

Equation (23), used above, reads:

$$\frac{d}{dw}\left[\frac{1}{N}\mathbf{1}_N^TC_N^{-1/2}J_NC_N^{-1/2}\mathbf{1}_N\right]\bigg|_{w=0} = -\frac{\frac{1}{N}\mathbf{1}_N^TC_N^{1/2}\left(\frac{k}{\gamma+\tilde{e}_N(0)k}C_N+\rho I_N\right)^{-2}C_N^{1/2}\mathbf{1}_N}{1-\frac{k^2}{(\gamma+\tilde{e}_N(0)k)^2}\,\frac{1}{n}\operatorname{tr}\left[C_N\left(\frac{k}{\gamma+\tilde{e}_N(0)k}C_N+\rho I_N\right)^{-1}C_N\left(\frac{k}{\gamma+\tilde{e}_N(0)k}C_N+\rho I_N\right)^{-1}\right]}. \qquad (23)$$

It has been shown in Lemma 2 that $\sup_{\rho\in\mathcal{R}_\varepsilon}\max_{1\le t\le n}|\hat{d}_t(\rho)-\gamma(\rho)|\xrightarrow{a.s.}0$, where $\hat{d}_t(\rho) = \frac{1}{(1-(1-\rho)c_N)N}\,\tilde{z}_t^T\hat{C}_{ST}^{-1}(\rho)\tilde{z}_t$. Therefore, to prove the convergence (7), it is left to show that $\frac{1}{N}\|\tilde{z}_t\|^2 \xrightarrow{a.s.} \kappa$. We start by writing

$$\frac{1}{N}\|\tilde{z}_t\|^2 = \frac{1}{N}\left(z_t^Tz_t - \frac{1}{n}\,\frac{z_t^TZ_N\sqrt{\boldsymbol{\tau}}}{\sqrt{\tau_t}} - \frac{1}{n}\,\frac{\sqrt{\boldsymbol{\tau}}^TZ_N^Tz_t}{\sqrt{\tau_t}} + \frac{1}{n^2}\,\frac{\sqrt{\boldsymbol{\tau}}^TZ_N^TZ_N\sqrt{\boldsymbol{\tau}}}{\tau_t}\right). \qquad (25)$$

Since the second and the third terms on the RHS of (25) are the same, we analyze the second term only.
It can be rewritten as

$$\frac{1}{Nn}\,\frac{z_t^TZ_N\sqrt{\boldsymbol{\tau}}}{\sqrt{\tau_t}} = \frac{1}{Nn}\,z_t^Tz_t + \frac{1}{Nn}\,\frac{z_t^TZ_N^{(t)}\sqrt{\boldsymbol{\tau}^{(t)}}}{\sqrt{\tau_t}},$$

where $Z_N^{(t)}$ is the matrix $Z_N$ with the $t$-th column removed and $\sqrt{\boldsymbol{\tau}^{(t)}}$ is the vector $\sqrt{\boldsymbol{\tau}}$ with the $t$-th entry removed. Since $z_t$ is independent of $\frac{1}{n}Z_N^{(t)}\frac{\sqrt{\boldsymbol{\tau}^{(t)}}}{\sqrt{\tau_t}}$, we have $\frac{1}{Nn}\,\frac{z_t^TZ_N^{(t)}\sqrt{\boldsymbol{\tau}^{(t)}}}{\sqrt{\tau_t}} \xrightarrow{a.s.} 0$. Together with $\frac{1}{n}z_t^Tz_t = O(1)$ a.s., we have $\frac{1}{Nn}\,\frac{z_t^TZ_N\sqrt{\boldsymbol{\tau}}}{\sqrt{\tau_t}} \xrightarrow{a.s.} 0$.

For the last term in (25),

$$\limsup_n\left|\frac{1}{n^2}\,\frac{\sqrt{\boldsymbol{\tau}}^TZ_N^TZ_N\sqrt{\boldsymbol{\tau}}}{\tau_t}\right| \le \limsup_n\left\|\frac{1}{n}Z_N^TZ_N\right\|\,\frac{1}{\tau_t}\,\frac{1}{n}\sum_{i=1}^n\tau_i < \infty.$$

Thus, $\frac{1}{Nn^2}\,\frac{\sqrt{\boldsymbol{\tau}}^TZ_N^TZ_N\sqrt{\boldsymbol{\tau}}}{\tau_t} \xrightarrow{a.s.} 0$. Since the last three terms on the RHS of (25) vanish with large $n$, we obtain $\frac{1}{N}\big|\|\tilde{z}_t\|^2 - \|z_t\|^2\big| \xrightarrow{a.s.} 0$. Therefore, as $\frac{1}{N}\|z_t\|^2 \xrightarrow{a.s.} \kappa$, we obtain $\frac{1}{N}\|\tilde{z}_t\|^2 \xrightarrow{a.s.} \kappa$ and the convergence (7) unfolds.

APPENDIX D: PROOF OF THEOREM 2

With $A_N = 0$, $w = \rho$ and $a_N = \frac{\mathbf{1}_N}{\sqrt{N}}$, the convergence (26) (shown below) holds. As $|e_N(\rho)-c\gamma| \xrightarrow{a.s.} 0$ when $N,n\to\infty$, we substitute $c\gamma$ for $e_N(\rho)$ in (26), giving

$$\sup_{\rho\in\mathcal{R}_\varepsilon}\left|\frac{1}{N}\mathbf{1}_N^T\hat{C}_{ST}^{-1}\left(\hat{C}_{ST}-\rho I_N\right)\hat{C}_{ST}^{-1}\mathbf{1}_N - \frac{(1-\rho)(1-(1-\rho)c)}{\gamma}\cdot\frac{\gamma^2}{\gamma^2-\beta(1-\rho)^2}\,\frac{1}{N}\mathbf{1}_N^TC_N\left(\frac{1-\rho}{\gamma}C_N+\rho I_N\right)^{-2}\mathbf{1}_N\right| \xrightarrow{a.s.} 0.$$

With respect to the asymptotic equivalence in (24) and upon substituting $\hat{\gamma}_{sc}$ for $\gamma/\kappa$, we obtain

$$\sup_{\rho\in\mathcal{R}_\varepsilon}\left|\frac{1}{\kappa}\,\frac{1}{N}\mathbf{1}_N^T\hat{C}_{ST}^{-1}C_N\hat{C}_{ST}^{-1}\mathbf{1}_N - \frac{\hat{\gamma}_{sc}}{(1-\rho)(1-(1-\rho)c_N)}\,\frac{1}{N}\mathbf{1}_N^T\hat{C}_{ST}^{-1}\left(\hat{C}_{ST}-\rho I_N\right)\hat{C}_{ST}^{-1}\mathbf{1}_N\right| \xrightarrow{a.s.} 0.$$

Thus we obtain the consistent estimator of $\frac{1}{\kappa}\sigma^2(\hat{h}_{ST}(\rho))$ in Theorem 2.
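In practice, the consistent risk estimate $\hat{\sigma}^2(\rho)$ of Theorem 2 is used to tune the shrinkage intensity online: evaluate it over a grid of $\rho$ values and keep the minimizer, which Appendix E shows is asymptotically as good as the oracle choice. A minimal sketch of this selection loop follows, with the covariance estimator and the risk estimate abstracted as callables (`cov_fn` and `risk_fn` are hypothetical placeholders, not the paper's exact formulas; the toy usage plugs in a plain linear-shrinkage covariance and a naive in-sample plug-in risk):

```python
import numpy as np

def gmvp_weights(C):
    """GMVP weights h = C^{-1} 1 / (1^T C^{-1} 1)."""
    w = np.linalg.solve(C, np.ones(C.shape[0]))
    return w / w.sum()

def select_rho(X, rho_grid, cov_fn, risk_fn):
    """Return the rho in rho_grid minimizing the estimated risk
    rho -> risk_fn(X, cov_fn(X, rho)). cov_fn / risk_fn stand in for the
    paper's C_ST(rho) and sigma_hat^2(rho)."""
    return min(rho_grid, key=lambda r: risk_fn(X, cov_fn(X, r)))

# Toy usage (placeholder estimators, for structure only):
rng = np.random.default_rng(1)
X = rng.standard_normal((10, 60))                       # 10 assets, 60 returns
cov_fn = lambda X, r: (1 - r) * np.cov(X) + r * np.eye(X.shape[0])
risk_fn = lambda X, C: gmvp_weights(C) @ np.cov(X) @ gmvp_weights(C)
rho_grid = [0.1, 0.3, 0.5, 0.7, 0.9]
rho_sel = select_rho(X, rho_grid, cov_fn, risk_fn)
h = gmvp_weights(cov_fn(X, rho_sel))
```

Any consistent risk estimator, in particular $\hat{\sigma}^2(\rho)$, can be substituted for `risk_fn` without changing the selection logic.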
APPENDIX E: PROOF OF COROLLARY

By Theorem 2,

$$\sup_{\rho\in\mathcal{R}_\varepsilon}\left|\hat{\sigma}^2(\rho) - \frac{1}{\kappa}\sigma^2(\hat{h}_{ST}(\rho))\right| \xrightarrow{a.s.} 0.$$

Then, the following holds true:

$$\hat{\sigma}^2(\rho^o) \le \hat{\sigma}^2(\rho^*),$$
$$\frac{1}{\kappa}\sigma^2(\hat{h}_{ST}(\rho^*)) \le \frac{1}{\kappa}\sigma^2(\hat{h}_{ST}(\rho^o)),$$
$$\left|\hat{\sigma}^2(\rho^o) - \frac{1}{\kappa}\sigma^2(\hat{h}_{ST}(\rho^o))\right| \le \sup_{\rho\in\mathcal{R}_\varepsilon}\left|\hat{\sigma}^2(\rho) - \frac{1}{\kappa}\sigma^2(\hat{h}_{ST}(\rho))\right| \xrightarrow{a.s.} 0,$$
$$\left|\hat{\sigma}^2(\rho^*) - \frac{1}{\kappa}\sigma^2(\hat{h}_{ST}(\rho^*))\right| \le \sup_{\rho\in\mathcal{R}_\varepsilon}\left|\hat{\sigma}^2(\rho) - \frac{1}{\kappa}\sigma^2(\hat{h}_{ST}(\rho))\right| \xrightarrow{a.s.} 0.$$

These four relations together ensure that $\big|\sigma^2(\hat{h}_{ST}(\rho^o)) - \sigma^2(\hat{h}_{ST}(\rho^*))\big| \xrightarrow{a.s.} 0$.

APPENDIX F: PROOF OF PROPOSITION 1

Write $|\tilde{m}_{N,j} - m_{N,j}| = |-A - B + C - D|$, where $1\le j\le n$ and

$$A \triangleq \frac{1}{Nn}\,\frac{\sqrt{\boldsymbol{\tau}}^T}{\sqrt{\tau_j}}Z_N^TM_{N,j}z_j, \qquad B \triangleq \frac{1}{Nn}\,z_j^TM_{N,j}Z_N\frac{\sqrt{\boldsymbol{\tau}}}{\sqrt{\tau_j}}, \qquad C \triangleq \frac{1}{Nn^2}\,\frac{\sqrt{\boldsymbol{\tau}}^T}{\sqrt{\tau_j}}Z_N^TM_{N,j}Z_N\frac{\sqrt{\boldsymbol{\tau}}}{\sqrt{\tau_j}},$$

$$D \triangleq \frac{1}{N}\,\tilde{z}_j^T\widetilde{M}_{N,j}\,\frac{1-\rho_n}{1-(1-\rho_n)c_N}\,\frac{1}{n}\left(\frac{1}{n^2}\sum_{t\neq j}\frac{1}{\tau_t}Z_N\sqrt{\boldsymbol{\tau}}\sqrt{\boldsymbol{\tau}}^TZ_N^T - \frac{1}{n}\sum_{t\neq j}\frac{z_t\sqrt{\boldsymbol{\tau}}^TZ_N^T}{\sqrt{\tau_t}} - \frac{1}{n}\sum_{t\neq j}\frac{Z_N\sqrt{\boldsymbol{\tau}}\,z_t^T}{\sqrt{\tau_t}}\right)M_{N,j}\,\tilde{z}_j.$$

The convergence (26) invoked in Appendix D reads:

$$\sup_{\rho\in\mathcal{R}_\varepsilon}\left|\frac{1}{N}\mathbf{1}_N^T\hat{C}_{ST}^{-1}\left(\hat{C}_{ST}-\rho I_N\right)\hat{C}_{ST}^{-1}\mathbf{1}_N - \frac{\frac{k\gamma}{(\gamma+e_N(\rho)k)^2}\,\frac{1}{N}\mathbf{1}_N^TC_N\left(\frac{k}{\gamma+e_N(\rho)k}C_N+\rho I_N\right)^{-2}\mathbf{1}_N}{1-\frac{k^2}{(\gamma+e_N(\rho)k)^2}\,\frac{1}{n}\operatorname{tr}\left[C_N\left(\frac{k}{\gamma+e_N(\rho)k}C_N+\rho I_N\right)^{-1}C_N\left(\frac{k}{\gamma+e_N(\rho)k}C_N+\rho I_N\right)^{-1}\right]}\right| \xrightarrow{a.s.} 0. \qquad (26)$$

We wish to prove that

$$E\big[|\tilde{m}_{N,j}-m_{N,j}|^p\big] \le \frac{K_p}{N^p} \qquad (27)$$

for each integer $p \ge 2$, where $K_p$ depends on $p$ but not on $N$. Then, taking $p > 2$, along with the union bound, the Markov inequality, and the Borel-Cantelli lemma, completes the proof of Proposition 1. Using the Minkowski inequality, we have

$$E\big[|\tilde{m}_{N,j}-m_{N,j}|^p\big] \le \left(E^{1/p}[|A|^p] + E^{1/p}[|B|^p] + E^{1/p}[|C|^p] + E^{1/p}[|D|^p]\right)^p.$$
Thus, to prove (27), it is enough to show that $E[|A|^p] \le K_A^p/N^p$, $E[|B|^p] \le K_B^p/N^p$, $E[|C|^p] \le K_C^p/N^p$ and $E[|D|^p] \le K_D^p/N^p$.

A. Moments of $|A|$, $|B|$ and $|C|$

Start by noting that

$$E[|A|^p] = \frac{1}{N^pn^p}\,E\left|z_j^TM_{N,j}Z_N\frac{\sqrt{\boldsymbol{\tau}}\sqrt{\boldsymbol{\tau}}^T}{\tau_j}Z_N^TM_{N,j}z_j\right|^{p/2} = \frac{1}{N^pn^p}\,E\left|z_j^TM_{N,j}\left(z_j + Z_N^{(j)}\frac{\sqrt{\boldsymbol{\tau}^{(j)}}}{\sqrt{\tau_j}}\right)\left(z_j + Z_N^{(j)}\frac{\sqrt{\boldsymbol{\tau}^{(j)}}}{\sqrt{\tau_j}}\right)^TM_{N,j}z_j\right|^{p/2} \overset{(a)}{\le} A_1 + A_2 + A_3 + A_4,$$

where $(a)$ follows from Jensen's inequality and

$$A_1 = \frac{4^{p/2-1}}{N^pn^p}\,E\left[\big|z_j^TM_{N,j}z_j\,z_j^TM_{N,j}z_j\big|^{p/2}\right], \qquad A_2 = \frac{4^{p/2-1}}{N^pn^p}\,E\left[\left|z_j^TM_{N,j}Z_N^{(j)}\frac{\sqrt{\boldsymbol{\tau}^{(j)}}}{\sqrt{\tau_j}}\,\frac{\sqrt{\boldsymbol{\tau}^{(j)}}^T}{\sqrt{\tau_j}}Z_N^{(j)T}M_{N,j}z_j\right|^{p/2}\right],$$

$$A_3 = \frac{4^{p/2-1}}{N^pn^p}\,E\left|z_j^TM_{N,j}Z_N^{(j)}\frac{\sqrt{\boldsymbol{\tau}^{(j)}}}{\sqrt{\tau_j}}\,z_j^TM_{N,j}z_j\right|^{p/2}, \qquad A_4 = \frac{4^{p/2-1}}{N^pn^p}\,E\left|z_j^TM_{N,j}z_j\,\frac{\sqrt{\boldsymbol{\tau}^{(j)}}^T}{\sqrt{\tau_j}}Z_N^{(j)T}M_{N,j}z_j\right|^{p/2}.$$
For term $A_1$,

$$A_1 = \frac{4^{p/2-1}}{N^pn^p}\,E\big[|z_j^TM_{N,j}z_j|^p\big] \le \frac{4^{p/2-1}}{N^pn^p}\,E\big[\|y_j\|^{2p}\|M_{N,j}\|^p\|C_N\|^p\big].$$

Note also that $\|M_{N,j}\|^p \le \frac{1}{(\gamma(\rho_n)+\ell)^p\rho_n^p}$, and by Minkowski's inequality,

$$E\big[\|y_j\|^{2p}\big] = E\left[\left(\sum_{i=1}^N y_{i,j}^2\right)^p\right] \le N^pE|y_{1,j}|^{2p} \le K_pN^p.$$

Thus

$$A_1 \le \frac{1}{N^p}\cdot\frac{K_p\|C_N\|^p\,4^{p/2-1}\,c_N^p}{(\gamma(\rho_n)+\ell)^p\rho_n^p} \le \frac{K_{A_1}^p}{N^p}.$$

Now consider $A_2$:

$$A_2 \overset{(a)}{\le} \frac{K_p'}{N^pn^p}\left(E\left[\big|z_j^TQ_Nz_j-\operatorname{tr}(Q_N)\big|^{p/2}\right] + E\left[\big|\operatorname{tr}(Q_N)\big|^{p/2}\right]\right) \overset{(b)}{\le} \frac{K_p}{N^pn^p}\left(E^{p/4}\big[|z_{1,j}|^4\big] + E\big[|z_{1,j}|^p\big] + 1\right)E\left[\left\|M_{N,j}Z_N^{(j)}\frac{\sqrt{\boldsymbol{\tau}^{(j)}}}{\sqrt{\tau_j}}\right\|^p\right]$$

$$\le \frac{K_p}{N^pn^p}\left(E^{p/4}\big[|z_{1,j}|^4\big] + E\big[|z_{1,j}|^p\big] + 1\right)E\left[\|M_{N,j}\|^p\,\|C_N\|^{p/2}\left\|Y_N^{(j)}\frac{\sqrt{\boldsymbol{\tau}^{(j)}}}{\sqrt{\tau_j}}\right\|^p\right] \le \frac{1}{N^p}\,\frac{K_pK_{C_N}^{p/2}\big(E^{p/4}[|z_{1,j}|^4]+E[|z_{1,j}|^p]+1\big)}{(\gamma(\rho_n)+\ell)^p\rho_n^p}\,E\left[\left\|\frac{1}{n}Y_N^{(j)}\frac{\sqrt{\boldsymbol{\tau}^{(j)}}}{\sqrt{\tau_j}}\right\|^p\right] \le \frac{K_{A_2}^p}{N^p},$$

where $Q_N = M_{N,j}Z_N^{(j)}\frac{\sqrt{\boldsymbol{\tau}^{(j)}}}{\sqrt{\tau_j}}\frac{\sqrt{\boldsymbol{\tau}^{(j)}}^T}{\sqrt{\tau_j}}Z_N^{(j)T}M_{N,j}$, $(a)$ follows from Jensen's inequality and $(b)$ follows from the trace lemma [44, Lemma B.26]. For $A_3$,

$$A_3 \le \frac{4^{p/2-1}}{N^pn^p}\,E^{1/2}\left[\left|z_j^TM_{N,j}Z_N^{(j)}\frac{\sqrt{\boldsymbol{\tau}^{(j)}}}{\sqrt{\tau_j}}\right|^p\right]E^{1/2}\left[\big|z_j^TM_{N,j}z_j\big|^p\right].$$

As we have $A_1 \le K_{A_1}^p/N^p$ and $A_2 \le K_{A_2}^p/N^p$, we obtain $A_3 \le K_{A_3}^p/N^p$. Following the same reasoning as for $A_3$, we also get $A_4 \le K_{A_4}^p/N^p$. Therefore, we obtain

$$E[|A|^p] \le A_1 + A_2 + A_3 + A_4 \le \frac{K_A^p}{N^p}.$$
The same reasoning holds for $E[|B|^p]$, giving $E[|B|^p] \le K_B^p/N^p$. As for the moments of $|C|$, the treatment is similar to that of $A_2$:

$$E[|C|^p] \le \frac{1}{N^p}\,E\left[\left\|\frac{1}{n}Y_N\frac{\sqrt{\boldsymbol{\tau}}}{\sqrt{\tau_j}}\right\|^{2p}\|C_N\|^p\,\|M_{N,j}\|^p\right] \le \frac{\|C_N\|^p}{N^p}\,\frac{1}{(\gamma(\rho_n)+\ell)^p\rho_n^p}\,E\left[\left\|\frac{1}{n}Y_N\frac{\sqrt{\boldsymbol{\tau}}}{\sqrt{\tau_j}}\right\|^{2p}\right] \le \frac{K_C^p}{N^p}.$$

B. Moments of $|D|$

We denote $\sqrt{\boldsymbol{\tau}} = (\sqrt{\tau_1},\dots,\sqrt{\tau_n})^T$ and, writing $u_t \triangleq \frac{1}{n}Z_N\frac{\sqrt{\boldsymbol{\tau}}}{\sqrt{\tau_t}}$, split $D = D_1 + D_2 + D_3$ according to the three sums in its definition:

$$D_1 = \frac{1}{N}\,\tilde{z}_j^T\widetilde{M}_{N,j}\,\frac{1-\rho_n}{1-(1-\rho_n)c_N}\,\frac{1}{n}\sum_{t\neq j}u_tu_t^T\,M_{N,j}\tilde{z}_j,$$
$$D_2 = -\frac{1}{N}\,\tilde{z}_j^T\widetilde{M}_{N,j}\,\frac{1-\rho_n}{1-(1-\rho_n)c_N}\,\frac{1}{n}\sum_{t\neq j}z_tu_t^T\,M_{N,j}\tilde{z}_j,$$
$$D_3 = -\frac{1}{N}\,\tilde{z}_j^T\widetilde{M}_{N,j}\,\frac{1-\rho_n}{1-(1-\rho_n)c_N}\,\frac{1}{n}\sum_{t\neq j}u_tz_t^T\,M_{N,j}\tilde{z}_j.$$

We aim to prove $E[|D|^p] \le K_D^p/N^p$, which is achieved by proving $E[|D_1|^p] \le K_{D_1}^p/N^p$, $E[|D_2|^p] \le K_{D_2}^p/N^p$ and $E[|D_3|^p] \le K_{D_3}^p/N^p$. Let us first analyze $D_1$, with the analysis of $D_2$ and $D_3$ following similarly.
We obtain

$$E[|D_1|^p] \overset{(a)}{\le} \frac{1}{N^p}\left(\frac{1-\rho_n}{1-(1-\rho_n)c_N}\right)^pE^{1/2}\big[|D_a|^{2p}\big]\,E^{1/2}\big[|D_b|^{2p}\big] \overset{(b)}{\le} \frac{1}{N^p}\left(\frac{1-\rho_n}{1-(1-\rho_n)c_N}\right)^p\left(E^{1/(2p)}\big[|D_c|^{2p}\big] + E^{1/(2p)}\big[|D_d|^{2p}\big]\right)^{p}E^{1/2}\big[|D_b|^{2p}\big], \qquad (28)$$

where $(a)$ follows from the Cauchy-Schwarz inequality, $(b)$ follows from Minkowski's inequality, and, with $s_j \triangleq \frac{1}{n}\sum_{t\neq j}\frac{1}{\tau_t}$,

$$D_a = \tilde{z}_j^T\widetilde{M}_{N,j}\,s_j\,\frac{1}{n}Z_N\sqrt{\boldsymbol{\tau}}, \qquad D_b = \frac{1}{n}\sqrt{\boldsymbol{\tau}}^TZ_N^TM_{N,j}\tilde{z}_j, \qquad D_c = z_j^T\widetilde{M}_{N,j}\,s_j\,\frac{1}{n}Z_N\sqrt{\boldsymbol{\tau}}, \qquad D_d = \left(\frac{1}{n}Z_N\frac{\sqrt{\boldsymbol{\tau}}}{\sqrt{\tau_j}}\right)^T\widetilde{M}_{N,j}\,s_j\,\frac{1}{n}Z_N\sqrt{\boldsymbol{\tau}}.$$

Our aim is to prove that $E[|D_b|^{2p}] \le K_b^{2p}$, $E[|D_c|^{2p}] \le K_c^{2p}$ and $E[|D_d|^{2p}] \le K_d^{2p}$. Following the analysis of $E[|A|^p]$ and $E[|C|^p]$, we obtain $E[|D_b|^{2p}] \le K_b^{2p}$. For $E[|D_d|^{2p}]$, we have

$$E\big[|D_d|^{2p}\big] \le \frac{s_j^{2p}}{\tau_j^{p}}\,E\left[\big\|\widetilde{M}_{N,j}\big\|^{2p}\,\|C_N\|^{2p}\left\|\frac{1}{n}Y_N\sqrt{\boldsymbol{\tau}}\right\|^{4p}\right] \le \frac{\|C_N\|^{2p}\,s_j^{2p}}{\tau_j^{p}\,(\gamma(\rho_n)+\ell)^{2p}\rho_n^{2p}}\,E\left[\left\|\frac{1}{n}Y_N\sqrt{\boldsymbol{\tau}}\right\|^{4p}\right] \le K_d^{2p}.$$

Let us now establish the inequality for $D_c$. Note that $z_j$ is not independent of $\widetilde{M}_{N,j}$, so we cannot follow the same procedure as in our analysis of $A_2$ to determine the order of $E[|D_c|^{2p}]$. Instead, we divide $\widetilde{M}_{N,j}$ into two parts, one independent of $z_j$ and the other the remainder. We first write $\sum_{t\neq j}\tilde{z}_t\tilde{z}_t^T = E + F$, where $E$ and $F$ are defined below; note that $E$ is independent of $z_j$ while $F$ is not. Then $D_c$ can be rewritten as (29) below.
With $\bar{\boldsymbol{\tau}}$ denoting the vector of entries $1/\tau_t$ (so that $\sqrt{\bar{\boldsymbol{\tau}}^{(j)}}$ has entries $1/\sqrt{\tau_t}$, $t\neq j$),

$$E = Z_N^{(j)}Z_N^{(j)T} - \frac{1}{n}Z_N^{(j)}\sqrt{\bar{\boldsymbol{\tau}}^{(j)}}\left(Z_N^{(j)}\sqrt{\boldsymbol{\tau}^{(j)}}\right)^T - \frac{1}{n}Z_N^{(j)}\sqrt{\boldsymbol{\tau}^{(j)}}\left(Z_N^{(j)}\sqrt{\bar{\boldsymbol{\tau}}^{(j)}}\right)^T + \frac{1}{n^2}\left(\sqrt{\bar{\boldsymbol{\tau}}^{(j)}}\right)^T\sqrt{\bar{\boldsymbol{\tau}}^{(j)}}\,Z_N^{(j)}\sqrt{\boldsymbol{\tau}^{(j)}}\left(Z_N^{(j)}\sqrt{\boldsymbol{\tau}^{(j)}}\right)^T,$$

$$F = -\frac{\sqrt{\tau_j}}{n}Z_N^{(j)}\sqrt{\bar{\boldsymbol{\tau}}^{(j)}}\,z_j^T - \frac{\sqrt{\tau_j}}{n}z_j\left(Z_N^{(j)}\sqrt{\bar{\boldsymbol{\tau}}^{(j)}}\right)^T + \frac{\tau_j}{n^2}\left(\sqrt{\bar{\boldsymbol{\tau}}^{(j)}}\right)^T\sqrt{\bar{\boldsymbol{\tau}}^{(j)}}\,z_jz_j^T + \frac{\sqrt{\tau_j}}{n^2}\left(\sqrt{\bar{\boldsymbol{\tau}}^{(j)}}\right)^T\sqrt{\bar{\boldsymbol{\tau}}^{(j)}}\,Z_N^{(j)}\sqrt{\boldsymbol{\tau}^{(j)}}z_j^T + \frac{\sqrt{\tau_j}}{n^2}\left(\sqrt{\bar{\boldsymbol{\tau}}^{(j)}}\right)^T\sqrt{\bar{\boldsymbol{\tau}}^{(j)}}\,z_j\left(Z_N^{(j)}\sqrt{\boldsymbol{\tau}^{(j)}}\right)^T,$$

and, with $s_j = \frac{1}{n}\sum_{t\neq j}\frac{1}{\tau_t}$,

$$D_c = z_j^T\left(\frac{1-\rho_n}{1-(1-\rho_n)c_N}\,\frac{1}{n}E + (\gamma(\rho_n)+\ell)\rho_nI_N\right)^{-1}s_j\,\frac{1}{n}Z_N\sqrt{\boldsymbol{\tau}} + z_j^T\widetilde{M}_{N,j}\left(-\frac{1-\rho_n}{1-(1-\rho_n)c_N}\,\frac{1}{n}F\right)\left(\frac{1-\rho_n}{1-(1-\rho_n)c_N}\,\frac{1}{n}E + (\gamma(\rho_n)+\ell)\rho_nI_N\right)^{-1}s_j\,\frac{1}{n}Z_N\sqrt{\boldsymbol{\tau}}. \qquad (29)$$

Using Jensen's inequality,

$$E\big[|D_c|^{2p}\big] \le 2^{2p-1}\left(E\big[|G|^{2p}\big] + E\big[|H|^{2p}\big]\right),$$

where $G$ and $H$ are the two terms on the RHS of (29). Next we can use the same technique as used in Appendix F-A to prove that $E[|G|^{2p}] \le K_G^{2p}$ and $E[|H|^{2p}] \le K_H^{2p}$. Therefore, we obtain $E[|D_c|^{2p}] \le K_c^{2p}$.

Thus far, we have proven that $E[|D_b|^{2p}] \le K_b^{2p}$, $E[|D_c|^{2p}] \le K_c^{2p}$, and $E[|D_d|^{2p}] \le K_d^{2p}$. Coming back to (28), we obtain $E[|D_1|^p] \le K_{D_1}^p/N^p$. Following similar arguments to our analysis of $E[|D_1|^p]$, we can also obtain $E[|D_2|^p] \le K_{D_2}^p/N^p$ and $E[|D_3|^p] \le K_{D_3}^p/N^p$. As $D = D_1 + D_2 + D_3$, by Minkowski's inequality, we obtain $E[|D|^p] \le K_D^p/N^p$.

REFERENCES

[1] H. Markowitz, "Portfolio selection,"
J. Finance, vol. 7, no. 1, pp. 77–91, Mar. 1952.
[2] R. C. Merton, "On estimating the expected return on the market: An exploratory investigation,"
J. Finan. Econ. , vol. 8, no. 4, pp. 323–361,Dec. 1980.[3] R. Jagannathan and T. Ma, “Risk reduction in large portfolios: Whyimposing the wrong constraints helps,”
J. Finance , vol. 58, no. 4, pp.1651–1684, Aug. 2003.[4] N. El Karoui et al. , “High-dimensionality effects in the Markowitzproblem and other quadratic programs with linear constraints: Riskunderestimation,”
Ann. Statist. , vol. 38, no. 6, pp. 3487–3566, 2010.[5] N. E. Karoui, “On the realized risk of high-dimensional Markowitzportfolios,”
SIAM J. Finan. Math. , vol. 4, no. 1, pp. 737–783, 2013.[6] Z. Bai, H. Liu, and W.-K. Wong, “Enhancement of the applicability ofMarkowitz’s portfolio optimization by utilizing random matrix theory,”
Math. Finance , vol. 19, no. 4, pp. 639–667, 2009.[7] J.-P. Bouchaud and M. Potters, “Financial applications of random matrixtheory: A short review,” in
Chapter 40 of The Oxford Handbook ofRandom Matrix Theory , G. Akemann, J. Baik, and P. D. Francesco,Eds. Oxford, U.K.: Oxford Univ. Press, 2011.[8] L. K. Chan, J. Karceski, and J. Lakonishok, “On portfolio optimization:Forecasting covariances and choosing the risk model,”
Rev. Finan.Studies , vol. 12, no. 5, pp. 937–974, 1999.[9] J. Fan, Y. Fan, and J. Lv, “High dimensional covariance matrix estima-tion using a factor model,”
J. Econometrics , vol. 147, no. 1, pp. 186–197,Nov. 2008.[10] O. Ledoit and M. Wolf, “Improved estimation of the covariance matrixof stock returns with an application to portfolio selection,”
J. Empir.Finance , vol. 10, no. 5, pp. 603–621, Dec. 2003.[11] ——, “A well-conditioned estimator for large-dimensional covariancematrices,”
J. Multivar. Anal. , vol. 88, no. 2, pp. 365–411, Feb. 2004.[12] ——, “Nonlinear shrinkage of the covariance matrix for portfolioselection: Markowitz meets Goldilocks,”
Available at SSRN 2383361, 2014.
[13] L. Laloux, P. Cizeau, J.-P. Bouchaud, and M. Potters, "Noise dressing of financial correlation matrices,"
Phys. Rev. Lett. , vol. 83, no. 7, p. 1467,Aug. 1999.[14] L. Laloux, P. Cizeau, M. Potters, and J.-P. Bouchaud, “Random matrixtheory and financial correlations,”
Internat. J. Theoret. Appl. Finance ,vol. 3, no. 3, pp. 391–397, July 2000.[15] V. Plerou, P. Gopikrishnan, B. Rosenow, L. A. N. Amaral, T. Guhr,and H. E. Stanley, “Random matrix approach to cross correlations infinancial data,”
Phys. Rev. E, vol. 65, no. 6, p. 066126, June 2002.
[16] V. Dahirel, K. Shekhar, F. Pereyra, T. Miura, M. Artyomov, S. Talsania, T. M. Allen, M. Altfeld, M. Carrington, D. J. Irvine et al., "Coordinate linkage of HIV evolution reveals regions of immunological vulnerability,"
Proc. Natl. Acad. Sci. USA , vol. 108, no. 28, pp. 11 530–11 535,July 2011.[17] A. Quadeer, R. Louie, K. Shekhar, A. Chakraborty, I. Hsing, andM. McKay, “Statistical linkage of mutations in the non-structural pro-teins of Hepatitis C virus exposes targets for immunogen design,”
J.Virol. , vol. 88, no. 13, pp. 7628–7644, Apr. 2014.[18] D. L. Donoho, M. Gavish, and I. M. Johnstone, “Optimal shrink-age of eigenvalues in the spiked covariance model,” arXiv preprintarXiv:1311.0851 , 2013.[19] J. Fan, J. Zhang, and K. Yu, “Vast portfolio selection with gross-exposureconstraints,”
J. Amer. Statist. Assoc. , vol. 107, no. 498, pp. 592–606,2012.[20] V. DeMiguel, L. Garlappi, F. J. Nogales, and R. Uppal, “A generalizedapproach to portfolio optimization: Improving performance by constrain-ing portfolio norms,”
Manage. Sci. , vol. 55, no. 5, pp. 798–812, Mar.2009. [21] R. Cont, “Empirical properties of asset returns: Stylized facts andstatistical issues,”
Quant. Finance , vol. 1, no. 2, pp. 223–236, 2001.[22] P. J. Huber, “Robust estimation of a location parameter,”
Ann. Math.Statist. , vol. 35, no. 1, pp. 73–101, 1964.[23] D. E. Tyler, “A distribution-free M-estimator of multivariate scatter,”
Ann. Statist. , vol. 15, no. 1, pp. 234–251, 1987.[24] R. A. Maronna, “Robust M-estimators of multivariate location andscatter,”
Ann. Statist. , vol. 4, no. 1, pp. 51–67, 1976.[25] J. T. Kent and D. E. Tyler, “Redescending M-estimates of multivariatelocation and scatter,”
Ann. Statist. , vol. 19, no. 4, pp. 2102–2119, 1991.[26] R. Couillet, F. Pascal, and J. W. Silverstein, “Robust estimates ofcovariance matrices in the large dimensional regime,”
IEEE Trans. Inf.Theory , vol. 60, no. 11, pp. 7269–7278, Sept. 2014.[27] ——, “The random matrix regime of Maronna’s M-estimator withelliptically distributed samples,” arXiv preprint arXiv:1311.7034 , 2013.[28] T. Zhang, X. Cheng, and A. Singer, “Marchenko-Pastur law for Tyler’sand Maronna’s M-estimators,” arXiv preprint arXiv:1401.3424 , 2014.[29] F. Pascal, Y. Chitour, and Y. Quek, “Generalized robust shrinkageestimator and its application to STAP detection problem,”
IEEE Trans.Signal Process. , vol. 62, no. 21, pp. 5640–5651, Nov. 2014.[30] Y. Abramovich and N. K. Spencer, “Diagonally loaded normalisedsample matrix inversion (LNSMI) for outlier-resistant adaptive filtering,”in
Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP) ,vol. 3, Honolulu, HI, Apr. 2007, pp. 1105–1108.[31] Y. Chen, A. Wiesel, and A. O. Hero, “Robust shrinkage estimation ofhigh-dimensional covariance matrices,”
IEEE Trans. Signal Process. ,vol. 59, no. 9, pp. 4097–4107, Sept. 2011.[32] R. Couillet and M. R. McKay, “Large dimensional analysis and op-timization of robust shrinkage covariance matrix estimators,”
J. Multivar. Anal., vol. 131, pp. 99–120, 2014.
[33] F. Rubio, X. Mestre, and D. P. Palomar, "Performance analysis and optimal selection of large minimum variance portfolios under estimation risk,"
IEEE J. Sel. Topics Signal Process. , vol. 6, no. 4, pp. 337–350,Aug. 2012.[34] V. DeMiguel, L. Garlappi, and R. Uppal, “Optimal versus naive diversi-fication: How inefficient is the 1/n portfolio strategy?”
Rev. Finan. Stud. ,vol. 22, no. 5, pp. 1915–1953, 2009.[35] A. C. MacKinlay and L. Pastor, “Asset pricing models: Implicationsfor expected returns and portfolio selection,”
Rev. Finan. Stud. , vol. 13,no. 4, pp. 883–916, 2000.[36] O. Ledoit and M. Wolf, “Robust performance hypothesis testing withthe Sharpe ratio,”
J. Empir. Finance, vol. 15, no. 5, pp. 850–859, Dec. 2008.
[37] ——, "Robust performance hypothesis testing with the variance,"
Wilmott , vol. 2011, no. 55, pp. 86–89, 2011.[38] D. N. Politis and J. P. Romano, “The stationary bootstrap,”
J. Amer.Statist. Assoc. , vol. 89, no. 428, pp. 1303–1313, 1994.[39] D. Morales-Jimenez, R. Couillet, and M. R. McKay, “Large dimensionalanalysis of robust M-estimators of covariance with outliers,” arXivpreprint arXiv:1503.01245 , 2015.[40] P. Jorion, “Bayesian and CAPM estimators of the means: Implicationsfor portfolio selection,”
J. Banking Finance , vol. 15, no. 3, pp. 717–727,June 1991.[41] V. K. Chopra and W. T. Ziemba, “The effect of errors in means,variances, and covariances on optimal portfolio choice,”
J. PortfolioManagement , vol. 19, no. 2, pp. 6–11, Dec. 1993.[42] Z. Bai and J. W. Silverstein, “No eigenvalues outside the support ofthe limiting spectral distribution of large-dimensional sample covariancematrices,”
Ann. Probab. , vol. 26, no. 1, pp. 316–345, 1998.[43] L. Ahlfors,
Complex Analysis: An Introduction to the Theory of Analytic Functions of One Complex Variable. New York: McGraw-Hill, 1978.
[44] Z. Bai and J. W. Silverstein, Spectral Analysis of Large Dimensional Random Matrices, 2nd ed. New York: Springer, 2010.