Non-asymptotic rates for the estimation of risk measures
arXiv [q-fin.RM], March 2020
DANIEL BARTL AND LUDOVIC TANGPI
Abstract.
Consider the problem of computing the riskiness ρ(F(S)) of a financial position F written on the underlying S with respect to a general law invariant risk measure ρ; for instance, ρ can be the average value at risk. In practice the true distribution of S is typically unknown and one needs to resort to historical data for the computation. In this article we investigate rates of convergence of ρ(F(S_N)) to ρ(F(S)), where S_N is distributed according to the empirical measure of S with N observations. We provide (sharp) non-asymptotic rates for both the deviation probability and the expectation of the estimation error. Our framework further allows for hedging, and the convergence rates we obtain depend neither on the dimension of the underlying stocks nor on the number of options available for trading.

1. Introduction and main results
Risk is a pervasive aspect of the financial industry, as every single financial decision carries a certain amount of risk. Correctly quantifying riskiness is therefore of central importance for financial institutions. To this aim, a rigorous axiomatic approach to risk management was initiated by Artzner et al. [3] and matured into an impressive theory of risk measures. We refer the unfamiliar reader to Definition 1.4. Prime examples of risk measures are the average value at risk of Rockafellar and Uryasev [41], the optimized certainty equivalents of Ben-Tal and Teboulle [6, 7], or the shortfall risk of Föllmer and Schied [21]. In this paper we discuss the estimation of risk measures.

Let us first consider the case of plain risk measures, without trading or optimization issues. Denote the underlying by S and by μ its distribution; that is, μ is a probability measure on some measurable space X and S is a random variable distributed according to μ. Given a financial position F : X → R written on S, the task is to compute ρ_μ(F) := ρ(F(S)), where ρ is a law invariant convex risk measure. In practice however, the true distribution μ is unavailable, and one often resorts to historical data. This means that instead of ρ_μ(F) one computes the (plug-in) estimator ρ_{μ_N}(F), where μ_N is the empirical measure built from N i.i.d. historical observations of the underlying S. As we will soon observe, while this estimator is consistent, it typically underestimates the true risk ρ_μ(F). Thus an essential question for risk managers is:

How far is ρ_{μ_N}(F) from ρ_μ(F) for a fixed sample size N?

Date: March 25, 2020.
2010 Mathematics Subject Classification.
Key words and phrases. Risk measure, estimation, non-asymptotic rates, portfolio optimization, empirical processes.
To make this question rigorous, one of course needs to specify what 'far' means, as the estimation error |ρ_{μ_N}(F) − ρ_μ(F)| is random (it depends on the observations from S). The goal of this article is to answer the above question by providing (sharp) non-asymptotic rates on the expected estimation error and on the probability that the estimation error exceeds some prescribed threshold.

Before presenting our main results, let us generalize the discussion to the more practically relevant situation where hedging is also possible. In fact, we can additionally consider options G_1, ..., G_e : X → R available for trading at prices p_1, ..., p_e ∈ R, respectively (where e ∈ N). Trading according to a strategy g ∈ R^e then yields the outcome F + Σ_{i=1}^e g_i(G_i − p_i), so that the risk manager's task is to estimate the minimal risk incurred when trading in the option market, that is, to compute

π_μ(F) := inf_{g∈G} ρ_μ( F + Σ_{i=1}^e g_i(G_i − p_i) ),

where G ⊂ R^e is the set of all admissible trading strategies. Loosely speaking, the goal here is to absorb extreme outcomes of F by trading. For instance, G = {g ∈ [0,1]^e : g_1 + ··· + g_e = 1} corresponds to portfolio optimization; see [42] for some background. Notice that if 0 is the only admissible trading strategy, i.e. G = {0}, then we have π_μ = ρ_μ and hence all results obtained for π translate to ρ.

1.1. Results for AVaR, OCE, and SF risk measures.
While the mathematical challenges of the present article lie in the treatment of general risk measures, we start with an easy-to-state result for two specific and widely used risk measures. For any F : X → R, the shortfall risk measure [22] is defined as

SF_μ(F) := inf{ m ∈ R : E_μ[l(F(S) − m)] ≤ 1 }.

Here E_μ[·] denotes the expectation under which S ∼ μ and l : R → R₊ is a loss function, meaning that l is increasing and convex with 1 ∈ ∂l(0) (the subdifferential at the point 0). In other words, SF_μ(F) is the smallest capital m needed to reduce the loss F to make it acceptable, meaning that the expected loss E_μ[l(F(S) − m)] is below the threshold 1.

In a similar spirit, the optimized certainty equivalent (OCE) is defined via OCE_μ(F) := inf_{m∈R} ( E_μ[l(F(S) − m)] + m ); see [7, 6]. Again l is a loss function and the interpretation is similar to that of shortfall risk. Importantly, OCEs cover popular risk measures such as the average value at risk or the entropic risk measure; see (2.1) below for the OCE representation of the average value at risk.

For the rest of this introduction we assume that F, G_1, ..., G_e : X → R are bounded measurable functions of the underlying and that G ⊂ R^e is a bounded set.

Theorem 1.1 (Rates for AVaR, OCE, and SF). Let ρ = OCE or ρ = SF, and in the latter case assume that l is strictly increasing. There are constants c, C > 0 such that the following hold.
(i) We have the moment bound

E[ |π_μ(F) − π_{μ_N}(F)| ] ≤ C/√N for all N ≥ 1.

(ii) We have the matching deviation inequality

P[ |π_μ(F) − π_{μ_N}(F)| ≥ ε ] ≤ C exp(−cNε²) for all N ≥ 1 and all ε > 0.

The constants c and C depend on l, the maximal range of F, G, the number of options e, and the diameter of G.

Three remarks are in order.

Remark 1.2.
(a) The rates obtained in both parts of Theorem 1.1 are the usual rates dictated by the central limit theorem and in particular optimal; see Section 5.
(b) While boundedness of F and G can be relaxed to some extent (see Theorem 3.1), the boundedness requirement on G is necessary. Indeed, in Proposition 6.1 we will show that convergence of π_{μ_N}(F) to π_μ(F) (at any rate) already implies that G is bounded.
(c) An important observation is that throughout this paper the rates will never depend on the number e of options, nor on the 'dimension' of the underlying space X. In addition, F and G_1, ..., G_e are not subject to any continuity condition.

One could wonder whether, at least if X = R^d and F, G are Lipschitz continuous, the statements of Theorem 1.1 would follow from some rather simple to obtain continuity in Wasserstein distance of μ ↦ ρ_μ(F), in combination with convergence rates of the empirical measure in Wasserstein distance. While this technique certainly works in dimension d = 1, in the present general, multidimensional setting this approach would force the convergence rates to be significantly worse: in dimension d ≥ 3, the Wasserstein distance converges with rate N^{−1/d} instead of the rate N^{−1/2} obtained above; see [23].

Before discussing the generalization of Theorem 1.1 beyond OCE and SF risk measures, let us present a few statistical properties of the estimator. First of all, it follows as a direct consequence of the Borel–Cantelli lemma that part (ii) of Theorem 1.1 implies the following strong consistency property:

Corollary 1.3 (Consistency). In the setting of Theorem 1.1 we have that lim_{N→∞} π_{μ_N}(F) = π_μ(F) P-almost surely.

It is clear that if F and G are bounded continuous functions, the claim of Corollary 1.3 is a consequence of weak convergence of the empirical measure to the true one. Recall however that here we merely assumed F and G to be measurable.

Despite the above strong consistency property, the estimator is typically biased, as alluded to above. In fact, ρ_{μ_N}(F) often underestimates ρ_μ(F). This is most easily seen in the case of the optimized certainty equivalent: taking the infimum over m in its definition outside the expectation E[·] shows that

E[ OCE_{μ_N}(F) ] ≤ inf_{m∈R} E[ E_{μ_N}[l(F(S) − m)] + m ] = OCE_μ(F).

The same applies in the presence of trading, namely E[π_{μ_N}(F)] ≤ π_μ(F). More generally, a quick inspection of OCE and SF reveals that both are concave considered as mappings of μ. Now, for every concave mapping μ ↦ ρ_μ(F), applying Jensen's inequality formally as in the real-valued case yields E[ρ_{μ_N}(F)] ≤ ρ_{E[μ_N]}(F) = ρ_μ(F) (where we used E[μ_N] = μ). As a matter of fact, while not all general law invariant risk measures are concave in μ, this is often the case; see Acciaio and Svindland [1]. In particular, the above discussion also applies to the general risk measures presented in the next section, and we refer to Pitera and Schmidt [40] for further discussion on the issue of biasedness and some empirical evidence.
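The underestimation bias just described is easy to see numerically. The following is a minimal sketch, not taken from the paper: it uses the OCE with the exponential loss l(x) = exp(x), for which the infimum over m is available in closed form, OCE(X) = 1 + log E[exp(X)]; the Gaussian model, the sample size, and the repetition count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N, reps = 50, 20_000  # illustrative sample size and number of repetitions

def oce_entropic(x):
    # OCE with loss l(x) = exp(x): inf_m { mean(exp(x - m)) + m },
    # attained at m = log(mean(exp(x))), with value 1 + log(mean(exp(x)))
    return 1.0 + np.log(np.mean(np.exp(x)))

# For X ~ N(0, 1) one has E[exp(X)] = exp(1/2), hence the true OCE is 1.5.
true_oce = 1.5

estimates = [oce_entropic(rng.standard_normal(N)) for _ in range(reps)]
print(np.mean(estimates), true_oce)  # the average plug-in estimate is biased low
```

Averaging the plug-in estimates over many independent samples lands strictly below the true value, exactly as the Jensen argument above predicts.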
1.2. Results for general risk measures.
It is natural to ask whether Theorem 1.1 extends to general risk measures. Let us recall the definition for completeness.

Definition 1.4 (Risk measure). A functional ρ : L^∞ → R over a standard probability space is a law invariant (convex) risk measure if
(a) ρ(X + m) = ρ(X) + m and ρ(0) = 0 for all X and m ∈ R;
(b) ρ(X) ≤ ρ(Y) if X ≤ Y almost surely;
(c) ρ(λX + (1 − λ)Y) ≤ λρ(X) + (1 − λ)ρ(Y) for λ ∈ [0, 1];
(d) ρ(X) = ρ(Y) if X ∼ Y, that is, if X and Y have the same distribution.

As above, for a bounded function F : X → R and a probability μ on X, we write ρ_μ(F) := ρ(F(S)) where S ∼ μ.

In addition to the properties (a)-(d) stated above, it is customary to assume that risk measures satisfy some regularity condition.
Definition 1.5. Let ρ : L^∞ → R be a risk measure and let X, X_n ∈ L^∞ be such that sup_n ‖X_n‖_∞ < ∞ and X = lim_n X_n almost surely. Then ρ is said to have the
(a) Fatou property, if ρ(X) ≤ lim inf_n ρ(X_n);
(b) Lebesgue property, if ρ(X) = lim_n ρ(X_n).

Recall that by a result of Jouini, Schachermayer and Touzi [30], every law invariant risk measure automatically satisfies the Fatou property. Perhaps surprisingly, our first (negative) result states that Theorem 1.1 cannot be extended to general risk measures solely under the Lebesgue property; namely, convergence can happen at an arbitrarily slow rate:

Proposition 1.6 (No rates in general). Assume that X is not a singleton. Then there exists a (sublinear) law invariant risk measure ρ : L^∞ → R which satisfies the Lebesgue property and a bounded function F : X → R such that there is no rate ε > 0 for which

E[ |π_μ(F) − π_{μ_N}(F)| ] ≤ CN^{−ε}

holds for all N ≥ 1 and all probability measures μ (which are supported on two fixed distinct points in X).

In view of the above negative result, the next step is to identify what causes the lack of convergence rates and to come up with a (hopefully) natural and easy-to-check (regularity) property for ρ which guarantees convergence at some prescribed rate.

Footnote: For instance, this is always true for law invariant comonotonic risk measures; see [1, Corollary 10].
Footnote: Observe that in contrast to the original definition [3, 22], we take risk measures to be increasing. This is done for notational convenience and does not affect generality.
Footnote: If X consists of a single point, then there is only one probability measure μ on X (assigning mass 1 to the single point in X). In particular μ_N = μ and thus π_{μ_N}(F) = π_μ(F).
Interestingly, all it takes is for ρ to be finite for certain non-bounded random variables: roughly speaking, the more a risk measure behaves like ρ_max(X) := ess sup X, the worse the rates of convergence. This is due to the fact that changes of X on almost negligible sets can result in significant changes of ρ_max(X), and almost negligible events will not be exhibited properly by the sample. On the other extreme of the spectrum, the more a risk measure behaves like ρ_min(X) := E[X], the better the rates, as small changes of X will result in small changes of ρ_min(X).

For these two mentioned (extreme) examples, one clearly has that ρ_max(|X|) := sup_n ρ_max(|X| ∧ n) is finite if and only if X is bounded, while ρ_min(|X|) is finite for all integrable X. We have just explained that ρ_max does not allow for any convergence rates, while ρ_min allows for the usual 1/√N rates. Using convexity and monotonicity, we will be able to interpolate between these extreme cases in the sense that, roughly speaking, if an arbitrary risk measure ρ takes finite values for random variables with finite q-th moments, then ρ is regular in the sense that the rates of convergence are of order N^{−1/(2q)}.

To make these observations rigorous, we need one more definition, discussed after the theorem below: a random variable X is said to have finite weak q-th moment if there is some constant C > 0 such that P[|X| ≥ t] ≤ Ct^{−q} for all t > 0.

Theorem 1.7 (Rates for general risk measures). Let q ∈ (1, ∞) and let ρ : L^∞ → R be a law invariant risk measure taking finite values for random variables with finite weak q-th moment. Then there are constants c, C > 0 such that the following hold.
(i) We have the moment bound

E[ |π_μ(F) − π_{μ_N}(F)| ] ≤ CN^{−1/(2q)} for all N ≥ 1.

(ii) We have the matching deviation inequality

P[ |π_μ(F) − π_{μ_N}(F)| ≥ ε ] ≤ C exp(−cNε^{2q}) for all N ≥ 1 and all ε > 0.
To make our assumption more tractable, recall that E[|X|^q] = q ∫₀^∞ t^{q−1} P[|X| ≥ t] dt. Therefore, if X has finite q-th moment, then it has finite weak q-th moment; and if X has finite weak q-th moment, then it has finite (q − ε)-th moment for all ε > 0. In particular, the assumption in the above theorem is satisfied whenever ρ(|X|) < ∞ for all X with finite (q − ε)-th moment.

Note that F and G are (again) merely measurable, and the rate depends solely on the assumption made on ρ and not on the dimension or the number of options traded. Moreover, the Borel–Cantelli lemma again implies that π_{μ_N}(F) is a consistent estimator:

Corollary 1.8.
In the setting of Theorem 1.7 we have that lim_{N→∞} π_{μ_N}(F) = π_μ(F) P-almost surely.

Footnote: We write ∧ for the minimum and ∨ for the maximum.
Footnote: By this we mean that ρ(|X|) := sup_n ρ(|X| ∧ n) < ∞ for all random variables X with finite weak q-th moment.
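As a numerical sanity check of the 1/√N rate, the following sketch (not taken from the paper; the exponential(1) loss distribution and the level u = 0.95 are illustrative assumptions, and no trading is involved, i.e. G = {0}) tracks the mean estimation error of the plug-in average value at risk as N grows:

```python
import numpy as np

rng = np.random.default_rng(0)
u = 0.95  # illustrative risk aversion level

def empirical_avar(x, u):
    # Plug-in AVaR via the OCE representation (2.1):
    # inf_m { mean((x - m)^+) / (1 - u) + m }, attained at an empirical u-quantile.
    xs = np.sort(np.asarray(x))
    m = xs[int(np.ceil(u * len(xs))) - 1]  # an empirical u-quantile (a minimizer)
    return m + np.mean(np.maximum(xs - m, 0.0)) / (1.0 - u)

# For the exponential(1) distribution, VaR_u = -log(1 - u) and, by the
# memoryless property, AVaR_u = VaR_u + 1.
true_avar = 1.0 - np.log(1.0 - u)

errors = {}
for N in (100, 1_000, 10_000):
    errs = [abs(empirical_avar(rng.exponential(size=N), u) - true_avar)
            for _ in range(200)]
    errors[N] = float(np.mean(errs))
    print(N, errors[N])  # the mean error shrinks roughly like 1/sqrt(N)
```

Repeating the experiment over 200 independent samples per N gives a Monte Carlo proxy for the expected estimation error of Theorem 1.1 part (i).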
For values q ≈ 1, the rates obtained in Theorem 1.7 almost coincide with the rates obtained in Theorem 1.1, which are the optimal (standard) rates when investigating i.i.d. phenomena. On the other hand, as q increases, the rates get worse, and for q = ∞, Proposition 1.6 tells us that no (polynomial) rates are available at all. The latter is in line with Theorem 1.7 and naturally triggers the question whether the results of Theorem 1.7 are optimal for all values of q, which is part of the next result.

Proposition 1.9 (Sharpness). Assume that X is not a singleton. Then, for every q ∈ [1, ∞), there exists a law invariant risk measure ρ : L^∞ → R taking finite values for random variables with finite q-th moment and a constant c > 0 such that: for all (large) N ≥ 1 there is a probability μ (supported on two distinct points of X) satisfying

E[ |π_μ(F) − π_{μ_N}(F)| ] ≥ cN^{−1/q}.

In other words, the rate(s) obtained in Theorem 1.7 are sharp up to a factor of two. Currently, the authors do not know whether the rates of Theorem 1.7 are actually sharp (without the factor two). One indication that this might be true is the following (explained in more detail in Remark 5.3): for q ≈ 1, the lower bound of Proposition 1.9 is of order 1/N, while we already know that the actual best possible rates are 1/√N; that is, for q ≈ 1 it is the rate of Theorem 1.7 that is attained.

1.3. Utility maximization.
It is conceivable that most of the results and methods of the present article extend beyond the estimation of risk measures. Other issues which seem to fit our framework and method include the estimation of risk premium principles in insurance (see e.g. Young [46] or Furman and Zitikis [24] for an overview), or the estimation of the value of some stochastic optimization problems. To illustrate the latter, let us consider another popular approach to quantifying the riskiness of a position, namely utility maximization: let U : R → R be a concave increasing function and set u_μ(F) := E_μ[U(F(S))]. Similarly to before, allowing the agent to invest in a market, one obtains the utility maximization problem

u_μ^max(F) := sup_{g∈G} u_μ( F + Σ_{i=1}^e g_i G_i ).

In this case, we have the following:
Proposition 1.10 (Utility maximization). There are constants c, C > 0 such that

E[ |u_μ^max(F) − u_{μ_N}^max(F)| ] ≤ C/√N,
P[ |u_μ^max(F) − u_{μ_N}^max(F)| ≥ ε ] ≤ C exp(−cNε²)

for all N ≥ 1 and ε > 0.

Again, note that the rates are optimal and depend neither on the dimension of the underlying nor on the number e of available options, and that u_{μ_N}^max(F) is a strongly consistent estimator which typically overestimates its true value (as we deal with maximization instead of minimization this time).
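A minimal numerical sketch of the empirical counterpart of u_μ^max can be set up as follows (illustrative assumptions throughout: exponential utility, F = 0, two options with hypothetical Gaussian returns, and G the two-dimensional simplex discretized by a grid):

```python
import numpy as np

rng = np.random.default_rng(2)
U = lambda x: -np.exp(-x)         # a concave increasing utility (illustrative)
grid = np.linspace(0.0, 1.0, 21)  # G = {(g, 1 - g) : g in [0, 1]}, and F = 0

def u_max(returns):
    # sup over the grid of the sample-average utility of g*G_1 + (1 - g)*G_2
    port = np.outer(grid, returns[:, 0]) + np.outer(1.0 - grid, returns[:, 1])
    return U(port).mean(axis=1).max()

def sample(n):  # two hypothetical return distributions, for illustration only
    return np.column_stack([rng.normal(0.05, 0.2, n), rng.normal(0.03, 0.1, n)])

reference = u_max(sample(100_000))  # large-sample proxy for the true value
small_n = np.mean([u_max(sample(100)) for _ in range(500)])
print(small_n, reference)  # the small-sample estimator typically overestimates
```

Taking the supremum of sample averages inflates the value, which is the mirror image of the underestimation bias of the risk-minimization problems discussed above.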
1.4. Related literature.
The estimation of risk measures is an essential question in quantitative finance and as such has received a lot of attention; we refer for instance to the monograph of McNeil, Embrechts and Frey [2] for an in-depth treatment. See also the book of Glasserman [27, Chapter 9] for the case of (average) value at risk. In mathematical finance, there is a growing interest in statistical aspects of quantitative risk management. We refer to Embrechts and Hofert [18] for an excellent review of the main lines of research in this direction. Concerning statistical estimation of risk measures, one of the earliest works is that of Weber [45], who considered the problem of estimating ρ_μ(F) (without trading) in an asymptotic fashion as N → ∞. By means of the theory of large deviations, he showed that if ρ is sufficiently regular, then ρ_{μ_N}(F) satisfies a large deviation principle. Along the same lines, [9, 5, 12] obtained central limit theorems for ρ_{μ_N}(F); see also [42, Chapter 6].

This is a good place to highlight the difference between asymptotic and non-asymptotic rates and estimations: while there are instances where the asymptotic rates suggest a much faster convergence, this is only true within the asymptotic regime. In other words, no matter how large the sample size N, asymptotic rates give no indication about how close ρ_{μ_N}(F) is to ρ_μ(F). Non-asymptotic rates however hold for every N, and give an order of magnitude of the sample size needed to achieve a desired estimation accuracy.

Aside from large deviation and central limit theorem results, some authors investigate the estimation of specific risk measures and (super)hedging functionals. These include Pal [37, 38], who analyzes hedging under risk measures which can be written as finite maxima of expectations. Let us further refer to [47, 10, 40] for other (asymptotic) estimation results, mostly for the average value at risk and under some assumptions on the distribution μ; see e.g.
Hong, Hu and Liu [28] for a review. A deviation-type inequality for the value at risk is proposed by Jin, Fu and Xiong [29]. The problem of strict superhedging was recently considered by Obłój and Wiesel [36]; this problem depends solely on the (topological) support of the underlying measure and therefore no rates are available in general.

When the estimation of ρ_μ(F) is performed repeatedly or periodically, it is important that the estimator ρ_{μ_N}(F) be stable, i.e. insensitive to small changes of μ_N. Such insensitivity is often referred to as robustness of the risk measure and was first analyzed by Cont, Deguest and Scandolo [15], who investigated a concept of robustness essentially equivalent to continuity of ρ w.r.t. weak convergence of measures. Alternative approaches to robustness were later proposed and analyzed by Krätschmer, Schied and Zähle [34, 33] and Cambou and Filipović [11]. Along the same lines, some authors have investigated risk measures (and other stochastic maximization problems) under model uncertainty to account for the effect of possible misspecification of the estimated model, see e.g. [4, 19, 17, 20, 26], where it is often assumed that the true model belongs to a Wasserstein ball. At this point, we should also mention Pichler [39], who studies the continuity of risk measures (in Wasserstein distance). Related but with a different agenda, Cheridito and Li [13, 14] characterize conditions under which risk measures on Orlicz spaces take finite values [13, Theorem 6.9]. The latter, as we shall see through the later proofs, also reflects on robustness.

Beyond the estimation of risk measures, a rich literature in operations research is devoted to the estimation of the value of stochastic optimization problems similar to OCE through the empirical distribution of the underlying probability measure. This technique goes under the name sample average approximation.
The bulk of the literature in this direction is concerned with convergence issues and questions related to the computational complexity of the estimators; see e.g. [32, 8, 44] and the book chapter [31] for a recent overview.

Somewhat related to this article, recent years have brought up a number of articles investigating non-asymptotic convergence rates of the empirical measure in Wasserstein distance, see e.g. [23] and references therein. However, as already mentioned, this approach would be restricted to the case X = R^d, would require strong continuity conditions on F and G, and, most importantly, yields suboptimal rates.

1.5. Organization of the rest of the paper.
We start by defining our basic notation and by proving a generalization to unbounded F, G of Theorem 1.1 part (i) on the speed of convergence in mean in Section 2, for the case of the optimized certainty equivalent. Section 3 is the main part of this paper and deals with the proof of (the generalization to unbounded F, G of) Theorem 1.7 part (i). The deviation inequalities in Theorem 1.1 and Theorem 1.7 (that is, parts (ii) thereof) are proven in Section 4. Finally, sharpness of the rates for general risk measures is discussed in Section 5, and all remaining proofs are presented in Section 6.

2. Rates for average value at risk and optimized certainty equivalents
Let us briefly fix our notation: throughout this paper we make the important convention that C > 0 denotes a constant which may depend on all kinds of parameters (such as some L^p norms of F and G, or features of the risk measure such as the growth of the loss function l in the OCE/SF case) but not on N. Moreover, the value of C is allowed to increase from line to line, for instance sup_y(xy − y²) = Cx² ≤ Cx⁴ or C√(e+1) ≤ C√e for all e ∈ N, but not N ≤ C or √(e+1) ≤ √e/C.

For a metric space (S, d) and ε > 0, denote by N(S, d, ε) the covering number at scale ε, that is, N(S, d, ε) is the smallest number for which there is a subset S̃ with that cardinality satisfying: for every s ∈ S there is s̃ ∈ S̃ with d(s, s̃) ≤ ε. In other words, N(S, d, ε) is the smallest number of balls of radius ε which cover S. The latter suggests this to be some measurement of compactness, and in fact it is an important tool in understanding the behavior of empirical processes; see [43].

Recall that e ∈ N is a fixed number and F, G_1, ..., G_e : X → R are positions written on S. For shorthand notation, write g·G := Σ_{i=1}^e g_i G_i for g ∈ G and |G| := Σ_{i=1}^e |G_i|. Recall that, throughout this article, the set G ⊂ R^e is assumed to be bounded. The necessity of this assumption is shown in Proposition 6.1.

The average value at risk also goes under several different names, such as expected shortfall, conditional value at risk, and expected tail loss, and has equally many different (equivalent) definitions, for instance as the value at risk integrated over different levels; see [22, Section 4.3] for an overview. As we shall treat the average value at risk as a special case of the optimized certainty equivalents, the following definition / representation

AVaR_u(X) := inf_{m∈R} E[ (1/(1−u)) (X − m)⁺ + m ]   (2.1)

seems best suited. Here u ∈ [0, 1) is called the risk aversion parameter. From (2.1) it is clear that the average value at risk is a special case of the optimized certainty equivalent, recalled for the convenience of the reader:

OCE(X) := inf_{m∈R} E[ l(X − m) + m ],

where l : R → R₊ is a convex increasing function with 1 ∈ ∂l(0). We additionally assume that lim inf_{x→∞} l(x)/x > 1, which by convexity and 1 ∈ ∂l(0) is equivalent to the fact that l(x) > x for some x ≥ 0. This assumption is there because F and G are possibly not bounded (in contrast to the introduction), but it is not needed if they are. We shall often work under the assumption that l′ (the right continuous derivative of the convex function l) has polynomial growth of degree p − 1, which means that

l′(x) ≤ C(1 + |x|^{p−1}) for all x ∈ R

(with the convention |x|^{p−1} = ∞ for p = ∞). Note that in case p = ∞ this is no restriction at all. For instance, the exponential function l = exp satisfies this assumption (only) for p = ∞.

The goal of this section is to prove Theorem 1.1 part (i), or rather the following generalization thereof.

Theorem 2.1.
Let p ∈ [1, ∞], assume that l′ has polynomial growth of degree p − 1, and that ‖F‖_{L^{2p}(μ)} and ‖G‖_{L^{2p}(μ)} are finite. Then

E[ sup_{g∈G} |OCE_μ(F + g·G) − OCE_{μ_N}(F + g·G)| ] ≤ C/√N for all N ≥ 1.

The constant C depends on μ only through the size of the above norms of F and G, on e, on p, and on the diameter of G.

We now turn to the proof of Theorem 2.1. In fact, looking at the definition of the optimized certainty equivalent, the reader familiar with the theory of empirical processes recognizes this as a standard problem covered within this theory. Thus, at some point, an estimate of the covering numbers with respect to the random L¹(μ_N) norm must be computed. Fortunately, no geometric arguments are needed, and all randomness can be controlled by some estimates involving moments only. For this reason it will be useful to keep track of the following quantities:

J := 1 + |F| + |G|, M := ‖J‖_{L^p(μ)}, and M_N := ‖J‖_{L^p(μ_N)}.   (2.2)

The first result in this spirit is

Lemma 2.2.
Assume that l′ has polynomial growth of degree p − 1. Then we have that

|OCE_μ(F + g·G)| ≤ CM^p   (2.3)

and

OCE_μ(F + g·G) = inf_{|m| ≤ CM^p} { ∫_X l(F(x) + g·G(x) − m) μ(dx) + m }   (2.4)

for every g ∈ G. The same holds true if the pair μ, M is replaced by μ_N, M_N.

Proof. We only focus on μ, M; the proof for μ_N, M_N works analogously. Assume without loss of generality that M < ∞, otherwise there is nothing to show.

As l is increasing and of polynomial growth with degree p and G is bounded, we have that

sup_{g∈G} l(F + g·G) ≤ CJ^p if p < ∞, and ≤ C if p = ∞.

In particular, the choice m = 0 (in the definition of OCE) and the fact that l ≥ 0 yield

OCE_μ(F + g·G) ≤ ∫_X l(F(x) + g·G(x)) μ(dx) ≤ CM^p for all g ∈ G,

showing the upper bound in (2.3). Further, as l ≥ 0, this also implies that the infimum over m in the definition of OCE_μ(F + g·G) can be restricted to m ≤ CM^p for all g ∈ G.

On the other hand, by convexity of l and the assumption that lim inf_{x→∞} l(x)/x > 1, there exist a > 1 and b ∈ R such that l(x) ≥ ax − b for every x ∈ R. This implies

∫_X l(F(x) + g·G(x) − m) μ(dx) + m ≥ ∫_X ( a(−CJ(x) − m) − b ) μ(dx) + m ≥ m(1 − a) − CM^p,   (2.5)

where we used that ∫_X J dμ ≤ M ≤ M^p, which follows from Hölder's inequality and the fact that M ≥ 1. By the previous part we already know that OCE_μ(F + g·G) ≤ CM^p for all g ∈ G. Together with (2.5) this implies that the infimum over m in OCE_μ(F + g·G) can be restricted to m ≥ −CM^p for all g ∈ G. In turn, using once more that l ≥ 0, this also implies that OCE_μ(F + g·G) ≥ −CM^p for all g ∈ G and thus completes the proof. □

Lemma 2.3.
Assume that l′ has polynomial growth of degree p − 1, let m₀ ∈ R, and define

H := { l(F + g·G − m) + m : g ∈ G and m ∈ [−m₀, m₀] }.

Then, for every ε > 0, we have that

N(H, ‖·‖_{L¹(μ_N)}, ε) ≤ ( C‖J‖^p_{L^p(μ_N)}/ε )^{e+1} ∨ 1 if p < ∞, and N(H, ‖·‖_{L¹(μ_N)}, ε) ≤ (C/ε)^{e+1} ∨ 1 if p = ∞.

Proof. Without loss of generality, we work only on the set where ‖J‖_{L^p(μ_N)} < ∞ (otherwise there is nothing to show). We proceed in two steps.

(a) Pick two elements H, H̃ ∈ H, represented as H = l(F + g·G − m) + m and H̃ = l(F + g̃·G − m̃) + m̃, and define the family of functions (φ_t)_{t∈[0,1]} from X to R by

φ_t := F + g·G − m + t( (g̃ − g)·G + m − m̃ ) for every t ∈ [0, 1],

so that H = l(φ_0) + m and H̃ = l(φ_1) + m̃. As G is bounded, |φ_t| ≤ CJ for all t ∈ [0, 1]. Moreover, by convexity of l, its right derivative l′ is increasing. By the fundamental theorem of calculus, we have

‖H − H̃‖_{L¹(μ_N)} ≤ ‖ ∫₀¹ l′(φ_t) φ′_t dt ‖_{L¹(μ_N)} + |m − m̃| ≤ ‖ l′(CJ) ( (g̃ − g)·G + m − m̃ ) ‖_{L¹(μ_N)} + |m − m̃|.

Now note that

‖l′(CJ) J‖_{L¹(μ_N)} ≤ C‖J‖^p_{L^p(μ_N)} if p < ∞, and ≤ C if p = ∞.

Indeed, for p < ∞ this follows from the assumption that l′(x) ≤ C(1 + |x|^{p−1}) for all x ∈ R, and the fact that J ≥ 1. For p = ∞, one has by assumption that J is μ-almost surely bounded. Hence, P-almost surely, J is also μ_N-almost surely bounded (by the same constant). As l′ is bounded on bounded sets (by convexity of l), this implies that l′(CJ) is μ_N-almost surely bounded.

To conclude, we use once more that G is bounded and hence |(g̃ − g)·G| ≤ C|g̃ − g| J. Therefore

‖H − H̃‖_{L¹(μ_N)} ≤ C‖J‖^p_{L^p(μ_N)} ( |g − g̃| + |m − m̃| ) if p < ∞, and ≤ C( |g − g̃| + |m − m̃| ) if p = ∞.   (2.6)

In the following we restrict to p < ∞ and leave the obvious change to the reader.

(b) Fix ε > 0 and let A ⊂ R be such that for all m ∈ [−m₀, m₀] there is m̃ ∈ A with |m − m̃| ≤ ε/(C‖J‖^p_{L^p(μ_N)}), and let B ⊂ R^e be such that for all g ∈ G there is g̃ ∈ B with |g − g̃| ≤ ε/(C‖J‖^p_{L^p(μ_N)}). Then, if we define H̃ exactly as H, only with [−m₀, m₀] replaced by A and G replaced by B, by (2.6), for every H ∈ H there is H̃ ∈ H̃ with ‖H − H̃‖_{L¹(μ_N)} ≤ ε. This implies that

N(H, ‖·‖_{L¹(μ_N)}, ε) ≤ card(H̃) ≤ card(A × B) = card(A) card(B),

where card means cardinality. The set A can be constructed simply by an equidistant partition of [−m₀, m₀] with cardinality card(A) ≤ (C‖J‖^p_{L^p(μ_N)}/ε) ∨ 1. In a similar manner, B can be constructed with card(B) ≤ ((C‖J‖^p_{L^p(μ_N)}/ε) ∨ 1)^e. □

Inspecting the proof actually yields the following result, which we state for later reference.
Corollary 2.4. Let m₀ ∈ R, let f : R → R be locally Lipschitz continuous, and assume that J is bounded. Then it holds that

N( { f(F + g·G − m) : g ∈ G and m ∈ [−m₀, m₀] }, ‖·‖_∞, ε ) ≤ (C/ε)^{e+1} ∨ 1

for every ε > 0.

We are now ready for the proof of Theorem 2.1.
Proof of Theorem 2.1.
For shorthand notation, set

Δ_N := sup_{g∈G} |OCE_µ(F + g·G) − OCE_{µ_N}(F + g·G)|

for every N ≥ 1. With M and M_N defined in (2.2), we write E[Δ_N] = E[Δ_N 1_{M_N ≤ M+1}] + E[Δ_N 1_{M_N > M+1}] and investigate both terms separately.

(a) We start with the first term. Lemma 2.2 guarantees that

Δ_N 1_{M_N ≤ M+1} ≤ sup_{H∈H} |∫_X H(x) (µ − µ_N)(dx)|

for every N ≥ 1, where H := {l(F + g·G − m) + m : g ∈ G and |m| ≤ C(M+1)^p}. By the 'empirical process version' of Dudley's entropy-integral theorem (see for instance Corollary 2.2.8 and Lemma 2.3.1 in [43]) one has that

E[sup_{H∈H} |∫_X H(x) (µ − µ_N)(dx)|] ≤ (C/√N) E[∫_0^∞ √(log N(H, ∥·∥_{L¹(µ_N)}, ε)) dε]

for all N ≥ 1. Assume first that p < ∞. Then, estimating the covering numbers of H by means of Lemma 2.3 implies that

E[∫_0^∞ √(log N(H, ∥·∥_{L¹(µ_N)}, ε)) dε] ≤ C E[∫_0^∞ √(log((C∥J∥_{L^p(µ_N)}^p/ε) ∨ 1)) dε] = C E[∥J∥_{L^p(µ_N)}^p ∫_0^∞ √(log((1/ε̃) ∨ 1)) dε̃],

where the last equality follows from substituting ε by ε̃ := ε/(C∥J∥_{L^p(µ_N)}^p). In a final step, notice that ∫_0^∞ √(log((1/ε̃) ∨ 1)) dε̃ < ∞ and E[∥J∥_{L^p(µ_N)}^p] ≤ C∥J∥_{L^p(µ)}^p; the second statement follows from Jensen's inequality. Therefore E[Δ_N 1_{M_N ≤ M+1}] ≤ C/√N for all N ≥ 1, showing that the first term behaves as required. If p = ∞ the same arguments apply (with ∥J∥_{L^p(µ_N)}^p replaced by a constant) and we again obtain E[Δ_N 1_{M_N ≤ M+1}] ≤ C/√N.

(b) As for the second term, applying Hölder's inequality yields

E[Δ_N 1_{M_N > M+1}] ≤ E[Δ_N²]^{1/2} P[M_N > M+1]^{1/2}. (2.7)
We start by estimating P[M_N > M+1]^{1/2}. For p = ∞, one has P[M_N > M+1] = 0 for all N. For p < ∞, using first that M, M_N ≥ 0 and p ≥ 1, and then Markov's inequality,

P[M_N − M > 1] ≤ P[M_N^p − M^p > 1] ≤ E[(M_N^p − M^p)²].

Further, making use of the fact that the (X_n) are independent with M^p = E[J(X_n)^p] for all n, one has

E[(M_N^p − M^p)²] = E[((1/N) Σ_{n=1}^N (J(X_n)^p − E[J(X_n)^p]))²] = (1/N) E[(J(X₁)^p − E[J(X₁)^p])²] ≤ ∥J∥_{L^{2p}(µ)}^{2p}/N.

This shows that P[M_N > M+1]^{1/2} ≤ C/√N. Regarding E[Δ_N²], use Lemma 2.2 to estimate E[Δ_N²] ≤ C(M^{2p} + E[M_N^{2p}]). The same arguments as above show that E[M_N^{2p}] ≤ C∥J∥_{L^{2p}(µ)}^{2p}. Plugging both estimates in (2.7) shows that E[Δ_N 1_{M_N > M+1}] ≤ C/√N for all N ≥ 1. Combining (a) and (b) yields E[Δ_N] ≤ C/√N for all N ≥ 1. This completes the proof. □

3. General law invariant risk measures
This section deals with general risk measures, which we start by briefly describing. First, in order to allow for unbounded F and G, one needs to define risk measures for unbounded random variables. A function ρ: L^p → R with p ∈ [1, ∞] is again called a (convex) law invariant risk measure if (a)-(d) of Definition 1.4 hold with L^∞ replaced by L^p. Further recall that ρ is called sublinear if in addition ρ(λX) = λρ(X) for all X ∈ L^p and λ ≥ 0. Every such ρ admits the spectral representation¹

ρ(X) = sup_{γ∈M} (∫_{[0,1)} AVaR_u(X) γ(du) − β(γ)) for X ∈ L^p, (3.1)

see [25]. Here M is a subset of the probability measures on [0, 1), equipped with its Borel σ-field, β: M → [0, ∞) is a convex function, and AVaR is the average value at risk, defined in (2.1). Note that AVaR_u is evidently a sublinear law invariant risk measure.

¹ It also goes under the name Kusuoka representation, as the L^∞-version was discovered by Kusuoka [35].

Before we are ready to state the generalization of part (i) of Theorem 1.7, the treatment of unbounded F, G requires one last definition: for every parameter p ∈ [1, ∞] and x ≥ 0 set

w_p(x) := sup{ρ(X) : ∥X∥_{L^p} ≤ x}.

Note that w_p is convex, nonnegative, and grows at least linearly.

Theorem 3.1.
Let 1 < q < p ≤ ∞ and let ρ: L^p → R be a law invariant risk measure such that ρ(|X|) < ∞ whenever X has a finite weak q-th moment. Assume that G is bounded, that ∫ w_p(t(|F| + |G|)^p)² dµ < ∞ for every t ≥ 0 when p < ∞, and that F and G are bounded when p = ∞. Then

E[sup_{g∈G} |ρ_µ(F + g·G) − ρ_{µ_N}(F + g·G)|] ≤ C/N^{(1/q − 1/p)/(2 − 2/p)} for all N ≥ 1.

Note that w_p is linear if either p = ∞ or ρ is sublinear. In this case the integrability condition on F and G imposed in the above theorem simply means that ∥F∥_{L^p(µ)} and ∥G∥_{L^p(µ)} should be finite. To interpret the rates, note that the higher the integrability of F and G, the better the rates. Similarly, more regularity of ρ (i.e. a lower value of q) will also improve the rates. For convenience, we computed some values:

                          | q ≈ 1 | q = 2           | q = p/2      | q ≈ p | p = ∞
(1/q − 1/p)/(2 − 2/p)     | ≈ 1/2 | (p−2)/(4(p−1))  | 1/(2(p−1))   | ≈ 0   | 1/(2q)

Table 1. Convergence rate for different values of p and q.

The idea for the proof of Theorem 3.1 is the following: by Section 2 we understand the behavior of the mean error for the average value at risk (being a special case of the optimized certainty equivalents). By the spectral representation (3.1), these form the building blocks of every law invariant risk measure, and we conclude via a (multiscale) approximation, keeping track of the risk aversion parameter u of the average value at risk (which will make all constants explode when u approaches 1) and the growth of the measures γ(du) in the spectral representation (3.1) (which only put little mass on u ≈ 1).

Lemma 3.2.
Let q ∈ (1, ∞) and let f_X(x) = q 1_{[1,∞)}(x) x^{−(q+1)} for x ∈ R be the density of the distribution of the random variable X. Then X has a finite weak q-th moment and

AVaR_u(X) = (q/(q−1)) (1−u)^{−1/q} for every u ∈ [0, 1).

Proof. Clearly P[X ≥ t] = t^{−q} for t ≥ 1, so X has a finite weak q-th moment by definition. Moreover, as m ↦ (x − m)⁺ is Lebesgue-almost everywhere differentiable, the optimal m* for AVaR_u(X) (recall (2.1)) is characterized by the first order condition

∫_R (−(1/(1−u)) 1_{[m*,∞)}(x) + 1) F_X(dx) = 0.

A quick computation shows that ∫_a^∞ F_X(dx) = a^{−q} and ∫_a^∞ x F_X(dx) = (q/(q−1)) a^{−q+1} for a ≥ 1. Thus the optimal m equals m* = (1−u)^{−1/q}. Therefore, the value of AVaR_u(X) equals

∫_R ((1/(1−u)) (x − m*)⁺ + m*) F_X(dx) = (1/(1−u)) ∫_{m*}^∞ (x − m*) F_X(dx) + m* = (1/(1−u)) ((q/(q−1)) (m*)^{−q+1} − m* (m*)^{−q}) + m*.

Plugging in the value of the optimal m* = (1−u)^{−1/q} and simplifying the terms yields the claim. □

Lemma 3.3. Let p ∈ (1, ∞] and X ∈ L^p. Then we have

|AVaR_u(X)| ≤ ∥X∥_{L^p}/(1−u)^{1/p} for every u ∈ [0, 1).

Proof. For notational simplicity assume first that X has a strictly increasing continuous distribution function F_X and define m* := F_X^{−1}(u), so that P[X ≥ m*] = 1−u. Plugging this choice into the definition of the average value at risk yields

AVaR_u(X) ≤ ∫_R ((1/(1−u)) (x − m*) 1_{x ≥ m*} + m*) F_X(dx) = (1/(1−u)) E[X 1_{X ≥ m*}] ≤ (1/(1−u)) ∥X∥_{L^p} P[X ≥ m*]^{(p−1)/p},

where we made use of Hölder's inequality in the last step. As P[X ≥ m*] = 1−u, this yields AVaR_u(X) ≤ ∥X∥_{L^p}/(1−u)^{1/p}. Sublinearity now implies |AVaR_u(X)| ≤ AVaR_u(|X|), which shows the claim in case that F_X is continuous and strictly increasing. In general, approximate F_X by strictly increasing continuous distribution functions (for instance, add to X independent Gaussian random variables with vanishing variance). □

Lemma 3.4.
Let 1 < q < p ≤ ∞ and assume that ρ(|X|) < ∞ for all X with a finite weak q-th moment. Then, for every fixed a > 0, there exists a constant b > 0 such that

ρ(X) = sup_{γ∈M : β(γ) ≤ b} (∫_{[0,1)} AVaR_u(X) γ(du) − β(γ)) for all X ∈ L^p with ∥X∥_{L^p} ≤ a.

Proof. Let X* be the random variable of Lemma 3.2.

(a) In a first step we show that |ρ(X)| ≤ C for all X ∈ L^p with ∥X∥_{L^p} ≤ a. For such X, by Lemma 3.2 and Lemma 3.3, one has that

AVaR_u(|X|) ≤ a/(1−u)^{1/p} ≤ AVaR_u(CX*) (3.2)

for every u ∈ [0, 1). Hence

ρ(|X|) ≤ sup_{γ∈M} (∫_{[0,1)} AVaR_u(|X|) γ(du) − β(γ)) ≤ sup_{γ∈M} (∫_{[0,1)} AVaR_u(CX*) γ(du) − β(γ)) = ρ(CX*)

for every X with ∥X∥_{L^p} ≤ a. It further follows by convexity and monotonicity of ρ, together with ρ(0) = 0, that |ρ(X)| ≤ ρ(|X|) for all X ∈ L^p. This implies that indeed |ρ(X)| ≤ C for all X ∈ L^p with ∥X∥_{L^p} ≤ a.

(b) We proceed to prove the claim. Define ϕ: R₊ → [0, ∞] by

ϕ(y) := sup_{x∈R₊} (xy − ρ(xX*)).

Then ϕ is convex, increasing, and, as ρ(xX*) < ∞ for all x ∈ R₊, one can verify that ϕ(y)/y → ∞ as y → ∞. Now note that the (spectral) representation of ρ in (3.1) implies that ρ(xX*) ≥ ∫_{[0,1)} AVaR_u(xX*) γ(du) − β(γ) for all x ≥ 0 and γ ∈ M. Therefore, one has

β(γ) ≥ sup_{x≥0} (∫_{[0,1)} AVaR_u(xX*) γ(du) − ρ(xX*)) = ϕ(∫_{[0,1)} AVaR_u(X*) γ(du))

for every γ ∈ M. For every X with ∥X∥_{L^p} ≤ a, by (3.2) one has

∫_{[0,1)} AVaR_u(X) γ(du) − β(γ) ≤ C ∫_{[0,1)} AVaR_u(X*) γ(du) − β(γ) ≤ Cϕ^{−1}(β(γ)) − β(γ), (3.3)

where ϕ^{−1} denotes the (right-)inverse of ϕ. As ϕ(y)/y → ∞ when y → ∞, one has that ϕ^{−1}(x)/x → 0 as x → ∞, which implies that

Cϕ^{−1}(β(γ)) − β(γ) → −∞ when β(γ) → ∞. (3.4)

Now recall that ρ(X) equals the supremum over γ ∈ M of the left hand side of (3.3) and that |ρ(X)| ≤ C for all X with ∥X∥_{L^p} ≤ a by the first part of this proof. Therefore (3.4) implies that there is some constant b such that only γ ∈ M for which β(γ) ≤ b need to be considered in the computation of ρ(X). □

Lemma 3.5.
Let q ∈ (1, ∞) and assume that ρ(|X|) < ∞ for all X with a finite weak q-th moment. Then, for every fixed b ∈ R₊, we have

Γ_b([r, 1)) := sup_{γ∈M : β(γ) ≤ b} γ([r, 1)) ≤ C(1−r)^{1/q} for every r ∈ [0, 1).

Proof. Let X* be the random variable of Lemma 3.2. Then it follows from interchanging two suprema in the spectral representation (3.1) (one over n and one over γ), monotone convergence (applied under each γ), and Lemma 3.2 that

sup_n ρ(X* ∧ n) = sup_{γ∈M} sup_n (∫_{[0,1)} AVaR_u(X* ∧ n) γ(du) − β(γ)) ≥ sup_{γ∈M : β(γ) ≤ b} (∫_{[0,1)} (q/(q−1)) (1−u)^{−1/q} γ(du) − β(γ)). (3.5)

By assumption sup_n ρ(X* ∧ n) ∈ R, which implies that

sup_{γ∈M : β(γ) ≤ b} ∫_{[0,1)} (1−u)^{−1/q} γ(du) ≤ ((q−1)/q) (sup_n ρ(X* ∧ n) + b) =: C.

An application of Chebyshev's inequality now implies that

Γ_b([r, 1)) ≤ sup_{γ∈M : β(γ) ≤ b} (1−r)^{1/q} ∫_{[r,1)} (1−u)^{−1/q} γ(du) ≤ C(1−r)^{1/q},

which proves the claim. □

Lemma 3.6.
Let 0 ≤ b < a < 1. Then it holds that

Σ_{n≥0} 2^{−an} ((x 2^n) ∧ 2^{bn}) ≤ C (x^{(a−b)/(1−b)} ∨ x) for every x ∈ [0, ∞).

Proof. For x = 0 there is nothing to prove. We now consider the case x ∈ (0, 1]. Denote by s_n the summand, and set n_x := log(1/x)/((1−b) log 2). Then a quick computation reveals

s_n = x 2^{n(1−a)} if n < n_x, and s_n = 2^{n(b−a)} if n ≥ n_x.

By properties of the geometric series one has Σ_{n < n_x} s_n ≤ Cx 2^{n_x(1−a)} and Σ_{n ≥ n_x} s_n ≤ C 2^{n_x(b−a)}. As x > 0, the definition of n_x implies that

2^{n_x(1−a)} = exp((1−a) log(1/x)/(1−b)) = x^{−(1−a)/(1−b)}, (3.6)

and similarly 2^{n_x(b−a)} = x^{(a−b)/(1−b)}. Putting everything together, and noting that x·x^{−(1−a)/(1−b)} = x^{(a−b)/(1−b)}, this implies Σ_{n≥0} s_n ≤ C x^{(a−b)/(1−b)}, which yields the claim for x ∈ (0, 1]. For x ≥ 1 one has x ≥ x^{(a−b)/(1−b)} and

Σ_{n≥0} 2^{−an} ((x 2^n) ∧ 2^{bn}) ≤ x Σ_{n≥0} 2^{−an} (2^n ∧ 2^{bn}) = x Σ_{n≥0} 2^{(b−a)n} ≤ Cx,

where the last inequality follows from convergence of the geometric series. □

For every N ≥ 1 and u ∈ [0, 1) define

δ_u^N := sup_{g∈G} |AVaR_u^µ(F + g·G) − AVaR_u^{µ_N}(F + g·G)|. (3.7)

The following lemma controls the behavior of δ^N uniformly in u.

Lemma 3.7. Let 1 < q < p ≤ ∞. Then it holds that

E[sup_{u∈[0,v]} δ_u^N] ≤ (C/((1−v)√N)) ∧ (C/(1−v)^{1/p}) for every v ∈ (0, 1).

Proof. We start with the easier estimate, namely that

E[sup_{u∈[0,v]} δ_u^N] ≤ C/(1−v)^{1/p}. (3.8)

As |F + g·G| ≤ CJ for every g ∈ G, monotonicity of AVaR_u implies AVaR_u^µ(F + g·G) ≤ AVaR_u^µ(CJ) for every g ∈ G; similarly with µ replaced by µ_N. Now Lemma 3.3 implies

sup_{u∈[0,v]} δ_u^N ≤ (∥CJ∥_{L^p(µ)} + ∥CJ∥_{L^p(µ_N)})/(1−v)^{1/p}.

Further, Jensen's inequality implies E[∥CJ∥_{L^p(µ_N)}] ≤ ∥CJ∥_{L^p(µ)}, and thus we get (3.8). To conclude the proof, we are left to prove that

E[sup_{u∈[0,v]} δ_u^N] ≤ C/((1−v)√N), (3.9)

which we shall do in several steps.

(a) Define

H := {ϕ(F + g·G) : ϕ: R → R is 1-Lipschitz, ϕ(0) = 0, and g ∈ G}.

Then it holds that

sup_{u∈[0,v]} δ_u^N ≤ (1/(1−v)) sup_{H∈H} |∫_X H(x) (µ − µ_N)(dx)|. (3.10)

Indeed, every function appearing in the definition of AVaR_u is of the form ϕ(F + g·G)/(1−u) for a 1-Lipschitz function ϕ, see (2.1).
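As a concrete sanity check of this reduction (illustrative only; the helpers `avar_inf` and `avar_tail` are ours, not notation from the paper), the plug-in estimator AVaR_u(µ_N) can be computed both from the infimum formula (2.1) and as the average of the largest (1−u)N observations, and the two agree:

```python
def avar_inf(sample, u, grid_steps=100_000):
    """AVaR_u under the empirical measure via the infimum formula
    inf_m [ m + E[(X - m)^+] / (1 - u) ], approximated by a grid search."""
    lo, hi = min(sample), max(sample)
    n = len(sample)
    best = float("inf")
    for k in range(grid_steps + 1):
        m = lo + (hi - lo) * k / grid_steps
        val = m + sum(max(x - m, 0.0) for x in sample) / ((1 - u) * n)
        best = min(best, val)
    return best

def avar_tail(sample, u):
    """Average of the top (1-u)*N order statistics (assumes (1-u)*N integer)."""
    k = round((1 - u) * len(sample))
    return sum(sorted(sample)[-k:]) / k

sample = [0.1, 0.4, 0.2, 0.9, 0.5, 0.3, 0.8, 0.7, 0.6, 1.0]
u = 0.8  # (1-u)*N = 2: average of the two largest observations
assert abs(avar_inf(sample, u) - avar_tail(sample, u)) < 1e-4
```

For each fixed m, the integrand x ↦ (x − m)⁺ is 1-Lipschitz, which is exactly why the class H of 1-Lipschitz transforms (scaled by 1/(1−u)) controls the estimation error δ_u^N.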
Subtracting ϕ(F(0) + g·G(0))/(1−u) does not change the value of the difference of the two integrals, which yields the claim.

(b) We proceed to compute the covering numbers of H. Fix a ≥ 1. For H = ϕ(F + g·G) and H̃ = ϕ̃(F + g̃·G) in H we write

∥H − H̃∥_{L¹(µ_N)} ≤ ∥1_{J≤a}(H − H̃)∥_{L¹(µ_N)} + ∥1_{J>a}(H − H̃)∥_{L¹(µ_N)}

and estimate both terms separately. As G is bounded, |F + g·G| ≤ CJ for every g ∈ G; thus we have for the first term

∥1_{J≤a}(H − H̃)∥_{L¹(µ_N)} ≤ sup_{|t|≤Ca} |ϕ(t) − ϕ̃(t)| + Ca|g − g̃|.

On the other hand, the second term equals zero if p = ∞ (that is, if J is bounded) and a is large enough. For p < ∞ we have

∥1_{J>a}(H − H̃)∥_{L¹(µ_N)} ≤ C∥1_{J>a} J∥_{L¹(µ_N)} ≤ C∥J^p∥_{L¹(µ_N)}/a^p.

In the following, we shall consider only the (slightly more difficult) case p < ∞ and leave the minor changes for the case p = ∞ to the reader. In conclusion we have shown

∥H − H̃∥_{L¹(µ_N)} ≤ sup_{|t|≤Ca} |ϕ(t) − ϕ̃(t)| + Ca|g − g̃| + C∥J^p∥_{L¹(µ_N)}/a^p. (3.11)

With this preparatory work out of the way, we proceed to compute N(H, ∥·∥_{L¹(µ_N)}, ε) by making all three terms on the right hand side of (3.11) smaller than ε/3. Let L̃ be a set of (Lipschitz) functions from R to R such that for every 1-Lipschitz function ϕ there is ϕ̃ ∈ L̃ with sup_{|t|≤Ca} |ϕ(t) − ϕ̃(t)| ≤ ε/3. Such a set L̃ can be constructed with

card(L̃) ≤ exp(C/((ε/a) ∧ 1)),

see [43, Theorem 2.7.1].² Moreover, let G̃ be such that for every g ∈ G there is g̃ ∈ G̃ satisfying |g − g̃| ≤ ε/(3Ca). Such a set can be constructed with

card(G̃) ≤ (C/((ε/a) ∧ 1))^e,

by an equidistant grid of the bounded set G ⊂ R^e. Finally, if

a := (C∥J^p∥_{L¹(µ_N)}/ε)^{1/p},

then the last term in (3.11) is smaller than ε/3. As ∥J∥_{L^p(µ_N)} = ∥J^p∥_{L¹(µ_N)}^{1/p}, we conclude

N(H, ∥·∥_{L¹(µ_N)}, ε) ≤ card(L̃) card(G̃) ≤ exp(C/((ε^{(p+1)/p}/(C∥J∥_{L^p(µ_N)})) ∧ 1)) · (C/((ε^{(p+1)/p}/(C∥J∥_{L^p(µ_N)})) ∧ 1))^e (3.12)

for every ε > 0.

² Indeed, while [43] considers 1-Lipschitz functions from [0,1] to [0,1], rescaling the argument induces a bijection between 1-Lipschitz functions with domain [0,1] and those with domain [−Ca, Ca]. The latter can be extended to functions with domain R, and this is exactly how our set L̃ is obtained.

(c) Combining (3.10) with Dudley's entropy-integral theorem as in the proof of Theorem 2.1, and then using (3.12), one has

E[sup_{u∈[0,v]} δ_u^N] ≤ (C/((1−v)√N)) E[∫_0^∞ √(log N(H, ∥·∥_{L¹(µ_N)}, ε)) dε] ≤ (C/((1−v)√N)) E[∥J∥_{L^p(µ_N)} ∫_0^∞ √(log(exp(C/(ε̃^{(p+1)/p} ∧ 1)) (C/(ε̃^{(p+1)/p} ∧ 1))^e)) dε̃],

where the last estimate followed from substituting ε by ε̃ := ε/∥CJ^p∥_{L¹(µ_N)}^{1/p}. It remains to notice that the (now deterministic) integral over dε̃ is finite. Moreover, Jensen's inequality implies E[∥J∥_{L^p(µ_N)}] ≤ ∥J∥_{L^p(µ)}, and the latter term is finite by assumption. In conclusion, we have shown (3.9) and the proof is complete. □

Proof of Theorem 3.1. Recall the definitions M := ∥J∥_{L^p(µ)} and M_N := ∥J∥_{L^p(µ_N)} given in (2.2). As in the proof of Theorem 2.1 we set

Δ_N := sup_{g∈G} |ρ_µ(F + g·G) − ρ_{µ_N}(F + g·G)|

and consider both terms in E[Δ_N] = E[Δ_N 1_{M_N ≤ M+1}] + E[Δ_N 1_{M_N > M+1}] separately.

(a) As G is bounded, we have ∥F + g·G∥_{L^p(µ)} ≤ CM. Therefore, by Lemma 3.4, there exists some b such that

ρ_µ(F + g·G) = sup_{γ∈M : β(γ) ≤ b} (∫_{[0,1)} AVaR_u^µ(F + g·G) γ(du) − β(γ))

for all g ∈ G. Possibly making b larger, the same reasoning implies that, on the set {M_N ≤ M+1}, the same representation holds true if µ is replaced by µ_N. Recalling the definition of δ^N in (3.7) and the definition of Γ_b given in Lemma 3.5, we can write

Δ_N 1_{M_N ≤ M+1} ≤ sup_{γ∈M : β(γ) ≤ b} ∫_{[0,1)} δ_u^N γ(du) ≤ Σ_{n≥0} Γ_b(I_n) sup_{u∈I_n} δ_u^N,

where I_n := [1 − 2^{−n}, 1 − 2^{−n−1}) for every n, that is, I₀ = [0, 1/2), I₁ = [1/2, 3/4), and so on. Now estimate Γ_b(I_n) ≤ C2^{−n/q} by means of Lemma 3.5 and E[sup_{u∈I_n} δ_u^N] ≤ C((2^n/√N) ∧ 2^{n/p}) by means of Lemma 3.7. Then, an application of Lemma 3.6 (with x = 1/√N, a = 1/q, and b = 1/p) implies that

E[Δ_N 1_{M_N ≤ M+1}] ≤ C Σ_{n≥0} 2^{−n/q} ((2^n/√N) ∧ 2^{n/p}) ≤ C(1/√N)^{(1/q − 1/p)/(1 − 1/p)} ∨ C/√N ≤ C/N^{(1/q − 1/p)/(2 − 2/p)},

where the last inequality holds as (1/q − 1/p)/(1 − 1/p) ∈ (0, 1).

(b) As in the proof of Theorem 2.1,

E[Δ_N 1_{M_N > M+1}] ≤ E[Δ_N²]^{1/2} P[M_N > M+1]^{1/2} ≤ C E[Δ_N²]^{1/2}/√N.

It therefore remains to check that E[Δ_N²] ≤ C. In fact, if p = ∞ then M_N ≤ M almost surely and there is nothing left to prove. So assume that p < ∞. Using monotonicity of ρ and the fact that G is bounded, this boils down to checking that E[ρ_{µ_N}(CJ)²] ≤ C. To that end, by definition of w_p and as J ≥ 1, one has that

ρ_{µ_N}(CJ) ≤ w_p(C∥J∥_{L^p(µ_N)}) ≤ w_p((C/N) Σ_{n≤N} J(X_n)^p).

By convexity of x ↦ w_p(Cx)² we may further estimate

E[ρ_{µ_N}(CJ)²] ≤ (1/N) Σ_{n≤N} E[w_p(CJ(X_n)^p)²] = ∫_X w_p(CJ(x)^p)² µ(dx),

and the last term is finite by assumption. Combining both steps completes the proof. □

4. Deviation inequalities

In the following, we prove (the following generalizations of) part (ii) of Theorem 1.7 and part (ii) of Theorem 1.1 stated in the introduction.

Theorem 4.1. Assume that F and G are bounded functions and that the set G is bounded. Moreover, assume that ρ(|X|) < ∞ for all X with a finite weak q-th moment, where q ∈ (1, ∞). Then there are constants c, C > 0 such that

P[sup_{g∈G} |ρ_µ(F + g·G) − ρ_{µ_N}(F + g·G)| ≥ ε] ≤ C exp(−cNε^{2q})

for all ε > 0 and N ≥ 1.

Proof.
(a) In a first step, as F, G, and G are bounded, the same arguments as given for Lemma 2.2 (simpler, in fact, due to the boundedness of the function J therein) show that

AVaR_u^µ(F + g·G) = inf_{|m|≤a} (1/(1−u)) ∫_X ((F + g·G − m)⁺ + (1−u)m) µ(dx) (4.1)

for every u ∈ [0, 1) and g ∈ G, where a depends on ∥F∥_∞, ∥G∥_∞, and the size of G. Moreover, (4.1) remains true if µ is replaced by µ_N. Further, as ∫_X (1−u)m (µ − µ_N)(dx) = 0 for all m ∈ R and u ∈ [0, 1), we obtain

|AVaR_u^µ(F + g·G) − AVaR_u^{µ_N}(F + g·G)| ≤ δ_N/(1−u), (4.2)

where we set

δ_N := sup_{H∈H} |∫_X H(x) (µ − µ_N)(dx)| and H := {(F + g·G − m)⁺ : |m| ≤ a and g ∈ G}.

(b) In a second step, notice that the same arguments (again, actually simpler as J is bounded) as in the proof of Theorem 1.7 imply that there is some b > 0 such that the supremum over γ ∈ M in the spectral representation (3.1) of ρ can be restricted to those γ for which β(γ) ≤ b. This implies

|ρ_µ(F + g·G) − ρ_{µ_N}(F + g·G)| ≤ sup_{γ∈M : β(γ) ≤ b} ∫_{[0,1)} |AVaR_u^µ(F + g·G) − AVaR_u^{µ_N}(F + g·G)| γ(du) ≤ Σ_{n≥0} Γ_b(I_n) sup_{u∈I_n} δ_N/(1−u),

where I_n := [1 − 2^{−n}, 1 − 2^{−n−1}) for every n. Estimating Γ_b(I_n) ≤ C2^{−n/q} by Lemma 3.5 and using Lemma 3.6, one arrives at

sup_{g∈G} |ρ_µ(F + g·G) − ρ_{µ_N}(F + g·G)| ≤ C((δ_N)^{1/q} ∨ δ_N) ≤ C(δ_N)^{1/q} (4.3)

for all N ≥ 1, where the last inequality holds since δ_N ≤ C almost surely (and hence δ_N ≤ C(δ_N)^{1/q} almost surely).

(c) In a last step, it remains to estimate δ_N. The goal is to apply [43, Theorem 2.14.10]. By Corollary 2.4 one has that N(H, ∥·∥_∞, ε) ≤ (C/ε)^{e+1} ∨ 1 for every ε > 0.
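Before invoking that theorem, it may help to see the shape of such a deviation inequality in the simplest case of a single bounded function, where Hoeffding's inequality already yields a sub-Gaussian tail. The following sketch is a toy comparison with that simpler bound (not the Talagrand-type theorem of [43] used in the proof); it checks the exact two-sided binomial tail against 2·exp(−2Nε²):

```python
import math

def binom_pmf(n, k, p):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def exact_two_sided_tail(n, p, eps):
    """P[ |empirical mean of n Bernoulli(p) samples - p| >= eps ], exactly."""
    return sum(binom_pmf(n, k, p) for k in range(n + 1) if abs(k / n - p) >= eps)

n, p, eps = 50, 0.5, 0.1
exact = exact_two_sided_tail(n, p, eps)
hoeffding = 2 * math.exp(-2 * n * eps**2)  # Gaussian in eps, linear in n

assert 0.0 < exact <= hoeffding
```

Theorem 4.1 has the same Gaussian-in-ε structure for δ_N; the higher power of ε in its statement then comes from the substitution ε ↦ (ε/C)^q dictated by (4.3).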
As N(H, ∥·∥_{L¹(ν)}, ε) ≤ N(H, ∥·∥_∞, ε) for every probability ν on X, the requirement (2.14.8) in [43] for [43, Theorem 2.14.10] is satisfied.³ Therefore an application of that theorem shows that

P[√N δ_N ≥ ε] ≤ C exp(Cε^a) exp(−2ε²)

for some a ∈ (0, 2) (in the notation of that theorem: as U < 2, choose δ > 0 such that U + δ < 2). Further, C exp(Cε^a) exp(−2ε²) ≤ C exp(−ε²/C), which implies

P[δ_N ≥ ε] ≤ C exp(−Nε²/C)

³ In fact, the cited theorem is stated for classes of functions taking values in [0, 1]; our class is bounded, so it applies after rescaling.

for all ε > 0 and N ≥ 1. The proof is completed by plugging the last estimate in equation (4.3). □

Theorem 4.2. Assume that F and G are bounded functions, that the set G is bounded, and let ρ = OCE be the optimized certainty equivalent risk measure. Then there are constants c, C > 0 such that

P[sup_{g∈G} |ρ_µ(F + g·G) − ρ_{µ_N}(F + g·G)| ≥ ε] ≤ C exp(−cNε²)

for all ε > 0 and N ≥ 1.

Proof. The proof is similar to the one given for Theorem 4.1 and we shall keep it short. By Lemma 2.2 one has

|ρ_µ(F + g·G) − ρ_{µ_N}(F + g·G)| ≤ sup_{H∈H} |∫_X H(x) (µ − µ_N)(dx)| =: δ_N

almost surely, for the set H := {l(F + g·G − m) + m : g ∈ G and |m| ≤ a}. Again apply Corollary 2.4 and [43, Theorem 2.14.10] to deduce that P[δ_N ≥ ε] ≤ C exp(−cNε²) for some constants c, C > 0. This concludes the proof. □

5. Sharpness of rates

Whenever investigating average errors involving a (linear) dependence on i.i.d. phenomena, the central limit theorem assures that the 1/√N rate cannot be improved. Indeed, take for instance ρ(X) := E[X] = AVaR₀(X).
Then, if µ is a probability on [0, 1] and F is a (bounded) function which is equal to the identity on [0, 1], then

ρ_{µ_N}(F) = (1/N) Σ_{n≤N} F(S_n) ≈ Normal(ρ_µ(F(S)), Var(F(S))/N)

for large N by the central limit theorem. In particular, E[|ρ_µ(F) − ρ_{µ_N}(F)|] ≈ √(Var(F(S))/N) for all large N, and P[|ρ_µ(F) − ρ_{µ_N}(F)| ≥ ε] ≈ 2Φ(−ε√(N/Var(F(S)))), where Φ is the cumulative distribution function of the standard normal distribution.

In contrast to the above 1/√N rate, the rates obtained for general risk measures, e.g. in Theorem 1.7, are worse. As the proofs are presented, they depend on the continuity (integrability) of the risk measure. This section is devoted to showing that the integrability conditions imposed are necessary to obtain any rates, and that the rates are in fact sharp up to a factor of 2 (comments on the factor 2 are given in Remark 5.3 below).

To ease the notation, for probabilities µ on R with bounded support, we shall write ρ(µ) := ρ(X) where X ∼ µ (this corresponds, of course, to ρ_µ(F) for X = R and F the identity). With this notation, Proposition 1.6 reads as follows.

Proposition 5.1. Let ε > 0. Then there exists a sublinear law invariant risk measure ρ: L^∞ → R satisfying the Lebesgue property (see below) as well as a constant c > 0 such that

sup_{µ probability on {0,1}} E[|ρ(µ) − ρ(µ_N)|] ≥ c/N^ε for all (large) N.

Remark 5.2. Without the assumption that ρ satisfies the Lebesgue property, the proof of Proposition 5.1 becomes rather trivial: take ρ(X) := ess sup X and let µ be some probability with support [0, 1]. As ρ(µ_N) = max_{n≤N} X_n (where (X_n) is an i.i.d. sample of µ), one has

P[|ρ(µ) − ρ(µ_N)| ≥ ε] = P[max_{n≤N} X_n ≤ 1 − ε] = µ([0, 1−ε])^N.

For suitable choices of µ, the latter term can converge arbitrarily slowly to zero.
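A minimal numerical sketch (standard library only) quantifies how slow this can be: if µ puts mass 1/N on (1−ε, 1], the probability of missing the top of the support does not vanish at all as that N grows:

```python
import math

# P[ max of N samples <= 1 - eps ] = mu([0, 1 - eps])**N.
# With mass delta = 1/N on (1 - eps, 1], this equals (1 - 1/N)**N -> exp(-1):
for N in (10, 100, 1_000, 10_000):
    miss = (1 - 1 / N) ** N
    assert abs(miss - math.exp(-1)) < 1 / N  # classical estimate

# so, for a distribution tuned against the sample size, the plug-in
# estimator of ess sup misses by eps with probability about 0.37
```

This per-N tuning of µ is exactly the device used in the proof of Proposition 5.1 below, where µ = Ber(p_N) with p_N = 1/N.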
Therefore E[|ρ(µ) − ρ(µ_N)|] = ∫_0^1 µ([0, 1−ε])^N dε converges arbitrarily slowly as well.

The proof of Proposition 5.1 below mimics the idea of Remark 5.2 while simultaneously enforcing the Lebesgue property. Moreover, it actually also reveals the following.

Remark 5.3. Combining Theorem 1.7 and Proposition 1.9 gives the following: for every law invariant risk measure as in the theorem, there are two constants c ≥ 0 and C > 0 such that

c/N^{1/q} ≤ sup_{µ probability on {x₀, x₁}} E[|π_µ(F) − π_{µ_N}(F)|] ≤ C/N^{1/(2q)} (5.1)

for all N ≥ 1. Further, for certain choices of ρ, the constant c can be chosen strictly positive. On the other hand, the rate 1/√N can never be beaten, hence the left hand side of (5.1) can certainly be improved (at least for q ∈ (1, 2)). One would need a construction depending on q in order to obtain sharper lower bounds; at this stage, unfortunately, the authors are unaware how.

To ease notation, denote by

Ber(p) := (1−p)δ₀ + pδ₁ (5.2)

the Bernoulli distribution with parameter of success p ∈ [0, 1]. For µ = Ber(p), the empirical measure µ_N of µ satisfies

µ_N ≡ Ber(p)_N = Ber(p̂_N) where p̂_N := (1/N) Σ_{n≤N} X_n (5.3)

(almost surely), where (X_n) are i.i.d. Ber(p)-distributed. This simple formula is actually the reason why we stick to the Bernoulli distribution, as computations become a lot easier. We start with two simple lemmas. As they are important, we include their (short) proofs.

Lemma 5.4. Let p ∈ (0, 1). Then AVaR_u(Ber(p)) = (p/(1−u)) ∧ 1 for every u ∈ [0, 1).

Proof. By definition, we have

AVaR_u(Ber(p)) = inf_{m∈R} ((1/(1−u)) (p(1−m)⁺ + (1−p)(−m)⁺) + m).

The function inside the infimum is piecewise linear in m, with slope 1 − 1/(1−u) ≤ 0 for m ≤ 0, slope 1 − p/(1−u) for 0 < m < 1, and slope 1 for m ≥ 1. Hence the infimum is attained at m = 0 if p ≤ 1−u and at m = 1 otherwise, which yields (p/(1−u)) ∧ 1. □

Lemma 5.5. It holds that

sup_{x≥1} ((1 − x^{−ε}) a + x^{−ε} ((ax) ∧ 1)) = (1 − a^ε) a + a^ε

for every a ∈ [0, 1] and ε > 0.

Proof. For a = 0 or a = 1 the claim is clear. If a ∈ (0, 1), the supremum is attained for x ∈ [1, 1/a].
For those x the value equals a(1 + x^{1−ε} − x^{−ε}), which is increasing as a function of x. Hence the optimal x is 1/a, which yields the claim. □

Proof of Proposition 5.1. Let ε > 0 (without loss of generality ε < 1) and define ρ: L^∞ → R by

ρ(X) := sup_{x≥1} ((1 − x^{−ε}) AVaR₀(X) + x^{−ε} AVaR_{1−1/x}(X)). (5.4)

As AVaR is a law invariant sublinear risk measure, ρ inherits all those properties. Moreover, a quick computation shows that AVaR_u(X) satisfies the Lebesgue property for every u ∈ [0, 1); as x^{−ε} → 0 for x → ∞, this then implies that ρ satisfies the Lebesgue property as well.

For every N, we shall choose µ := Ber(p_N) with p_N := 1/N in the supremum over all probabilities on {0, 1} appearing in the statement of the proposition. So let (X_n^N) be an i.i.d. sample of Ber(p_N), that is, P[X_n^N = 1] = p_N = 1/N for all n and N, and recall that µ_N = Ber(p̂_N) where p̂_N := (1/N) Σ_{n≤N} X_n^N. We will show that

ρ(Ber(p_N)) − E[ρ(Ber(p̂_N))] ≥ p_N^ε/C for all (large) N.

Using the triangle inequality, this clearly implies the statement of the proposition. By Lemma 5.4 and Lemma 5.5 we compute

ρ(Ber(p_N)) = sup_{x≥1} ((1 − x^{−ε}) p_N + x^{−ε} ((x p_N) ∧ 1)) = (1 − p_N^ε) p_N + p_N^ε,

and similarly ρ(Ber(p̂_N)) = (1 − p̂_N^ε) p̂_N + p̂_N^ε. Now recall that E[p̂_N] = p_N and, by Jensen's inequality, E[p̂_N^ε p̂_N] ≥ p_N^ε p_N; hence

ρ(Ber(p_N)) − E[ρ(Ber(p̂_N))] ≥ p_N^ε − E[p̂_N^ε].

For the set A_N := {p̂_N = 0} = {X_n^N = 0 for all n ≤ N}, one computes P[A_N] = (1 − p_N)^N → exp(−1) ∈ (0, 1) as N → ∞. Moreover, E[p̂_N^ε] = E[p̂_N^ε 1_{A_N^c}], and an application of Hölder's inequality (with exponents 1/ε and 1/(1−ε)) gives

E[p̂_N^ε] ≤ E[p̂_N]^ε P[A_N^c]^{1−ε} ≤ p_N^ε (1 − exp(−2))^{1−ε} =: p_N^ε c₀ for all N large enough.
Here we also used that E[p̂_N] = p_N and the previous computation for (the limit of) P[A_N]. In particular,

ρ(Ber(p_N)) − E[ρ(Ber(p̂_N))] ≥ p_N^ε (1 − c₀)

for all N large enough. As c₀ ∈ (0, 1), this completes the proof. □

Proof of Proposition 1.9. We use the notation of the proof of Proposition 5.1. Define ρ: L^∞ → R as in (5.4) with ε := 1/q. We need to check that ρ(|X|) < ∞ for all X with a finite weak q-th moment. By the definition of ρ and the argument of Lemma 3.3, it follows that

ρ(|X| ∧ n) ≤ C sup_{x≥1} ((1 − x^{−ε}) + x^{−ε} x^{1/q}) < ∞

for all n ∈ N, where C depends on the weak q-th moment of X. This shows that ρ(|X|) < ∞. At this point we may copy the rest of the proof of Proposition 5.1 and obtain that ρ(Ber(p_N)) − E[ρ(Ber(p̂_N))] ≥ N^{−1/q}/C for all large N ≥ 1, which implies the claim. □

6. Additional results

In this last section we provide an additional result pertaining to the boundedness assumption on G, as well as the remaining proofs, notably of the estimation of the shortfall risk measure and of utility maximization.

6.1. The set G needs to be bounded. Our setup also includes the case of risk-based hedging, in which case one would rather write

π_µ(F) = inf{m ∈ R : there is some g ∈ G such that ρ_µ(F − m + g·G) ≤ 0}.

(This expression follows from the additivity of ρ_µ on the constants.) In prose, π_µ(F) is the minimal capital m needed such that, possibly after trading, the loss F reduced by m becomes acceptable. In this setting one would typically not restrict to bounded strategies, that is, one would take G = R^e. The goal of this section is to prove the next proposition, which states that requiring G to be bounded is not just a technical simplification we made, but in fact necessary.

One precaution needs to be made though: assume for instance that G_i = 0 for all i; then clearly g ↦ ρ_µ(F + g·G) does not depend on g and the size of G does
To exude such cases (without too much effort), we assume that ( µ, G )non-degenerate in the sense that for every g ∈ R e \ { } one has µ ( g · G < > Proposition 6.1. Let ρ : L ∞ → R be any law invariant risk measure, let F andeach G i be bounded, and let ( µ, G ) be non-degenerate in the above sense. Assumethat π µ ( F ) ∈ R and E [ | π µ ( F ) − π µ N ( F ) | ] → as N → ∞ . Then the set G needs to be bounded.Proof. We show the negation, namely that if G is unbounded, convergence cannotbe true. To that end, let ( g n ) be a sequence in G witnessing that G is unbounded.After passing to a subsequence, there exists g ∗ ∈ R e with | g ∗ | = 1 such that g n / | g n | → g ∗ . By assumption, µ ( g ∗ · G < > 0, hence there is ε > µ ( U ) > U := { x ∈ X : g ∗ · G ( x ) < − ε } . By definition of π one has π µ N ( F ) ≤ ρ µ N ( F + g n · G )for every n ∈ N . Moreover, it holds that F + g n · G ≤ sup U F + sup U g n · G =: a n µ N -a.s. on { µ N ( U ) = 1 } for every n ∈ N . By assumption the first term in the definition of a n is bounded.Further, as g n / | g n | converges to g ∗ , one has that g n · G = | g n | (cid:16) g ∗ · G + (cid:16) g n | g n | − g ∗ (cid:17) · G (cid:17) ≤ | g n | (cid:16) − ε + C (cid:12)(cid:12)(cid:12) g n | g n | − g ∗ (cid:12)(cid:12)(cid:12)(cid:17) < − | g n | ε U for all large n . By monotonicity of ρ µ N , this implies ρ µ N ( F + g n · G ) ≤ ρ µ N ( a n ) = a n → −∞ on { µ N ( U ) = 1 } as n → ∞ . Finally, as P [ µ N ( U ) = 1] = 1 − (1 − µ ( U )) N > N ≥ 1, we conclude that π µ N ( F ) = −∞ with positive probability. Inparticular E [ | π µ ( F ) − π µ N ( F ) | ] = ∞ for every N ≥ 1, which proves the claim. (cid:3) The proof of Proposition 1.10. We only sketch the proof of Proposition1.10, as it is very similar to that of Theorem 1.7 on the optimized certainty equiv-alents. The only difference is the absence of the component m (in the definition ofOCE), which actually makes life even simpler. 
In particular, we have

N({U(F + g·G) : g ∈ G}, ∥·∥_∞, ε) ≤ (C/ε)^e ∨ 1 for every ε > 0.

6.3. Remaining proofs for Theorem 1.7. We finally provide the proof of Theorem 1.7 for the case that ρ is the shortfall risk measure.

(a) Define the function J: R → R by

J(m) := inf_{g∈G} ∫ l(F + g·G − m) µ(dx),

and in the same way define the (random) function J_N with µ replaced by µ_N. Further let a ≥ 0 be such that |F + g·G| ≤ a for every g ∈ G. Then |π_µ(F)| ≤ a, or, in other words,

π_µ(F) = inf{m ∈ [−a, a] : J(m) ≤ 1}.

The same is true if µ is replaced by µ_N and J by J_N (almost surely).

(b) We claim that there is c > 0 such that J(m̃) ≤ J(m) − c(m̃ − m) for all m, m̃ ∈ [−a, a] with m ≤ m̃. Indeed, as l is convex and strictly increasing, its (right) derivative l′ is strictly positive. Now let g ∈ G be optimal for J(m) (for notational simplicity; otherwise use some ε-optimal g), that is, J(m) = ∫ l(F + g·G − m) dµ. The fundamental theorem of calculus then implies

J(m̃) ≤ ∫ l(F + g·G − m̃) dµ = ∫ (l(F + g·G − m) − (m̃ − m) ∫_0^1 l′(F + g·G − m − t(m̃ − m)) dt) dµ.

The term inside the second integral is larger than c := inf_{|t|≤3a} l′(t) > 0. Hence J(m̃) ≤ J(m) − c(m̃ − m), which is what we claimed.

(c) We claim that J and J_N are continuous. Indeed, this is an easy consequence of the continuity of (m, g) ↦ ∫ l(F + g·G − m) dµ together with the fact that G is relatively compact (similarly for J_N); we spare the details.

(d) Step (b) in particular implies that J is strictly decreasing. Combining this with the continuity of J yields that π_µ(F) is the unique number satisfying J(π_µ(F)) = 1. Similarly, π_{µ_N}(F) is the unique number satisfying J_N(π_{µ_N}(F)) = 1, and therefore

|J(π_{µ_N}(F)) − J(π_µ(F))| = |J(π_{µ_N}(F)) − J_N(π_{µ_N}(F))| ≤ sup_{|m|≤a} |J(m) − J_N(m)| =: Δ_N.
Making use of steps (a) and (b), this implies |π_{µ_N}(F) − π_µ(F)| ≤ Δ_N/c and so it remains to gain control over Δ_N. As

Δ_N ≤ sup_{H ∈ H} | ∫ H d(µ − µ_N) | for H := {l(F + g · G − m) : |m| ≤ a and g ∈ G},

we can use Corollary 2.4 and Dudley's theorem as in the proof of Theorem 1.1 to obtain E[Δ_N] ≤ C/√N for all N ≥ 1. Similarly, Corollary 2.4 and the arguments given for the proof of Theorem 4.1 imply that P[Δ_N ≥ ε] ≤ C exp(−cNε²) for all ε > 0 and N ≥ 1, where c, C > 0 are constants.

Acknowledgments: The authors would like to thank Patrick Cheridito for helpful discussions. The first author is grateful for financial support through the Austrian Science Fund (FWF) under project P28661.
Vienna University, Department of Mathematics
E-mail address:

Princeton University, ORFE
E-mail address: