Quantitative Statistical Robustness for Tail-Dependent Law Invariant Risk Measures

Abstract

When estimating the risk of a financial position with empirical data or Monte Carlo simulations via a tail-dependent law invariant risk measure such as the Conditional Value-at-Risk (CVaR), it is important to ensure the robustness of the statistical estimator, particularly when the data contain noise. Krätschmer et al. [1] propose a new framework to examine the qualitative robustness of estimators for tail-dependent law invariant risk measures on Orlicz spaces, which is a step further from the earlier work on the robustness of risk measurement procedures by Cont et al. [2]. In this paper, we follow this stream of research and propose a quantitative approach for verifying the statistical robustness of tail-dependent law invariant risk measures. A distinct feature of our approach is that we use the Fortet-Mourier metric to quantify the variation of the true underlying probability measure when analysing the discrepancy between the laws of the plug-in estimators of a law invariant risk measure based on the true data and on the perturbed data. This enables us to derive an explicit error bound for the discrepancy when the risk functional is Lipschitz continuous with respect to a class of admissible laws. Moreover, the newly introduced notion of Lipschitz continuity allows us to examine the degree of robustness of tail-dependent risk measures. Finally, we apply our quantitative approach to some well-known risk measures to illustrate our theory.


Wei Wang † Huifu Xu ‡ Tiejun Ma § June 30, 2020


Keywords.

Quantitative robustness, tail-dependent law invariant risk measures, Fortet-Mourier metric, admissible laws, index of quantitative robustness.

† School of Business, University of Southampton, Southampton, SO17 1BJ, UK.
‡ Department of Systems Engineering & Engineering Management, The Chinese University of Hong Kong, Shatin, N. T., Hong Kong.
§ School of Business, University of Southampton, Southampton, SO17 1BJ, UK.

One of the main purposes of quantitative modeling in finance is to quantify the loss of a financial portfolio. Over the past two decades, various risk measures have been proposed for measuring the risk of financial portfolios. A risk measure is represented as a map assigning an extended real number (a measure of risk) to each random loss under an implicit assumption that the true loss [...] plug-in estimate for the risk measure [3].

Let $X$ denote the random loss of a financial portfolio on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and $\rho$ be a law invariant risk measure. The plug-in estimate for $\rho(X)$ is given by $\varrho(\hat{P})$, where $\hat{P}$ is the empirical distribution based on available observations and $\varrho$ is a risk functional defined by

$$\varrho(P) = \rho(X), \quad \text{if } X \text{ has law } P; \tag{1.1}$$

see e.g. [4, 5]. In the literature, Cont et al. [2] first study the quality of statistical estimators of law invariant risk measures using Hampel's classical concept of qualitative robustness [6], that is, a risk functional estimator is said to be qualitatively robust if it is insensitive to the variation of the sampling data. The research is important because perceived data (particularly empirical data) may contain some noise. Without such insensitivity, financial activities based on the risk measures may cause damage. For instance, when $\rho(X)$ is applied to allocate the risk capital for an insurance company, altering the capital allocation may be costly. According to Hampel's theorem, Cont et al. [2] demonstrate that the qualitative robustness of a statistical estimator is equivalent to the weak continuity of the risk functional, and that value at risk (VaR) is qualitatively robust whereas conditional value at risk (CVaR) is not.

Krätschmer et al. [7] argue that the use of Hampel's classical concept of qualitative robustness may be problematic because it requires the risk measure essentially to be insensitive with respect to the tail behaviour of the random variable, and the recent financial crisis shows that a faulty estimate of tail behaviour can lead to a drastic underestimation of the risk. Consequently, they propose a refined notion of qualitative robustness that applies also to tail-dependent statistical functionals and that allows us to compare statistical functionals with regard to their degree of robustness. The new concept captures the trade-off between robustness and sensitivity and can be quantified by an index of qualitative robustness. Furthermore, under the new concept, Krätschmer et al. [1] analyze the qualitative robustness of law-invariant convex risk measures on Orlicz spaces and show that CVaR and spectral risk measures are all qualitatively robust when the perturbation of the probability distribution is restricted to a finer topological space. Alternative generalizations of Hampel's theorem can be found for strong mixing data (Zähle [8, 9]) and for stochastic processes in various ways (Boente et al. [10] and Strohriegl and Hable [11]). For a comprehensive study of statistical robustness, we refer readers to [12, 13, 14, 15] and references therein.

In this paper, we take a step further by deriving an error bound for the plug-in estimators of law invariant risk measures in terms of the variation of data, and we call the analysis quantitative because no such error bound is established in the existing qualitative robust analysis. This is achieved by adopting different metrics to measure the discrepancy of the estimators and the variation of data.
Specifically, we use the Fortet-Mourier metrics, as opposed to the Lévy distance in Cont et al. [2] or the weighted Kolmogorov metric in Krätschmer et al. [7], to quantify the data variation (the perturbation of the true probability distributions). Moreover, we introduce a new notion of the so-called admissible laws, which effectively restricts the scope of data variation. The new metrics enable us to establish an explicit relationship between the discrepancy of the laws of the plug-in estimators (of a law invariant risk measure based on the true data and on the perturbed data) and the discrepancy of the associated probability distributions of the data. The research is inspired by the recent work of Guo and Xu [16], where the authors derive quantitative statistical robustness for preference robust optimization models under the Kantorovich metric. The main contributions of the paper can be summarized as follows.

First, we introduce the notion of admissible laws induced by a probability metric, which is a class of probability distributions whose discrepancy with the law of the Dirac measure at 0 is finite. The admissibility effectively restricts the scope of data perturbation. Using this notion, we compare the admissibility under the $\phi$-topology and the Fortet-Mourier metric.

Second, we propose to use the Fortet-Mourier metric to quantify the variation of the probability measure. The metric enables us to establish an explicit relationship between the discrepancy of the laws of the plug-in estimators of a law invariant risk measure based on the true data and on the data perturbed by noise, and the change of the true underlying probability measures, when the risk functional is Lipschitz continuous on a class of admissible laws.
We find that the risk functionals associated with the general moment-type convex risk measures are Lipschitz continuous.

Third, we introduce the concept of Lipschitz continuity for a general statistical functional on a class of admissible laws induced by the Fortet-Mourier metric and find that, for a Lipschitz continuous risk measure, the parameter of the Fortet-Mourier metric allows us to compare tail-dependent risk measures with regard to their degree of robustness, i.e., the index of statistical robustness.

Fourth, we apply the new approach to examine the quantitative statistical robustness of a range of well-known risk measures, including CVaR, the optimized certainty equivalent and the shortfall risk measure, and conclude that, under mild conditions, they are all quantitatively robust; their indexes of quantitative robustness are also calculated.

The rest of the paper is organized as follows. In Section 2, we set up the background of the research problem. In Section 3, we introduce the concept of the Fortet-Mourier metric and admissible laws. In Section 4, we establish the quantitative statistical robustness theory and compare it with the qualitative statistical robustness theory. In Section 5, we apply our theory to risk measures and give some examples. Some technical details are given in the appendix.

In this section, we discuss the background of statistical robustness in the context of law invariant risk measures. We begin with a brief review of law invariant risk measures and their estimation, and then move on to explain the issues that arise when the data may contain noise.

Let $(\Omega, \mathcal{F}, \mathbb{P})$ be an atomless probability space, where $\Omega$ is a sample space with sigma algebra $\mathcal{F}$ and $\mathbb{P}$ is a probability measure. Let $X : (\Omega, \mathcal{F}, \mathbb{P}) \to \mathbb{R}$ be a financial loss and $F_X(x) := \mathbb{P}(X \le x)$ be the law or the probability distribution of $X$. For $p \ge 1$, let $L^p(\Omega, \mathcal{F}, \mathbb{P})$ ($L^p$ for short) denote the space of random variables mapping from $(\Omega, \mathcal{F}, \mathbb{P})$ to $\mathbb{R}$ with finite $p$-th order moments. We say that a map $\rho : L^p \to \overline{\mathbb{R}} := \mathbb{R} \cup \{+\infty\}$ is a convex risk measure [18] if it satisfies the following properties:

(i) Monotonicity: $\rho(X) \le \rho(Y)$ for $X, Y \in L^p$ with $X \le Y$ $\mathbb{P}$-almost surely;
(ii) Translation invariance: $\rho(X + c) = \rho(X) + c$ for $X \in L^p$ and $c \in \mathbb{R}$;
(iii) Convexity: $\rho(\lambda X + (1-\lambda)Y) \le \lambda\rho(X) + (1-\lambda)\rho(Y)$ for $X, Y \in L^p$ and $\lambda \in [0, 1]$.

If $\rho$ also satisfies positive homogeneity, i.e., $\rho(\alpha X) = \alpha\rho(X)$ for any $\alpha \ge 0$, then $\rho$ is a coherent risk measure; see [19, 18] for the original definitions of these concepts. A risk measure $\rho$ is said to be law invariant if $\rho(X) = \rho(Y)$ for $X$ and $Y$ having the same law. We refer readers to Föllmer and Weber [20] for a recent overview of risk measures.

As discussed in [2, 7], it is a widely accepted procedure to estimate the risk of a financial loss by means of a Monte Carlo method or from a set of available observations. Such a procedure is particularly sensible when $\rho$ is law invariant. The following proposition states that the law invariance of a risk measure $\rho$ is equivalent to the existence of a risk functional $\varrho$ as in (1.1).

Proposition 2.1 Let $\mathscr{P}(\mathbb{R})$ denote the set of all probability measures on $\mathbb{R}$. If $\rho : L^p \to \overline{\mathbb{R}}$ is a law invariant risk measure, then there exists a unique risk functional $\varrho : \mathscr{P}(\mathbb{R}) \to \overline{\mathbb{R}}$ associated with $\rho$ such that for any $X \in L^p$,

$$\rho(X) = \varrho(\mathbb{P} \circ X^{-1}). \tag{2.1}$$

The result is well known; see for instance Delage et al. [21] for random variables defined in $L^\infty$. The usefulness of the representation is that it naturally captures the law invariance and allows one to define any law invariant risk measure directly over the space of probability measures $\mathscr{P}(\mathbb{R})$ induced by random variables in $L^\infty$ (also known as probability distributions), see Frittelli et al. [22]. Dentcheva and Ruszczyński [23] take it further to define a class of law invariant risk measures directly in the space of quantile functions. In a more recent development, Haskell et al. [24] extend the research to a broad class of multi-attribute choice functions defined over the space of survival functions. Let $P := \mathbb{P} \circ X^{-1}$ be the push-forward probability measure on $\mathbb{R}$ induced by $X$. Since $\mathbb{P}(X \le x)$ coincides with $P((-\infty, x])$ ($P(x)$ for short), we also call $P$ the distribution or the law of $X$ interchangeably throughout the paper. Consequently, we can write (2.1) as (1.1).

In this paper, we are not concerned with the definition of risk measures over the space of probability distributions or the space of quantile functions; rather, we concentrate on the stability of statistical estimators of law invariant risk measures. The risk functional $\varrho(P)$ with the law $P = \mathbb{P} \circ X^{-1}$ can be used in a natural way to construct an estimator for the risk $\rho(X)$ of $X \in L^p$. We note that the canonical model space for law invariant convex risk measures is $L^1$ [17]. [...] $P_N$ of $P$ based on the available observations of $X$ and then to plug this estimator into the risk functional $\varrho$ to obtain the desired estimator of $\rho(X)$, i.e.,

$$\hat{\varrho}_N(\xi_1, \xi_2, \ldots, \xi_N) := \varrho(P_N), \tag{2.2}$$

where, in this paper, $P_N$ can be seen as the empirical distribution of an independent and identically distributed (i.i.d., for short) sequence $\xi_1, \xi_2, \ldots, \xi_N$ of historical observations or Monte Carlo simulations, i.e.,

$$P_N(x) := \frac{1}{N}\sum_{i=1}^{N} \mathbb{1}_{\xi_i \le x}, \quad x \in \mathbb{R}. \tag{2.3}$$

Here and later on, $\mathbb{1}_A$ denotes the indicator function of the event $A$. Indeed, $P_N$ can be a fairly general estimate: for instance, $P_N$ can be a smoothed empirical distribution based on uncensored data or an empirical distribution based on censored data, see, e.g., [3], or an empirical distribution based on identically distributed dependent data, see, e.g., [9].

We can see that $\hat{\varrho}_N$ is a mapping from $\mathbb{R}^N$ to $\overline{\mathbb{R}}$. Figure 1 illustrates the relationship between the risk functionals, their estimators and the associated spaces.

Figure 1: The diagram for risk functionals, their estimators and associated spaces. (Nodes: $\mathbb{R}^N$, $L^p(\Omega)$, $\mathscr{P}(\mathbb{R})$, $\overline{\mathbb{R}}$; arrows: Sampling, Estimating, True, $\varrho$, $\rho$, $\hat{\varrho}_N$.)
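The plug-in construction in (2.2)-(2.3) can be illustrated with a short sketch. It assumes CVaR as the risk functional and evaluates it on the empirical distribution through the Rockafellar-Uryasev minimization formula; the function names `empirical_cdf` and `plug_in_cvar`, the loss convention (large values are bad) and the level `alpha` are choices made for this illustration only and are not taken from the paper.

```python
from bisect import bisect_right


def empirical_cdf(sample):
    """Return the empirical distribution function P_N of (2.3) as a callable."""
    xs = sorted(sample)
    n = len(xs)

    def cdf(x):
        # Fraction of observations xi_i with xi_i <= x.
        return bisect_right(xs, x) / n

    return cdf


def plug_in_cvar(sample, alpha):
    """Plug-in estimate of CVaR at level alpha in [0, 1), i.e. CVaR of P_N.

    Assumption of this sketch: CVaR_alpha(P) = min_t { t + E[(X - t)^+] / (1 - alpha) }.
    For an empirical distribution the objective is convex and piecewise
    linear with kinks at the observations, so it suffices to search over them.
    """
    xs = sorted(sample)
    n = len(xs)

    def objective(t):
        return t + sum(max(x - t, 0.0) for x in xs) / ((1.0 - alpha) * n)

    return min(objective(t) for t in xs)


if __name__ == "__main__":
    losses = [0.3, -1.2, 2.5, 0.8, 4.1, -0.4, 1.7, 0.0, 3.3, 0.9]
    P_N = empirical_cdf(losses)
    print("P_N(1.0) =", P_N(1.0))
    print("plug-in CVaR at 0.8 =", plug_in_cvar(losses, 0.8))
```

The estimator is a function of the sample only through $P_N$, which is exactly what makes the stability of $\varrho$ with respect to perturbations of the law the relevant question.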

In practice, the samples obtained from empirical data may contain noise. In that case, we might regard the samples as generated by a perturbed random variable $Y$ with law $Q$, that is, $Q = \mathbb{P} \circ Y^{-1}$. Let $\tilde{\xi}_1, \ldots, \tilde{\xi}_N$ be i.i.d. samples from $Y$. Then the practical empirical distribution function for estimating the law of $X$ is

$$Q_N(x) := \frac{1}{N}\sum_{i=1}^{N} \mathbb{1}_{\tilde{\xi}_i \le x}, \quad x \in \mathbb{R}, \tag{2.4}$$

and the practical estimator is $\tilde{\varrho}_N = \hat{\varrho}_N(\tilde{\xi}_1, \ldots, \tilde{\xi}_N) := \varrho(Q_N)$, based on the perceived empirical data, whereas $\hat{\varrho}_N$ is the statistical estimator based on noise-free data. Since we are unable to obtain the latter, we tend to use the former as a statistical estimator of $\rho(X)$, and this works only if the two estimators are sufficiently close.

To quantify the closeness, we may look into the discrepancy between the laws of the two estimators under some metric $\mathrm{dl}$, i.e.,

$$\mathrm{dl}\big(\mathrm{law}\{\varrho(P_N)\}, \mathrm{law}\{\varrho(Q_N)\}\big) = \mathrm{dl}\big(P^{\otimes N} \circ \hat{\varrho}_N^{-1},\; Q^{\otimes N} \circ \hat{\varrho}_N^{-1}\big), \tag{2.5}$$

where $P^{\otimes N}$ and $Q^{\otimes N}$ denote the probability measures on the measurable space $\big(\mathbb{R}^N, \mathcal{B}(\mathbb{R})^{\otimes N}\big)$ with marginals $P$ and $Q$ on each $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ respectively, and $\mathcal{B}(\mathbb{R})$ denotes the Borel sigma algebra of $\mathbb{R}$. Since neither $P$ nor $Q$ is known, we want the discrepancy to be uniformly small for all $P$ and $Q$ over a subset of admissible laws in $\mathscr{P}(\mathbb{R})$ so long as $Q$ is sufficiently close to $P$ under some metric $\mathrm{dl}'$. The uniformity may be interpreted as robustness. Qualitative robustness refers to the case where the relationship between $\mathrm{dl}\big(P^{\otimes N} \circ \hat{\varrho}_N^{-1}, Q^{\otimes N} \circ \hat{\varrho}_N^{-1}\big)$ and $\mathrm{dl}'(P, Q)$ is implicit, whereas quantitative robustness refers to the case where the relationship is explicit, i.e., a function of the latter can be used to bound the former. This is what we aim to achieve in this paper, because qualitative robustness has been well investigated, for instance, in [2, 1, 7].
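The discrepancy in (2.5) can be explored numerically. The following is a minimal Monte Carlo sketch under several assumptions that are not in the paper: the true law $P$ is standard normal, the perturbed law $Q$ is a contaminated normal with a hypothetical contamination rate `eps`, the risk functional is CVaR, and the metric between the two laws of the estimators is the Kantorovich metric, approximated from equally sized samples of estimator values.

```python
import math
import random


def empirical_cvar(sample, alpha):
    """CVaR at level alpha in [0, 1) of the empirical distribution (tail average)."""
    xs = sorted(sample, reverse=True)
    k = (1.0 - alpha) * len(xs)          # tail mass, in units of observations
    whole = int(math.floor(k))
    total = sum(xs[:whole])
    if whole < len(xs):
        total += (k - whole) * xs[whole]  # fractional share of the next point
    return total / k


def kantorovich_1d(a, b):
    """Kantorovich distance between two empirical laws with equally many atoms."""
    if len(a) != len(b):
        raise ValueError("samples must have the same size")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def contaminated_normal(eps, scale=5.0):
    """Hypothetical noise model: N(0, 1) with probability 1 - eps, else N(0, scale^2)."""
    def draw(rng):
        if rng.random() < eps:
            return rng.gauss(0.0, scale)
        return rng.gauss(0.0, 1.0)
    return draw


def estimator_law(draw, n_obs, n_rep, alpha, rng):
    """Draw n_rep independent copies of the plug-in estimator based on n_obs observations."""
    return [empirical_cvar([draw(rng) for _ in range(n_obs)], alpha)
            for _ in range(n_rep)]


def estimator_discrepancy(eps, n_obs=200, n_rep=100, alpha=0.9, seed=0):
    """Approximate the left-hand side of (2.5) for contamination rate eps."""
    clean = estimator_law(contaminated_normal(0.0), n_obs, n_rep, alpha,
                          random.Random(seed))
    noisy = estimator_law(contaminated_normal(eps), n_obs, n_rep, alpha,
                          random.Random(seed + 1))
    return kantorovich_1d(clean, noisy)


if __name__ == "__main__":
    for eps in (0.01, 0.05, 0.2):
        print(eps, estimator_discrepancy(eps))
```

Such a simulation only shows how the discrepancy behaves for one family of perturbations; the quantitative analysis aims at a bound that holds uniformly over a class of admissible laws.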
ζ-metrics and admissible laws

There are two essential elements in investigating both the qualitative and the quantitative statistical robustness of a risk functional. One is the specific choice of probability metrics, and not just the topologies generated by them (see, e.g., [12, 2, 7]), to quantify the change of the law $P$ and to estimate the discrepancy between the laws of two estimators, i.e., (2.5). The other is the determination of the subset $\mathcal{M}$ of admissible laws in $\mathscr{P}(\mathbb{R})$ (see, e.g., [7, 9]), containing all empirical distributions, $\mathcal{M}_{1,\mathrm{emp}} \subset \mathcal{M}$, to restrict the perturbation of the law $P$. For instance, the subset $\mathcal{M}$ may be specified via some generalized moment conditions, which are of interest in econometric or financial applications.

To introduce these two essential elements thoroughly, some preliminary notions and results in probability theory and statistics, such as the $\phi$-weak topology, are required. We first give a sketch of them to prepare our discussions in the follow-up sections. Let $\phi : \mathbb{R} \to [0, \infty)$ be a continuous function and

$$\mathcal{M}_1^{\phi} := \Big\{ P' \in \mathscr{P}(\mathbb{R}) : \int_{\mathbb{R}} \phi(t)\, P'(dt) < \infty \Big\}.$$

In the particular case when $\phi(\cdot) := |\cdot|^p$ and $p$ is a positive number, write $\mathcal{M}_1^{p}$ for $\mathcal{M}_1^{|\cdot|^p}$. Note that $\mathcal{M}_1^{\phi}$ defines a subset of probability measures in $\mathscr{P}(\mathbb{R})$ which satisfy the generalized moment condition of $\phi$. From the definition, we can see that $\mathcal{M}_1^{p_2} \subset \mathcal{M}_1^{p_1}$ for any positive numbers $p_1, p_2$ with $p_1 < p_2$, due to the Hölder inequality.

Definition 3.1 ($\phi$-weak topology) Let $\phi : \mathbb{R} \to [0, \infty)$ be a gauge function, that is, $\phi$ is continuous and $\phi \ge 1$ holds outside a compact set. Define $C_\phi$ as the linear space of all continuous functions $h : \mathbb{R} \to \mathbb{R}$ for which there exists a positive constant $c$ such that

$$|h(t)| \le c(\phi(t) + 1), \quad \forall t \in \mathbb{R}.$$

The $\phi$-weak topology, denoted by $\tau_\phi$, is the coarsest topology on $\mathcal{M}_1^{\phi}$ for which the mapping $g_h : \mathcal{M}_1^{\phi} \to \mathbb{R}$ defined by $g_h(P') := \int_{\mathbb{R}} h(t)\, P'(dt)$ is continuous for every $h \in C_\phi$.
A sequence $\{P_l\} \subset \mathcal{M}_1^{\phi}$ is said to converge $\phi$-weakly to $P \in \mathcal{M}_1^{\phi}$, written $P_l \xrightarrow{\phi} P$, if it converges w.r.t. $\tau_\phi$.

The $\phi$-weak topology is finer than the weak topology, and the two topologies coincide if and only if $\phi$ is bounded. It is well known (see [7, Lemma 3.4]) that $\phi$-weak convergence is equivalent to weak convergence, denoted by $P_l \xrightarrow{w} P$, together with $\int_{\mathbb{R}} \phi(t)\, P_l(dt) \to \int_{\mathbb{R}} \phi(t)\, P(dt)$. Moreover, it follows by [7, 1] that the $\phi$-weak topology on $\mathcal{M}_1^{\phi}$ is generated by the metric $\mathrm{dl}_\phi : \mathcal{M}_1^{\phi} \times \mathcal{M}_1^{\phi} \to \mathbb{R}$ defined by

$$\mathrm{dl}_\phi(P, Q) := \mathrm{dl}_{\mathrm{Prok}}(P, Q) + \Big| \int_{\mathbb{R}} \phi(t)\, P(dt) - \int_{\mathbb{R}} \phi(t)\, Q(dt) \Big|, \tag{3.1}$$

for $P, Q \in \mathcal{M}_1^{\phi}$, where $\mathrm{dl}_{\mathrm{Prok}} : \mathscr{P}(\mathbb{R}) \times \mathscr{P}(\mathbb{R}) \to \mathbb{R}_+$ is the Prokhorov metric defined by

$$\mathrm{dl}_{\mathrm{Prok}}(P, Q) := \inf\{\epsilon > 0 : P(A) \le Q(A^\epsilon) + \epsilon,\ \forall A \in \mathcal{B}(\mathbb{R})\}, \tag{3.2}$$

where $A^\epsilon := A + B_\epsilon(0)$ denotes the Minkowski sum of $A$ and the open ball centred at 0 in $\mathbb{R}$, and $\mathcal{B}(\mathbb{R})$ is the Borel sigma algebra on $\mathbb{R}$. We note that the Prokhorov metric metrizes the weak topology on $\mathscr{P}(\mathbb{R})$, see, e.g., [25].

ζ-metrics

Instead of exploiting the widely used probability metrics such as the Prokhorov metric and the weighted Kolmogorov metric in the literature of qualitative robustness [2, 7], we will switch to the so-called metrics with $\zeta$-structure to establish the quantitative statistical robustness framework for a risk functional. In particular, we will use the well-known Kantorovich metric and Fortet-Mourier metrics. The new metrics enable us to establish an explicit relationship between the discrepancy of the laws of the plug-in estimators of law invariant risk measures based on the true data and on the data perturbed by noise, and the discrepancy of the associated true probability measures. We begin with a formal definition of $\zeta$-metrics and then clarify the relationships between metrics of $\zeta$-structure and those used in [26, 2, 7].

Definition 3.2 Let $P, Q \in \mathscr{P}(\mathbb{R})$ and $\mathcal{F}$ be a class of measurable functions from $\mathbb{R}$ to $\mathbb{R}$. The metric with $\zeta$-structure is defined by

$$\mathrm{dl}_{\mathcal{F}}(P, Q) := \sup_{\psi \in \mathcal{F}} \Big| \int_{\mathbb{R}} \psi(\xi)\, P(d\xi) - \int_{\mathbb{R}} \psi(\xi)\, Q(d\xi) \Big|. \tag{3.3}$$

From the definition, we can see that $\mathrm{dl}_{\mathcal{F}}(P, Q)$ is the maximum difference of the expected values of the functions in the class $\mathcal{F}$ with respect to $P$ and $Q$. $\zeta$-metrics are widely used in the stability analysis of stochastic programming, see Römisch [27] for an excellent overview. The specific metrics with $\zeta$-structure that we consider in this paper are the Kantorovich metric and the Fortet-Mourier metric. The next definition gives a precise description of the two notions.

Definition 3.3 (Fortet-Mourier metric)

Let

$$\mathcal{F}_p(\mathbb{R}) := \big\{ \psi : \mathbb{R} \to \mathbb{R} : |\psi(\xi) - \psi(\tilde{\xi})| \le c_p(\xi, \tilde{\xi})\, |\xi - \tilde{\xi}|,\ \forall \xi, \tilde{\xi} \in \mathbb{R} \big\}, \tag{3.4}$$

where $c_p(\xi, \tilde{\xi}) := \max\{1, |\xi|, |\tilde{\xi}|\}^{p-1}$ for all $\xi, \tilde{\xi} \in \mathbb{R}$ and $p \ge 1$ describes the growth of the local Lipschitz constants. The $p$-th order Fortet-Mourier metric for $P, Q \in \mathscr{P}(\mathbb{R})$ is defined by

$$\mathrm{dl}_{FM,p}(P, Q) := \sup_{\psi \in \mathcal{F}_p(\mathbb{R})} \Big| \int_{\mathbb{R}} \psi(\xi)\, P(d\xi) - \int_{\mathbb{R}} \psi(\xi)\, Q(d\xi) \Big|. \tag{3.5}$$

In the case when $p = 1$, it is known as the Kantorovich metric for $P, Q \in \mathscr{P}(\mathbb{R})$,

$$\mathrm{dl}_K(P, Q) := \sup_{\psi \in \mathcal{F}_1(\mathbb{R})} \Big| \int_{\mathbb{R}} \psi(\xi)\, P(d\xi) - \int_{\mathbb{R}} \psi(\xi)\, Q(d\xi) \Big|. \tag{3.6}$$

From the definition, we can see that for any positive numbers $p \ge p' \ge 1$,

$$\mathrm{dl}_{FM,p}(P, Q) \ge \mathrm{dl}_{FM,p'}(P, Q) \ge \mathrm{dl}_K(P, Q), \tag{3.7}$$

which means that $\mathrm{dl}_{FM,p}(P, Q)$ becomes tighter as $p$ increases and that all of them are tighter than $\mathrm{dl}_K(P, Q)$. Moreover, the Fortet-Mourier metric metrizes weak convergence on sets of probability measures possessing uniformly a $p$-th moment [28, p. 350]. Notice that the function $t \mapsto \frac{1}{p}|t|^p$ for $t \in \mathbb{R}$ belongs to $\mathcal{F}_p(\mathbb{R})$. On $\mathbb{R}$, the Fortet-Mourier metric may be equivalently written as

$$\mathrm{dl}_{FM,p}(P, Q) = \int_{\mathbb{R}} \max\{1, |x|^{p-1}\}\, |P(x) - Q(x)|\, dx, \quad \text{for } P, Q \in \mathscr{P}(\mathbb{R}), \tag{3.8}$$

see, e.g., [29, p. 93].

In the next example, we illustrate the relationship between the existing probability metrics used in statistical robustness and the metrics with $\zeta$-structure.

Example 3.1

A number of well-known probability metrics are used in the literature of statistical robustness.

(i) The Kantorovich (or Wasserstein) metric. Let $\mathcal{F}$ be the set of all Lipschitz continuous functions with modulus bounded by 1. Then

$$\mathrm{dl}_K(P, Q) := \int_{-\infty}^{+\infty} |P(x) - Q(x)|\, dx = \mathrm{dl}_{\mathcal{F}}(P, Q). \tag{3.9}$$

Moreover, $\mathrm{dl}_{\mathrm{Prok}}(P, Q)^2 \le \mathrm{dl}_K(P, Q)$, see [25, Theorem 2].

(ii) The Lévy distance [29]. Let $\mathcal{F}$ be the set of functions bounded by 1. Then

$$\mathrm{dl}_{\text{Lévy}}(P, Q) := \inf\{\epsilon > 0 : Q(x - \epsilon) - \epsilon \le P(x) \le Q(x + \epsilon) + \epsilon,\ \forall x \in \mathbb{R}\} \le \mathrm{dl}_{\mathcal{F}}(P, Q).$$

Moreover, $\mathrm{dl}_{\text{Lévy}}(P, Q) \le \mathrm{dl}_{\mathrm{Prok}}(P, Q)$ and $\mathrm{dl}_{\text{Lévy}}(P, Q) \le \mathrm{dl}_{(\phi)}(P, Q)$ for any $\phi \ge 1$, see, e.g., [25].

(iii) The weighted Kolmogorov metric [7]. Let $\phi$ be a u-shaped function, i.e., a continuous function $\phi : \mathbb{R} \to [1, +\infty)$ that is non-increasing on $(-\infty, 0)$ and non-decreasing on $(0, +\infty)$. Then the weighted Kolmogorov metric is defined as

$$\mathrm{dl}_{(\phi)}(P, Q) := \sup_{x \in \mathbb{R}} |P(x) - Q(x)|\, \phi(x) \le \mathrm{dl}_{\mathcal{F}}(P, Q),$$

where $\mathcal{F}$ is the set of all functions bounded by $\phi$. Precisely, if $\mathcal{F}$ is the set of all indicator functions $\mathbb{1}_B$ with $B \in \{(-\infty, \xi] : \xi \in \mathbb{R}\}$, then $\mathrm{dl}_{\mathcal{F}}(P, Q) = \mathrm{dl}_{(1)}(P, Q)$, which is known as the Kolmogorov metric. Similarly, by letting $\mathcal{F}$ be the set of all weighted indicator functions with weighting $\phi$, one can obtain $\mathrm{dl}_{\mathcal{F}}(P, Q) = \mathrm{dl}_{(\phi)}(P, Q)$.

(iv) The Prokhorov metric [7]. Let $\mathcal{F}$ be the set of all functions bounded by 1. Then, by [25],

$$\mathrm{dl}_{\mathrm{Prok}}(P, Q) := \inf\{\epsilon > 0 : P(A) \le Q(A^\epsilon) + \epsilon,\ \forall A \in \mathcal{B}(\mathbb{R})\} \le \mathrm{dl}_{\mathcal{F}}(P, Q),$$

where $A^\epsilon := \{x \in \mathbb{R} : \inf_{y \in A} |x - y| \le \epsilon\}$. Moreover, $\mathrm{dl}_{\mathrm{Prok}}(P, Q)^2 \le \mathrm{dl}_K(P, Q)$.

(v) Dudley's (or bounded) Lipschitz metric [26]. Let $\mathcal{F}_{BL}$ consist of all Lipschitz continuous $f$ such that $\|f\|_\infty + \mathrm{Lip}(f) \le 1$, where $\|f\|_\infty$ denotes the usual sup-norm and $\mathrm{Lip}(f)$ is the Lipschitz constant of the Lipschitz function $f$. Then

$$\mathrm{dl}_{\mathcal{F}}(P, Q) = \sup_{f \in \mathcal{F}_{BL}} \Big| \int f(x)\, P(dx) - \int f(x)\, Q(dx) \Big| =: \mathrm{dl}_{\mathrm{Lip}}(P, Q).$$

Moreover, $\mathrm{dl}_{\mathrm{Lip}}$ and $\mathrm{dl}_{\mathrm{Prok}}$ can be bounded in terms of each other, see, e.g., [26, Section 3].

We now turn to discuss another important component in statistical robust analysis, that is, the subset $\mathcal{M}$ of admissible laws in $\mathscr{P}(\mathbb{R})$, which describes the scope of the perturbation of the law $P$ measured by a metric. This can be motivated by ensuring the finiteness of $\mathrm{dl}(P, Q)$. To this effect, we formally introduce the concept of admissible laws induced by probability metrics.

Definition 3.4 (Admissible laws induced by probability metrics)

Let $\mathrm{dl}$ be a probability metric on $\mathscr{P}(\mathbb{R})$. The admissible laws induced by $\mathrm{dl}$ are defined as

$$\mathscr{P}_{\mathrm{dl}}(\mathbb{R}) := \{P \in \mathscr{P}(\mathbb{R}) : \mathrm{dl}(P, \delta_0) < +\infty\}, \tag{3.10}$$

where $\delta_0$ denotes the Dirac measure at 0.

Let $\mathscr{P}_p(\mathbb{R})$ denote the admissible laws induced by the Fortet-Mourier metric with parameter $p$ on $\mathscr{P}(\mathbb{R})$. By Definition 3.4, we have

$$\begin{aligned}
\mathscr{P}_p(\mathbb{R}) &:= \{P \in \mathscr{P}(\mathbb{R}) : \mathrm{dl}_{FM,p}(P, \delta_0) < +\infty\} \\
&= \Big\{ P \in \mathscr{P}(\mathbb{R}) : \sup_{\psi \in \mathcal{F}_p(\mathbb{R})} \Big| \int_{\mathbb{R}} \psi(\xi)\, P(d\xi) - \int_{\mathbb{R}} \psi(\xi)\, \delta_0(d\xi) \Big| < +\infty \Big\} \\
&= \Big\{ P \in \mathscr{P}(\mathbb{R}) : \sup_{\psi \in \mathcal{F}_p(\mathbb{R})} \Big| \int_{\mathbb{R}} \psi(\xi)\, P(d\xi) - \psi(0) \Big| < +\infty \Big\}.
\end{aligned} \tag{3.11}$$

By the triangle inequality, this ensures $\mathrm{dl}_{FM,p}(P, Q) < +\infty$ for any $P, Q \in \mathscr{P}_p(\mathbb{R})$.

In the following example, we compare the admissible laws induced by different probability metrics.

Example 3.2 (Admissible laws induced by probability metrics) We reconsider the admissible laws induced by the probability metrics defined in Example 3.1.

(i) The admissible laws induced by the Kantorovich (or Wasserstein) metric are defined as

$$\begin{aligned}
\mathscr{P}_K(\mathbb{R}) &:= \{P \in \mathscr{P}(\mathbb{R}) : \mathrm{dl}_K(P, \delta_0) < +\infty\} \\
&= \Big\{ P \in \mathscr{P}(\mathbb{R}) : \int_{-\infty}^{0} P(x)\, dx + \int_{0}^{+\infty} (1 - P(x))\, dx < +\infty \Big\} \quad (= \mathscr{P}_1(\mathbb{R})) \\
&= \Big\{ P \in \mathscr{P}(\mathbb{R}) : \int_{\mathbb{R}} |x|\, dP(x) < +\infty \Big\} = \mathcal{M}_1^{1},
\end{aligned}$$

where the second equality follows from the definition of the Kantorovich metric (see (3.9)). To see how the third equality holds, we note that for any $t < 0$, we have

$$+\infty > \int_{-\infty}^{0} P(x)\, dx = \int_{t}^{0} P(x)\, dx + \int_{2t}^{t} P(x)\, dx + \int_{-\infty}^{2t} P(x)\, dx \ge \int_{t}^{0} P(x)\, dx + \frac{1}{2} P(2t)\, |t|.$$

Since $-tP(2t) \ge 0$, letting $t \to -\infty$ gives $\lim_{t \to -\infty} tP(t) = 0$. Similarly, we have $\lim_{t \to +\infty} t(1 - P(t)) = 0$. By using the integration-by-parts formula (more precisely [30, Theorem 1.15]), we obtain the right-hand side of the third equality. The last equality follows from the definition of the $\phi$-topology, in which case $\phi = |\cdot|$.

(ii) The admissible laws induced by the Lévy distance are defined as

$$\mathscr{P}_{\text{Lévy}}(\mathbb{R}) := \{P \in \mathscr{P}(\mathbb{R}) : \mathrm{dl}_{\text{Lévy}}(P, \delta_0) < +\infty\} = \mathscr{P}(\mathbb{R}).$$

Since $\mathrm{dl}_{\text{Lévy}} \le 1$, the admissible laws coincide with $\mathscr{P}(\mathbb{R})$.

(iii) The admissible laws induced by the weighted Kolmogorov metric are defined as

$$\begin{aligned}
\mathscr{P}_{(\phi)}(\mathbb{R}) &:= \{P \in \mathscr{P}(\mathbb{R}) : \mathrm{dl}_{(\phi)}(P, \delta_0) < +\infty\} \\
&= \Big\{ P \in \mathscr{P}(\mathbb{R}) : \sup_{x \le 0} |P(x)\phi(x)| + \sup_{x > 0} |(1 - P(x))\phi(x)| < +\infty \Big\},
\end{aligned}$$

which coincides with the set $\mathcal{M}_1^{(\phi)}$ defined in Krätschmer et al. [7, Subsection 3.2].

If $\phi$ is bounded on $\mathbb{R}$, then it is straightforward that $\mathscr{P}_{(\phi)}(\mathbb{R}) = \mathscr{P}(\mathbb{R})$. In the case when $\phi$ is unbounded on $\mathbb{R}$,

$$\mathcal{M}_1^{\phi} \subset \mathscr{P}_{(\phi)}(\mathbb{R}) \subset \bigcap_{\epsilon > 0} \mathcal{M}_1^{\phi^{1-\epsilon}}. \tag{3.12}$$

In what follows, we give a proof for (3.12).
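Let $P \in \mathcal{M}_1^{\phi}$. Since $\phi$ is a u-shaped function, for any $M > 0$ and $N \le 0$ we have

$$\begin{aligned}
+\infty > \int_{\mathbb{R}} \phi(x)\, dP(x) &= \int_{0}^{+\infty} \phi(x)\, dP(x) + \int_{-\infty}^{0} \phi(x)\, dP(x) \\
&\ge \int_{0}^{M} \phi(x)\, dP(x) + \phi(M)(1 - P(M)) + \int_{N}^{0} \phi(x)\, dP(x) + \phi(N)P(N) \\
&\ge \phi(M)(1 - P(M)) + \phi(N)P(N),
\end{aligned}$$

and consequently $\int_{\mathbb{R}} \phi\, dP \ge \sup_{x \le 0} |P(x)\phi(x)| + \sup_{x > 0} |(1 - P(x))\phi(x)|$. Thus, $\mathcal{M}_1^{\phi} \subset \mathscr{P}_{(\phi)}(\mathbb{R})$.

On the other hand, for any $\epsilon > 0$, if we let $\phi_\epsilon(x) := \phi(x)^{1-\epsilon}$ for $x \in \mathbb{R}$, then $\phi_\epsilon$ is a gauge function. Moreover, for any $P \in \mathscr{P}_{(\phi)}(\mathbb{R})$, there exists a $k < +\infty$ such that $k = \sup_{x \le 0} |P(x)\phi(x)| + \sup_{x > 0} |(1 - P(x))\phi(x)|$. To ease the exposition, we can assume that the law $P(x) > 0$ for any $x \in \mathbb{R}$. Then $\phi(x) \le \frac{k}{P(x)}$ for $x \le 0$ and $\phi(x) \le \frac{k}{1 - P(x)}$ for $x > 0$. Thus

$$\begin{aligned}
\int_{\mathbb{R}} \phi_\epsilon(x)\, dP(x) &= \int_{-\infty}^{0} \phi_\epsilon(x)\, dP(x) + \int_{0}^{+\infty} \phi_\epsilon(x)\, dP(x) \\
&\le k^{1-\epsilon} \int_{-\infty}^{0} P(x)^{\epsilon - 1}\, dP(x) + k^{1-\epsilon} \int_{0}^{+\infty} (1 - P(x))^{\epsilon - 1}\, dP(x) \\
&= k^{1-\epsilon} \Big[ \frac{1}{\epsilon} P(x)^{\epsilon} \Big]_{-\infty}^{0} - k^{1-\epsilon} \Big[ \frac{1}{\epsilon} (1 - P(x))^{\epsilon} \Big]_{0}^{+\infty} \\
&= \frac{1}{\epsilon} k^{1-\epsilon} \big[ P(0)^{\epsilon} + (1 - P(0))^{\epsilon} \big] < +\infty,
\end{aligned}$$

which implies $P \in \mathcal{M}_1^{\phi_\epsilon}$. Summarizing the discussions above, we obtain (3.12).

We note that if $\phi$ is unbounded, then the inclusions in (3.12) are strict because we can find a counterexample showing that equality may fail, see Example B.1 in the appendix.

(iv) The admissible laws induced by the Prokhorov metric are defined as

$$\mathscr{P}_{\mathrm{Prok}}(\mathbb{R}) := \{P \in \mathscr{P}(\mathbb{R}) : \mathrm{dl}_{\mathrm{Prok}}(P, \delta_0) < +\infty\} = \mathscr{P}(\mathbb{R}).$$

Since $\mathrm{dl}_{\mathrm{Prok}} \le 1$, the admissible laws coincide with $\mathscr{P}(\mathbb{R})$.

(v) The admissible laws induced by Dudley's (or bounded) Lipschitz metric are defined as

$$\begin{aligned}
\mathscr{P}_{\mathrm{Lip}}(\mathbb{R}) &:= \{P \in \mathscr{P}(\mathbb{R}) : \mathrm{dl}_{\mathrm{Lip}}(P, \delta_0) < +\infty\} \\
&= \Big\{ P \in \mathscr{P}(\mathbb{R}) : \sup_{f \in \mathcal{F}_{BL}} \Big| \int_{\mathbb{R}} f(x)\, P(dx) - f(0) \Big| < +\infty \Big\} = \mathscr{P}(\mathbb{R}).
\end{aligned}$$

Since $\mathrm{dl}_{\mathrm{Lip}}$ is bounded, the admissible laws coincide with $\mathscr{P}(\mathbb{R})$.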
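The one-dimensional representation (3.8) and the admissibility condition of Definition 3.4 lend themselves to a direct computation for empirical distributions. The sketch below is an illustration only: the helper names are hypothetical, and it relies on the observation that between two consecutive atoms the difference of two empirical distribution functions is constant, so the integral in (3.8) reduces to a finite sum with the weight $\max\{1, |x|^{p-1}\}$ integrated in closed form.

```python
from bisect import bisect_right


def weight_antiderivative(x, p):
    """W(x) = integral from 0 to x of max{1, |t|^(p-1)} dt, an odd function of x."""
    ax = abs(x)
    value = ax if ax <= 1.0 else 1.0 + (ax ** p - 1.0) / p
    return value if x >= 0 else -value


def fortet_mourier_1d(a, b, p=1.0):
    """Fortet-Mourier distance of order p >= 1 between two empirical laws via (3.8)."""
    xs_a, xs_b = sorted(a), sorted(b)
    grid = sorted(set(xs_a) | set(xs_b))
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        # Both empirical distribution functions are constant on [left, right).
        cdf_a = bisect_right(xs_a, left) / len(xs_a)
        cdf_b = bisect_right(xs_b, left) / len(xs_b)
        mass = weight_antiderivative(right, p) - weight_antiderivative(left, p)
        total += abs(cdf_a - cdf_b) * mass
    return total


def distance_to_dirac_at_zero(sample, p=1.0):
    """dl_{FM,p}(P_N, delta_0), the quantity that must be finite for admissibility."""
    return fortet_mourier_1d(sample, [0.0], p)


if __name__ == "__main__":
    x = [-3.0, -0.5, 0.2, 1.5]
    y = [-1.0, 0.0, 0.7, 4.0]
    for p in (1, 2, 3):
        print(p, fortet_mourier_1d(x, y, p), distance_to_dirac_at_zero(x, p))
```

For $p = 1$ the weight is identically 1 and the function returns the Kantorovich distance (3.9); increasing $p$ puts more weight on the tails, in line with (3.7). Every empirical distribution has finite distance to $\delta_0$, so empirical laws are admissible for every $p$.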
φ-weak topology

Since the $\phi$-weak topology has been widely used for qualitative robust analysis in the literature, whereas we use the topology induced by the Fortet-Mourier metrics for quantitative robust analysis, it is helpful to look into potential connections between the two apparently completely different metrics. In the next proposition, we look into such a connection from the perspective of the admissible set (which defines the space of probability measures in which $P$ is perturbed in both qualitative and quantitative robust analysis). We find that $\mathscr{P}_p(\mathbb{R})$ coincides with $\mathcal{M}_1^{\phi}$ for some specific choice of $\phi$ and subsequently show that the Fortet-Mourier metric is tighter than $\mathrm{dl}_\phi$.

Proposition 3.1 Let $p \ge 1$ be fixed and

$$\phi_p(t) := \begin{cases} |t|, & \text{for } |t| \le 1, \\ |t|^p, & \text{otherwise}. \end{cases}$$

The following assertions hold.

(i) $\mathscr{P}_p(\mathbb{R}) = \mathcal{M}_1^{\phi_p}$ $(= \mathcal{M}_1^{p})$.
(ii) $\mathrm{dl}_{\phi_p}(P, Q) \le \sqrt{\mathrm{dl}_{FM,p}(P, Q)} + p\, \mathrm{dl}_{FM,p}(P, Q)$, $\forall P, Q \in \mathscr{P}_p(\mathbb{R})$.
(iii) $\mathrm{dl}_{FM,p}$ metrizes the $\phi_p$-weak topology on $\mathscr{P}_p(\mathbb{R})$.

Part (i) of the proposition says that the admissible set $\mathscr{P}_p(\mathbb{R})$ coincides with the set of laws on $\mathbb{R}$ satisfying the generalized moment condition of $\phi_p$. Part (ii) indicates that $\mathrm{dl}_{FM,p}$ is tighter than $\mathrm{dl}_{\phi_p}$. Part (iii) means that the $\phi_p$-weak topology on $\mathscr{P}_p(\mathbb{R})$ is generated by the metric $\mathrm{dl}_{FM,p}$.

Proof.

Part (i). Since $\frac{1}{p}\phi_p \in \mathcal{F}_p(\mathbb{R})$ for any $p \ge 1$, by the definition of $\mathscr{P}_p(\mathbb{R})$ we have that $P \in \mathscr{P}_p(\mathbb{R})$ implies $P \in \mathcal{M}_1^{\phi_p}$ and subsequently $\mathscr{P}_p(\mathbb{R}) \subset \mathcal{M}_1^{\phi_p}$.

On the other hand, let $P \in \mathcal{M}_1^{\phi_p}$; then $\int_{\mathbb{R}} \phi_p(\xi)\, P(d\xi) < \infty$. For any $\psi \in \mathcal{F}_p(\mathbb{R})$, we have

$$|\psi(\xi) - \psi(0)| \le \max\{1, |\xi|^{p-1}\}\, |\xi| \le \max\{|\xi|, |\xi|^p\}, \quad \text{for all } \xi \in \mathbb{R},$$

and consequently,

$$\begin{aligned}
\Big| \int_{\mathbb{R}} \psi(\xi)\, P(d\xi) - \psi(0) \Big| &= \Big| \int_{\mathbb{R}} (\psi(\xi) - \psi(0))\, P(d\xi) \Big| \le \int_{\mathbb{R}} |\psi(\xi) - \psi(0)|\, P(d\xi) \\
&\le \int_{\mathbb{R}} \max\{|\xi|, |\xi|^p\}\, P(d\xi) = \int_{\mathbb{R}} \phi_p(\xi)\, P(d\xi).
\end{aligned}$$

Therefore, we have

$$\sup_{\psi \in \mathcal{F}_p(\mathbb{R})} \Big| \int_{\mathbb{R}} \psi(\xi)\, P(d\xi) - \psi(0) \Big| \le \int_{\mathbb{R}} \phi_p(\xi)\, P(d\xi) < \infty,$$

and consequently, $\mathcal{M}_1^{\phi_p} \subset \mathscr{P}_p(\mathbb{R})$.

Part (ii). Since $\frac{1}{p}\phi_p \in \mathcal{F}_p(\mathbb{R})$, for any $P, Q \in \mathscr{P}_p(\mathbb{R})$,

$$\Big| \int_{\mathbb{R}} \phi_p(\xi)\, P(d\xi) - \int_{\mathbb{R}} \phi_p(\xi)\, Q(d\xi) \Big| = p\, \Big| \int_{\mathbb{R}} \tfrac{1}{p}\phi_p(\xi)\, P(d\xi) - \int_{\mathbb{R}} \tfrac{1}{p}\phi_p(\xi)\, Q(d\xi) \Big| \le p\, \mathrm{dl}_{FM,p}(P, Q).$$

From Example 3.1(i) and (3.7), we have $\mathrm{dl}_{\mathrm{Prok}}(P, Q) \le \sqrt{\mathrm{dl}_{FM,p}(P, Q)}$. Finally, by the definition of $\mathrm{dl}_{\phi_p}$, i.e., (3.1), we obtain the conclusion.

Part (iii) follows straightforwardly from Part (ii).

Proposition 3.1 indicates that although the Fortet-Mourier metric $\mathrm{dl}_{FM,p}$ and $\mathrm{dl}_{\phi_p}$ are different metrics, they generate the same topology. This confirms the statement at the beginning of this section, i.e., for qualitative and quantitative robustness, it is the specific choice of probability metrics that matters and not the topologies generated by them.
To conclude this section, we remark that the subset $\mathcal{M}$ to be used in the definition of qualitative robust analysis will be confined to the set of admissible laws when we adopt the Fortet-Mourier metric for quantitative robust analysis in the next section.

We are now ready to return our discussion to the robustness of statistical estimators of law invariant risk measures outlined in Section 2.
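For readers who want to experiment with the Fortet-Mourier metric numerically, the following is a minimal sketch, not the authors' code. It assumes the one-dimensional distribution-function representation $\mathsf{dl}_{FM,p}(P,Q) = \int_{\mathbb{R}} |P(x) - Q(x)|\,c_p(x)\,dx$ with $c_p(x) = \max\{1, |x|^{p-1}\}$, which is the form the paper uses in (4.5), and evaluates it exactly for two empirical laws. The function names `c_p_primitive` and `fortet_mourier_empirical` are hypothetical.

```python
import numpy as np


def c_p_primitive(x, p):
    """Antiderivative of c_p(x) = max(1, |x|**(p-1)), normalised to vanish at 0."""
    a = np.abs(x)
    return np.sign(x) * np.where(a <= 1.0, a, 1.0 + (a**p - 1.0) / p)


def fortet_mourier_empirical(xs, ys, p=1.0):
    """Integral of |F_x - F_y| * c_p over the real line for two empirical laws.

    Assumption: the CDF representation of the p-th order Fortet-Mourier
    metric on the real line. Both empirical CDFs are piecewise constant, so
    the integral is an exact finite sum over the pooled, sorted sample points.
    """
    xs = np.sort(np.asarray(xs, dtype=float))
    ys = np.sort(np.asarray(ys, dtype=float))
    grid = np.sort(np.concatenate([xs, ys]))
    left = grid[:-1]
    # Values of the two empirical CDFs on each interval [grid[k], grid[k+1]).
    cdf_x = np.searchsorted(xs, left, side="right") / xs.size
    cdf_y = np.searchsorted(ys, left, side="right") / ys.size
    # Exact integral of c_p over each interval.
    weights = c_p_primitive(grid[1:], p) - c_p_primitive(left, p)
    return float(np.sum(np.abs(cdf_x - cdf_y) * weights))


if __name__ == "__main__":
    sample_p = [0.5, -2.0, 3.0]
    sample_q = [1.0, 0.0, -1.0]
    for order in (1.0, 2.0, 3.0):
        print(order, fortet_mourier_empirical(sample_p, sample_q, p=order))
```

With $p = 1$ the weight $c_1$ is identically 1 and the quantity is the Kantorovich distance; larger $p$ puts more weight on discrepancies in the tails, so the value is non-decreasing in $p$.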

To position our research properly, we begin with a brief overview of the existing results on qualitative statistical robustness.

Definition 4.1 (Qualitative $\mathcal{P}$-Robustness [2, 1]) Let $\mathcal{P}$ be a subset of $\mathscr{P}(\mathbb{R})$ and $P \in \mathcal{P}$. The sequence $\{\hat{\varrho}_N\}_{N \in \mathbb{N}}$ of estimators is said to be qualitatively $\mathcal{P}$-robust at $P$ w.r.t. $(\mathsf{dl}, \mathsf{dl}')$ if for every $\epsilon > 0$ there exist $\delta > 0$ and $N_0 \in \mathbb{N}$ such that for all $Q \in \mathcal{P}$ and $N \ge N_0$
\[
\mathsf{dl}(P, Q) \le \delta \;\Longrightarrow\; \mathsf{dl}'\big(P^{\otimes N} \circ \hat{\varrho}_N^{-1},\, Q^{\otimes N} \circ \hat{\varrho}_N^{-1}\big) \le \epsilon. \tag{4.1}
\]
If, in addition, $\{\hat{\varrho}_N\}_{N \in \mathbb{N}}$ arises as in (2.2) from a risk functional $\varrho$, then $\varrho$ is called qualitatively $\mathcal{P}$-robust at $P$ w.r.t. $(\mathsf{dl}, \mathsf{dl}')$.

The definition above captures two versions of qualitative statistical robustness: the one proposed by Cont et al. [2] for i.i.d. observations on $\mathbb{R}$, with $\mathsf{dl}$ and $\mathsf{dl}'$ both being the Lévy distance, and the one proposed by Krätschmer et al. [7] for i.i.d. observations on $\mathbb{R}$, with $\mathsf{dl} = \mathsf{dl}_{(\phi)}$ and $\mathsf{dl}' = \mathsf{dl}_{\mathrm{Prok}}$. Since $\mathsf{dl}_{\mathrm{Prok}}$ is tighter than $\mathsf{dl}_{\mathrm{Lévy}}$, Krätschmer et al. [7] examine the discrepancy of the laws with a tighter metric. On the other hand, from the definition of $\mathsf{dl}_{(\phi)}$ we can see that it is also tighter than $\mathsf{dl}_{\mathrm{Lévy}}$ and allows one to capture the difference of distributions at the tail. This means that the robust analysis in Krätschmer et al. [7] is restricted to a smaller class of probability distributions when $Q$ is perturbed from $P$, which explains why CVaR is robust under the criterion of the latter but not the former.

A key result that Krätschmer et al. [7] establish is Hampel's theorem, which states the equivalence between qualitative statistical robustness and stability/continuity of a risk functional (with respect to perturbation of the probability distribution) under the uniform Glivenko-Cantelli (UGC) property of empirical distributions over a specified set.

Definition 4.2 ($\mathcal{C}$-Continuity [7]) Let $P \in \mathscr{P}(\mathbb{R})$ and $\mathcal{C}$ be a subset of $\mathscr{P}(\mathbb{R})$. Then $\varrho$ is called $\mathcal{C}$-continuous at $P$ w.r.t. $(\mathsf{dl}, |\cdot|)$ if for every $\epsilon > 0$, there exists $\delta > 0$ such that for all $Q \in \mathcal{C}$
\[
\mathsf{dl}(P, Q) \le \delta \;\Longrightarrow\; |\varrho(P) - \varrho(Q)| \le \epsilon.
\]

Definition 4.3 (UGC Property [7]) Let $\mathcal{C}$ be a subset of $\mathscr{P}(\mathbb{R})$.
Then we say that the metric space $(\mathcal{C}, \mathsf{dl})$ has the UGC property if for every $\epsilon > 0$ and $\delta > 0$, there exists $N_0 \in \mathbb{N}$ such that for all $P \in \mathcal{C}$ and $N \ge N_0$
\[
P^{\otimes N}\big[ \{ (\xi_1, \dots, \xi_N) \in \mathbb{R}^N : \mathsf{dl}(P, P_N) \ge \delta \} \big] \le \epsilon.
\]
The UGC property means that the empirical probability measure converges in probability to the true marginal distribution, uniformly over $\mathcal{C}$. Examples of metric spaces $(\mathcal{C}, \mathsf{dl})$ having the UGC property can be found in [7, Section 3]. In particular, it is shown there that a subset of the admissible laws induced by the weighted Kolmogorov metric enjoys the UGC property, see [7, Theorem 3.1].

Theorem 4.1 (Hampel's Theorem [7])

Let $\mathcal{P}$ be a subset of $\mathscr{P}(\mathbb{R})$ and $P \in \mathcal{P}$. Assume that $(\mathcal{P}, \mathsf{dl})$ has the UGC property and $\mathcal{M}_{1,\mathrm{emp}} \subset \mathcal{P}$. Then if the mapping $\varrho$ is $\mathcal{M}_{1,\mathrm{emp}}$-continuous at $P$ w.r.t. $(\mathsf{dl}, |\cdot|)$, the sequence $\{\hat{\varrho}_N\}_{N \in \mathbb{N}}$ is qualitatively $\mathcal{P}$-robust at $P$ w.r.t. $(\mathsf{dl}, \mathsf{dl}_{\mathrm{Prok}})$.

We now move on to our central topic, quantitative statistical robustness for the plug-in estimators of law invariant risk measures. Intuitively speaking, quantitative statistical robustness of a risk functional $\varrho$ means that for any two admissible laws $P$ and $Q$ in $\mathscr{P}(\mathbb{R})$, the distance between the laws of their plug-in estimators $\varrho(P_N)$ and $\varrho(Q_N)$ is bounded by the distance between $P$ and $Q$ when the sample size is sufficiently large.

Definition 4.4 (Quantitative statistical robustness)

Let $\mathsf{dl}$, $\mathsf{dl}'$ be probability metrics on $\mathscr{P}(\mathbb{R})$ and let $\mathcal{M} \subset \mathscr{P}_{\mathsf{dl}'}(\mathbb{R})$ denote a subset of admissible laws on $\mathbb{R}$. A sequence of statistical estimators $\{\hat{\varrho}_N\}_{N \in \mathbb{N}}$ is said to be quantitatively statistically robust on $\mathcal{M}$ w.r.t. $(\mathsf{dl}, \mathsf{dl}')$ if there exists a non-decreasing real-valued continuous function $h : \mathbb{R}_+ \to \mathbb{R}_+$ with $h(0) = 0$ such that for all $P, Q \in \mathcal{M}$ and $N \in \mathbb{N}$
\[
\mathsf{dl}\big(P^{\otimes N} \circ \hat{\varrho}_N^{-1},\, Q^{\otimes N} \circ \hat{\varrho}_N^{-1}\big) \le h(\mathsf{dl}'(P, Q)) < +\infty. \tag{4.2}
\]
If, in addition, $\{\hat{\varrho}_N\}_{N \in \mathbb{N}}$ arises as in (2.2) from a risk functional $\varrho$, then $\varrho$ is called quantitatively statistically robust on $\mathcal{M}$ at $P$ w.r.t. $(\mathsf{dl}, \mathsf{dl}')$.

In the particular case when $\mathsf{dl} = \mathsf{dl}_K$, $h(t) = Lt$ and $\mathsf{dl}' = \mathsf{dl}_{FM,p}$, inequality (4.2) reduces to
\[
\mathsf{dl}_K\big(P^{\otimes N} \circ \hat{\varrho}_N^{-1},\, Q^{\otimes N} \circ \hat{\varrho}_N^{-1}\big) \le L\,\mathsf{dl}_{FM,p}(P, Q) < +\infty. \tag{4.3}
\]
In comparison with the qualitative statistical robustness introduced by Krätschmer et al. [1] or Cont et al. [2], definition (4.3) has several advantages. First, we use the Kantorovich metric instead of the Prokhorov metric to quantify the discrepancy between $P^{\otimes N} \circ \hat{\varrho}_N^{-1}$ and $Q^{\otimes N} \circ \hat{\varrho}_N^{-1}$. This enables us to capture the tail behaviour of the two laws and makes it easier to derive an explicit bound for the difference. Second, we use the Fortet-Mourier metric to quantify the perturbation of $P$, which is more sensitive to variation in the tails than the Lévy metric used in [2] and the weighted Kolmogorov metric used in [1, 8, 9, 13]. Third, inequality (4.3) gives an error bound for the discrepancy of the two laws, and the bound is valid for all $Q$ in $\mathcal{M}$ rather than only for those in a neighborhood of $P$.

Next, we introduce a definition of Lipschitz continuity for a general statistical mapping from $\mathscr{P}(\mathbb{R})$ to $\mathbb{R}$, which strengthens the earlier definition of $\mathcal{C}$-continuity for a general statistical functional.

Definition 4.5 (Lipschitz continuity)

Let $\varrho : \mathscr{P}(\mathbb{R}) \to \mathbb{R}$ be a general statistical functional and $\mathcal{M}$ be a subset of $\mathscr{P}(\mathbb{R})$. $\varrho$ is said to be Lipschitz continuous on $\mathcal{M}$ w.r.t. $\mathsf{dl}$ if there exists a positive constant $L$ such that
\[
|\varrho(P) - \varrho(Q)| \le L\,\mathsf{dl}(P, Q) < +\infty, \quad \forall P, Q \in \mathcal{M}. \tag{4.4}
\]
A few points about the above definition of Lipschitz continuity are worth noting:

1. The Lipschitz continuity is global rather than local over $\mathcal{M}$. The condition is strong, but we will find that many risk functionals are indeed globally Lipschitz continuous on some $\mathcal{M}$.

2. The magnitude of the continuity depends on the metric $\mathsf{dl}$ which measures the distance between $P$ and $Q$. In the specific case when $\mathsf{dl} = \mathsf{dl}_{FM,p}$, (4.4) reduces to
\[
|\varrho(P) - \varrho(Q)| \le L \int_{\mathbb{R}} |P(x) - Q(x)|\,c_p(x)\,dx < +\infty, \quad \forall P, Q \in \mathcal{M}, \tag{4.5}
\]
where $c_p(x) = \max\{1, |x|^{p-1}\}$. The exponent $p$ plays an important role in (4.5) because it interacts with the tails of $P(\cdot)$ and $Q(\cdot)$. Moreover, if $\mathcal{M} \subset \mathscr{P}_p(\mathbb{R})$, then (4.5) is finite. We will come back to this later.

3. Let $P_N$ and $Q_N$ be empirical distributions on $\mathbb{R}$. By plugging $P_N$ and $Q_N$ into (4.5), we obtain
\[
\begin{aligned}
|\varrho(P_N) - \varrho(Q_N)|
&\le L \int_{\mathbb{R}} |P_N(x) - Q_N(x)|\,c_p(x)\,dx \\
&= L \sum_{k=1}^{2N} \left| \frac{1}{N}\sum_{i=1}^{N} \mathbb{1}_{\xi_i \le x_k} - \frac{1}{N}\sum_{i=1}^{N} \mathbb{1}_{\hat{\xi}_i \le x_k} \right| \int_{x_k}^{x_{k+1}} c_p(x)\,dx \\
&= L \sum_{k=1}^{N} \frac{1}{N} \left| \int_{\xi_{i_k}}^{\hat{\xi}_{j_k}} c_p(x)\,dx \right| \\
&\le L \sum_{k=1}^{N} \frac{1}{N}\,|\xi_{i_k} - \hat{\xi}_{j_k}|\,\max\{c_p(\xi_{i_k}), c_p(\hat{\xi}_{j_k})\} \\
&\le \frac{L}{N} \sum_{k=1}^{N} c_p(\xi_k, \hat{\xi}_k)\,|\xi_k - \hat{\xi}_k|, \quad \forall \xi_k, \hat{\xi}_k \in \mathbb{R},
\end{aligned} \tag{4.6}
\]
where $x_k$ is the $k$-th smallest number among $\{\xi_1, \dots, \xi_N; \hat{\xi}_1, \dots, \hat{\xi}_N\}$ for $k = 1, \dots, 2N$, $x_{2N+1} = x_{2N}$, and $c_p(\xi, \hat{\xi}) = \max\{1, |\xi|, |\hat{\xi}|\}^{p-1}$ for all $\xi, \hat{\xi} \in \mathbb{R}$. The second equality is due to Fubini's theorem in the discrete case, and the last inequality follows from Lemma A.1 applied to the non-decreasing sequences $\{\xi_{i_k} \max\{c_p(\xi_{i_k}), c_p(\hat{\xi}_{j_k})\}\}_{k=1}^{N}$ and $\{\hat{\xi}_{j_k} \max\{c_p(\xi_{i_k}), c_p(\hat{\xi}_{j_k})\}\}_{k=1}^{N}$.

4. In the case when $\varrho$ is continuous on $\mathcal{M}$, the Lipschitz continuity (4.5) is equivalent to the Lipschitz continuity (4.6) on the set of all empirical distributions $\mathcal{M}_{1,\mathrm{emp}}$ (see the first inequality of (4.6)) because $\mathcal{M}_{1,\mathrm{emp}}$ is dense in $\mathscr{P}(\mathbb{R})$.

Example 4.1 ($p$-th moment functional) For $p \ge 1$, we consider the $p$-th moment functional $T^{(p)}$ on $\mathcal{M}^p = \mathscr{P}_p(\mathbb{R})$ defined by
\[
T^{(p)}(P) := \int_{-\infty}^{+\infty} x^p\,dP(x) < +\infty, \quad \forall P \in \mathcal{M}^p.
\]
Analogous to Example B.1, we have
\[
T^{(p)}(P) = -\int_{-\infty}^{0} P(x)\,p\,x^{p-1}\,dx + \int_{0}^{+\infty} (1 - P(x))\,p\,x^{p-1}\,dx. \tag{4.7}
\]
Thus, for any $P, Q \in \mathcal{M}^p$,
\[
|T^{(p)}(P) - T^{(p)}(Q)|
= \left| \int_{-\infty}^{+\infty} (P(x) - Q(x))\,p\,x^{p-1}\,dx \right|
\le p \int_{-\infty}^{+\infty} |P(x) - Q(x)|\,|x|^{p-1}\,dx
\le p \int_{-\infty}^{+\infty} |P(x) - Q(x)|\,c_p(x)\,dx < +\infty, \tag{4.8}
\]
where $c_p(x) = \max\{1, |x|^{p-1}\}$. From (4.5), we can see that the $p$-th moment functional $T^{(p)}$ is Lipschitz continuous w.r.t. $\mathsf{dl}_{FM,p}$ on $\mathcal{M}^p$.

Lemma 4.1

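Let $\boldsymbol{\xi} := (\xi_1, \cdots, \xi_N) \in \mathbb{R}^N$ and $\Psi$ be a set of functions from $\mathbb{R}^N$ to $\mathbb{R}$, i.e.,
\[
\Psi := \left\{ \psi : \mathbb{R}^N \to \mathbb{R} \;:\; |\psi(\tilde{\boldsymbol{\xi}}) - \psi(\hat{\boldsymbol{\xi}})| \le \frac{1}{N} \sum_{k=1}^{N} c_p(\tilde{\xi}_k, \hat{\xi}_k)\,|\tilde{\xi}_k - \hat{\xi}_k|, \ \forall \tilde{\boldsymbol{\xi}}, \hat{\boldsymbol{\xi}} \in \mathbb{R}^N \right\}, \tag{4.9}
\]
where $c_p(\xi, \tilde{\xi}) := \max\{1, |\xi|, |\tilde{\xi}|\}^{p-1}$ for all $\xi, \tilde{\xi} \in \mathbb{R}$ and $p \ge 1$. Then
\[
\mathsf{dl}_\Psi(P^{\otimes N}, Q^{\otimes N}) \le \mathsf{dl}_{FM,p}(P, Q) < +\infty, \quad \forall P, Q \in \mathscr{P}_p(\mathbb{R}), \tag{4.10}
\]
where $\mathsf{dl}_\Psi$ is defined by (3.3).

Before presenting a proof, it may be helpful to explain why we consider this specific set of functions $\Psi$. For fixed $N \in \mathbb{N}$, let $\mathcal{M}_{N,\mathrm{emp}}$ denote the set of all empirical laws $P_N$ over $\mathbb{R}$, so that $\mathcal{M}_{1,\mathrm{emp}} = \bigcup_{N \in \mathbb{N}} \mathcal{M}_{N,\mathrm{emp}}$. Then $\Psi$ may be regarded as a set of functions derived from a class of Lipschitz continuous functionals on $\mathcal{M}_{N,\mathrm{emp}}$ with $L = 1$ and $\mathsf{dl} = \mathsf{dl}_{FM,p}$ (by writing $T(P_N)$ as a function of the samples). Lemma 4.1 says that for any $N \in \mathbb{N}$, the discrepancy between $P^{\otimes N}$ and $Q^{\otimes N}$ under the metric $\mathsf{dl}_\Psi$ can be bounded by $\mathsf{dl}_{FM,p}(P, Q)$.

Proof.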

Let $\xi_{-j} := \{\xi_1, \cdots, \xi_{j-1}, \xi_{j+1}, \cdots, \xi_N\}$, $\vec{\xi}_j := \{\xi_1, \cdots, \xi_j\}$ and $\vec{\xi}_{-j} := \{\xi_{j+1}, \cdots, \xi_N\}$. For any $P_1, \cdots, P_N \in \mathscr{P}(\mathbb{R})$ and any $j \in \{1, \cdots, N\}$, denote
\[
P_{-j}(d\xi_{-j}) := P_1(d\xi_1) \cdots P_{j-1}(d\xi_{j-1})\,P_{j+1}(d\xi_{j+1}) \cdots P_N(d\xi_N)
\]
and
\[
h_{\xi_{-j}}(\xi_j) := \int_{\mathbb{R}^{N-1}} \psi(\xi_{-j}, \xi_j)\,P_{-j}(d\xi_{-j}).
\]
Then
\[
|h_{\xi_{-j}}(\tilde{\xi}_j) - h_{\xi_{-j}}(\hat{\xi}_j)|
\le \int_{\mathbb{R}^{N-1}} \big| \psi(\xi_{-j}, \tilde{\xi}_j) - \psi(\xi_{-j}, \hat{\xi}_j) \big|\,P_{-j}(d\xi_{-j})
\le \int_{\mathbb{R}^{N-1}} \frac{1}{N}\,c_p(\tilde{\xi}_j, \hat{\xi}_j)\,|\tilde{\xi}_j - \hat{\xi}_j|\,P_{-j}(d\xi_{-j})
= \frac{1}{N}\,c_p(\tilde{\xi}_j, \hat{\xi}_j)\,|\tilde{\xi}_j - \hat{\xi}_j|.
\]
Let $\mathcal{H}$ denote the set of functions $h_{\xi_{-j}}(\xi_j)$ generated by $\psi \in \Psi$. By the definition of $\mathsf{dl}_\Psi$ and the $p$-th order Fortet-Mourier metric,
\[
\begin{aligned}
\mathsf{dl}_\Psi(P_{-j} \times \tilde{P}_j,\, P_{-j} \times \hat{P}_j)
&= \sup_{\psi \in \Psi} \left| \int_{\mathbb{R}} \int_{\mathbb{R}^{N-1}} \psi(\xi_{-j}, \xi_j)\,P_{-j}(d\xi_{-j})\,\tilde{P}_j(d\xi_j) - \int_{\mathbb{R}} \int_{\mathbb{R}^{N-1}} \psi(\xi_{-j}, \xi_j)\,P_{-j}(d\xi_{-j})\,\hat{P}_j(d\xi_j) \right| \\
&= \sup_{h_{\xi_{-j}} \in \mathcal{H}} \left| \int_{\mathbb{R}} h_{\xi_{-j}}(\xi_j)\,\tilde{P}_j(d\xi_j) - \int_{\mathbb{R}} h_{\xi_{-j}}(\xi_j)\,\hat{P}_j(d\xi_j) \right| \\
&\le \frac{1}{N}\,\mathsf{dl}_{FM,p}(\tilde{P}_j, \hat{P}_j),
\end{aligned} \tag{4.11}
\]
where the inequality is due to $N h_{\xi_{-j}}(\xi_j) \in \mathcal{F}_p(\mathbb{R})$ and the definition of $\mathsf{dl}_{FM,p}(P, Q)$. Finally, by the triangle inequality of the pseudo-metric, we have
\[
\begin{aligned}
\mathsf{dl}_\Psi\big(P^{\otimes N}, Q^{\otimes N}\big)
&\le \mathsf{dl}_\Psi\big(P^{\otimes N},\, P^{\otimes (N-1)} \times Q\big) + \mathsf{dl}_\Psi\big(P^{\otimes (N-1)} \times Q,\, P^{\otimes (N-2)} \times Q^{\otimes 2}\big) + \cdots + \mathsf{dl}_\Psi\big(P \times Q^{\otimes (N-1)},\, Q^{\otimes N}\big) \\
&\le \frac{1}{N}\,\mathsf{dl}_{FM,p}(P, Q) \times N = \mathsf{dl}_{FM,p}(P, Q).
\end{aligned}
\]
The proof is complete.

With this intermediate technical result, we are now ready to present our main result on quantitative statistical robustness for the plug-in estimator of a general risk functional.
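To make the class $\Psi$ in (4.9) more concrete, the sketch below spot-checks its defining inequality for one hypothetical member: the scaled empirical $p$-th moment $\psi(\boldsymbol{\xi}) = \frac{1}{pN}\sum_{k} \xi_k^p$ with integer $p \ge 1$. By the mean value theorem, $|a^p - b^p| \le p \max\{|a|, |b|\}^{p-1} |a - b|$, so this $\psi$ should satisfy (4.9). The random test is only a sanity check on sampled pairs, not a proof, and the helper names are illustrative assumptions.

```python
import numpy as np


def c_p_pair(x, y, p):
    """c_p(xi, xi_hat) = max(1, |xi|, |xi_hat|)**(p - 1), applied elementwise."""
    return np.maximum(1.0, np.maximum(np.abs(x), np.abs(y))) ** (p - 1)


def psi_scaled_moment(sample, p):
    """Hypothetical test function psi(xi) = (1 / (p N)) * sum_k xi_k**p."""
    sample = np.asarray(sample, dtype=float)
    return float(np.mean(sample**p) / p)


def rhs_of_4_9(x, y, p):
    """Right-hand side of (4.9): (1/N) * sum_k c_p(x_k, y_k) * |x_k - y_k|."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean(c_p_pair(x, y, p) * np.abs(x - y)))


def spot_check_membership(p, n=50, trials=500, seed=0):
    """Return True if (4.9) held on every randomly drawn pair of samples."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        x = rng.normal(scale=3.0, size=n)
        y = x + rng.normal(scale=1.0, size=n)  # a perturbed copy of x
        lhs = abs(psi_scaled_moment(x, p) - psi_scaled_moment(y, p))
        bound = rhs_of_4_9(x, y, p)
        if lhs > bound * (1.0 + 1e-12) + 1e-9:
            return False
    return True


if __name__ == "__main__":
    for order in (1, 2, 3):
        print(order, spot_check_membership(order))
```

The same pairing-based bound, with a constant $L$ in front, is exactly the kind of sample-wise Lipschitz condition that the main result below takes as its hypothesis.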

Theorem 4.2

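Let $\varrho : \mathscr{P}(\mathbb{R}) \to \mathbb{R}$ be a general statistical functional and $\mathcal{M}$ be a subset of $\mathscr{P}_p(\mathbb{R})$ with $p \ge 1$. Assume that, for fixed $N \in \mathbb{N}$, there exists a positive constant $L$ such that
\[
|\varrho(P_N) - \varrho(Q_N)| \le \frac{L}{N} \sum_{k=1}^{N} c_p(\xi_k, \hat{\xi}_k)\,|\xi_k - \hat{\xi}_k|, \quad \forall \xi_k, \hat{\xi}_k \in \mathbb{R}, \tag{4.12}
\]
where $P_N$ and $Q_N$ are given by (2.3) and (2.4) respectively. Then $\hat{\varrho}_N$ is quantitatively robust on $\mathcal{M}$ w.r.t. $(\mathsf{dl}_K, \mathsf{dl}_{FM,p})$, i.e.,
\[
\mathsf{dl}_K\big(P^{\otimes N} \circ \hat{\varrho}_N^{-1},\, Q^{\otimes N} \circ \hat{\varrho}_N^{-1}\big) \le L\,\mathsf{dl}_{FM,p}(P, Q) < +\infty, \quad \forall P, Q \in \mathcal{M}. \tag{4.13}
\]
If (4.12) holds for all $N \in \mathbb{N}$, then the whole sequence of plug-in estimators $\{\hat{\varrho}_N\}_{N \in \mathbb{N}}$ is quantitatively robust on $\mathcal{M}$, i.e., (4.13) holds for all $N \in \mathbb{N}$. Moreover, in the case when $p = 1$, (4.13) reduces to
\[
\mathsf{dl}_K\big(P^{\otimes N} \circ \hat{\varrho}_N^{-1},\, Q^{\otimes N} \circ \hat{\varrho}_N^{-1}\big) \le L\,\mathsf{dl}_K(P, Q) < +\infty, \quad \forall P, Q \in \mathcal{M}. \tag{4.14}
\]

Proof.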

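Since the underlying probability space is atomless, for any $N \in \mathbb{N}$ we have, by definition,
\[
\begin{aligned}
\mathsf{dl}_K\big(P^{\otimes N} \circ \hat{\varrho}_N^{-1},\, Q^{\otimes N} \circ \hat{\varrho}_N^{-1}\big)
&= \sup_{\psi \in \mathcal{F}_1(\mathbb{R})} \left| \int_{\mathbb{R}} \psi(t)\,P^{\otimes N} \circ \hat{\varrho}_N^{-1}(dt) - \int_{\mathbb{R}} \psi(t)\,Q^{\otimes N} \circ \hat{\varrho}_N^{-1}(dt) \right| \\
&= \sup_{\psi \in \mathcal{F}_1(\mathbb{R})} \left| \int_{\mathbb{R}^N} \psi(\varrho(\vec{\xi}_N))\,P^{\otimes N}(d\vec{\xi}_N) - \int_{\mathbb{R}^N} \psi(\varrho(\vec{\xi}_N))\,Q^{\otimes N}(d\vec{\xi}_N) \right|,
\end{aligned} \tag{4.15}
\]
where we write $\vec{\xi}_N$ for $(\xi_1, \cdots, \xi_N)$ and $\varrho(\vec{\xi}_N)$ for $\hat{\varrho}_N$ to indicate its dependence on $\xi_1, \cdots, \xi_N$. For any $\psi \in \mathcal{F}_1(\mathbb{R})$, (4.12) ensures that
\[
|\psi(\varrho(\tilde{\vec{\xi}})) - \psi(\varrho(\hat{\vec{\xi}}))| \le |\varrho(\tilde{\vec{\xi}}) - \varrho(\hat{\vec{\xi}})| \le \frac{L}{N} \sum_{k=1}^{N} c_p(\tilde{\xi}_k, \hat{\xi}_k)\,|\tilde{\xi}_k - \hat{\xi}_k|, \quad \forall \tilde{\vec{\xi}}, \hat{\vec{\xi}} \in \mathbb{R}^N,
\]
which means that $\psi(\varrho(\cdot))$ is locally Lipschitz continuous in $\vec{\xi}_N$, i.e., $\psi(\varrho(\cdot)) \in \mathcal{F}_p(\mathbb{R}^N)$ from (3.4). Since $P, Q \in \mathcal{M} \subset \mathscr{P}_p(\mathbb{R}) \subset \mathscr{P}_K(\mathbb{R})$ (see Example 3.2(i) and Proposition 3.1(i)), (4.15) is finite. The rest follows from Lemma 4.1 applied to the function $(\xi_1, \cdots, \xi_N) \mapsto \psi(\varrho(\xi_1, \cdots, \xi_N))$, which belongs to $\Psi$ after scaling by $1/L$.

From Example 3.1, we have $\mathsf{dl}_{\mathrm{Prok}}(P, Q) \le \sqrt{\mathsf{dl}_K(P, Q)}$ for all $P, Q \in \mathscr{P}(\mathbb{R})$, which yields the following corollary.

Corollary 4.1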

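Let $\varrho : \mathscr{P}(\mathbb{R}) \to \mathbb{R}$ be a general statistical functional. Assume that $\varrho$ is Lipschitz continuous w.r.t. $\mathsf{dl}_{FM,p}$ ($p \ge 1$) on $\mathcal{M} \subset \mathscr{P}_p(\mathbb{R})$ with constant $L$. Then the plug-in estimator sequence $\{\hat{\varrho}_N\}_{N \in \mathbb{N}}$ is quantitatively robust on $\mathcal{M}$ w.r.t. $(\mathsf{dl}_{\mathrm{Prok}}, \mathsf{dl}_{FM,p})$, i.e.,
\[
\mathsf{dl}_{\mathrm{Prok}}\big(P^{\otimes N} \circ \hat{\varrho}_N^{-1},\, Q^{\otimes N} \circ \hat{\varrho}_N^{-1}\big) \le \sqrt{L\,\mathsf{dl}_{FM,p}(P, Q)} < +\infty, \quad \forall P, Q \in \mathcal{M},
\]
for all $N \in \mathbb{N}$.

Next, we take a step further and consider the index of quantitative robustness for a general statistical functional.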
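Inequality (4.14) can be verified exactly on a toy case. The sketch below is an illustration under stated assumptions, not part of the paper: it takes the sample mean as the plug-in estimator (for which (4.12) holds with $p = 1$ and $L = 1$), two hypothetical finitely supported laws $P$ and $Q$, and a small $N$, enumerates all $N$-tuples to obtain the exact law of the estimator under $P^{\otimes N}$ and $Q^{\otimes N}$, and compares Kantorovich distances. It assumes the standard one-dimensional identity $\mathsf{dl}_K(P, Q) = \int |P(x) - Q(x)|\,dx$.

```python
import itertools

import numpy as np


def kantorovich_discrete(vals_a, probs_a, vals_b, probs_b):
    """Kantorovich distance of two finitely supported laws via int |F_a - F_b| dx."""
    grid = np.unique(np.concatenate([vals_a, vals_b]))
    left = grid[:-1]
    cdf_a = np.array([probs_a[vals_a <= t].sum() for t in left])
    cdf_b = np.array([probs_b[vals_b <= t].sum() for t in left])
    return float(np.sum(np.abs(cdf_a - cdf_b) * np.diff(grid)))


def law_of_sample_mean(vals, probs, n):
    """Exact law of the mean of n i.i.d. draws, by enumerating all n-tuples."""
    law = {}
    for idx in itertools.product(range(len(vals)), repeat=n):
        idx = list(idx)
        mean = round(float(np.mean(vals[idx])), 12)  # merge equal means
        law[mean] = law.get(mean, 0.0) + float(np.prod(probs[idx]))
    support = np.array(sorted(law))
    return support, np.array([law[v] for v in support])


if __name__ == "__main__":
    # Hypothetical laws P and Q chosen only for illustration.
    p_vals, p_probs = np.array([0.0, 1.0, 4.0]), np.array([0.5, 0.3, 0.2])
    q_vals, q_probs = np.array([-1.0, 2.0, 3.0]), np.array([0.3, 0.4, 0.3])
    n = 3
    lhs = kantorovich_discrete(
        *law_of_sample_mean(p_vals, p_probs, n),
        *law_of_sample_mean(q_vals, q_probs, n),
    )
    rhs = kantorovich_discrete(p_vals, p_probs, q_vals, q_probs)
    print("distance between estimator laws:", lhs)
    print("distance between P and Q:       ", rhs)
```

Exhaustive enumeration costs $m^N$ tuples for a support of size $m$, so this is only practical for very small examples; for larger $N$ one would have to fall back on simulation, which introduces sampling error on both sides of the inequality.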

Deﬁnition 4.6 (Index of quantitative robustness)

Let $\varrho : \mathscr{P}(\mathbb{R}) \to \mathbb{R}$ be a general statistical functional. If $\varrho$ is Lipschitz continuous w.r.t. $\mathsf{dl}_{FM,p}$ on $\mathscr{P}_p(\mathbb{R})$ with constant $L$ for some $p \ge 1$, then we can define an index of quantitative robustness of $\varrho$ as
\[
\mathrm{iqr}(\varrho) := \big( \inf\{ p \in [1, +\infty) : \varrho \text{ is Lipschitz continuous w.r.t. } \mathsf{dl}_{FM,p} \text{ on } \mathscr{P}_p(\mathbb{R}) \} \big)^{-1}. \tag{4.16}
\]
This index is a quantitative measurement of the degree of robustness of a statistical functional. A larger index reflects a higher degree of robustness. For a general statistical functional $\varrho$, (4.5) may hold for uncountably many $p$; for example, the second moment functional $T^{(2)}$ satisfies (4.5) for any $p \ge 2$ on $\mathscr{P}_p(\mathbb{R}) = \mathcal{M}^p$. From Definition 4.6, we conclude that the $p$-th moment functional $T^{(p)}$ has the index $\mathrm{iqr}(T^{(p)}) = \frac{1}{p}$. Definition 4.6 coincides with the index of qualitative robustness proposed by Krätschmer et al. [7] when $\varrho$ is Lipschitz continuous w.r.t. $\mathsf{dl}_{FM,p}$ on $\mathscr{P}_p(\mathbb{R})$. The main advantage of Definition 4.6 is that the index is easy to calculate, as we will illustrate in the next section.

As discussed in Proposition 2.1, a law invariant risk measure of a random variable can be represented as the composition of a risk functional and the law of the random variable. In practice, the risk of a random variable is often calculated with empirical data, either because the true probability distribution is unknown or because it might be prohibitively expensive to calculate the risk with the true probability distribution. This raises the question of whether the estimated risk measure based on empirical data is reliable. In this section, we apply the quantitative robustness results established in Theorem 4.2 to some well-known risk measures. The next proposition synthesizes Proposition 2.1 and Theorem 4.2.
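As a concrete picture of plug-in estimation from empirical data, the sketch below computes the empirical CVaR from the minimisation formula $\inf_r \{ r + \frac{1}{1-\tau} \int \max\{0, \xi - r\}\,dG(\xi) \}$ and spot-checks the sample-wise bound $|\mathrm{CVaR}_\tau(P_N) - \mathrm{CVaR}_\tau(Q_N)| \le \frac{1}{1-\tau} \cdot \frac{1}{N} \sum_i |\xi_i - \hat{\xi}_i|$ on randomly perturbed data. It is a minimal illustration under assumptions: the function names are hypothetical, and the random check is a sanity test rather than a proof.

```python
import numpy as np


def empirical_cvar(sample, tau):
    """Plug-in CVaR at level tau in (0, 1) via the minimisation formula.

    For an empirical law the objective r + mean(max(0, xi - r)) / (1 - tau)
    is convex and piecewise linear in r with kinks at the sample points, so
    it suffices to evaluate it at each sample point and take the minimum.
    """
    xi = np.asarray(sample, dtype=float)
    # Row j holds max(0, xi_i - r_j) for the candidate r_j = xi_j.
    excess = np.maximum(0.0, xi[None, :] - xi[:, None])
    objective = xi + excess.mean(axis=1) / (1.0 - tau)
    return float(objective.min())


def lipschitz_bound_holds(tau, n=100, trials=300, seed=0):
    """Spot-check the sample-wise Lipschitz bound on perturbed samples."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        x = rng.standard_t(df=5, size=n)           # heavy-ish tailed data
        y = x + rng.normal(scale=0.5, size=n)      # noisy copy of the data
        gap = abs(empirical_cvar(x, tau) - empirical_cvar(y, tau))
        bound = float(np.mean(np.abs(x - y))) / (1.0 - tau)
        if gap > bound + 1e-9:
            return False
    return True


if __name__ == "__main__":
    print(empirical_cvar([1.0, 2.0, 3.0, 4.0], tau=0.5))
    print(lipschitz_bound_holds(tau=0.9))
```

The constant $\frac{1}{1-\tau}$ grows as $\tau \to 1$, which matches the intuition that estimates of risk deeper in the tail are more sensitive to noise in the data.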

Proposition 5.1

Let $\rho(X)$ be a tail-dependent law invariant convex risk measure with representation (2.1), and let $P_N$ and $Q_N$ be empirical probability measures defined as in (2.4). Assume that there exists a positive number $p \ge 1$ such that
\[
|\varrho(P_N) - \varrho(Q_N)| \le \frac{L}{N} \sum_{k=1}^{N} c_p(\xi_k, \hat{\xi}_k)\,|\xi_k - \hat{\xi}_k|, \quad \forall \vec{\xi}, \hat{\vec{\xi}} \in \mathbb{R}^N. \tag{5.1}
\]
Then for any $N \in \mathbb{N}$ and any $P, Q \in \mathcal{M}^p$
\[
\mathsf{dl}_K\big(P^{\otimes N} \circ \varrho(P_N)^{-1},\, Q^{\otimes N} \circ \varrho(Q_N)^{-1}\big) \le L\,\mathsf{dl}_{FM,p}(P, Q) < +\infty. \tag{5.2}
\]
In what follows, we verify condition (5.1) for some well-known risk measures and hence show that they satisfy the proposed quantitative statistical robustness (5.2). To ease the notation, we define the law invariant risk measures directly on the space of probability distributions.

Example 5.1

The expectation of $G \in \mathscr{P}(\mathbb{R})$, given by $E(G) := \int_{\mathbb{R}} \xi\,dG(\xi)$, satisfies
\[
|E(P_N) - E(Q_N)| = \left| \int_{\mathbb{R}} \xi\,d(P_N - Q_N)(\xi) \right| \le \frac{1}{N} \sum_{i=1}^{N} |\xi_i - \hat{\xi}_i|.
\]
Let $T_N := E(\hat{G}_N)$, where $\hat{G}_N$ is the empirical distribution of $G$. Then for any $N \in \mathbb{N}$ and any $P, Q \in \mathcal{M}$,
\[
\mathsf{dl}_K\big(P^{\otimes N} \circ T_N^{-1},\, Q^{\otimes N} \circ T_N^{-1}\big) \le \mathsf{dl}_K(P, Q) < +\infty, \tag{5.3}
\]
and the index of quantitative robustness is $\mathrm{iqr}(E) = 1$.

Example 5.2 Consider the conditional value-at-risk of a probability distribution $G \in \mathscr{P}(\mathbb{R})$ at level $\tau \in (0, 1)$, which is defined by
\[
\mathrm{CVaR}_\tau(G) := \inf_{r \in \mathbb{R}} \left\{ r + \frac{1}{1 - \tau} \int_{\mathbb{R}} \max\{0, \xi - r\}\,dG(\xi) \right\}.
\]
Then
\[
\begin{aligned}
|\mathrm{CVaR}_\tau(P_N) - \mathrm{CVaR}_\tau(Q_N)|
&\le \frac{1}{1 - \tau} \sup_{r \in \mathbb{R}} \left| \int_{\mathbb{R}} \max\{0, \xi - r\}\,d(P_N - Q_N)(\xi) \right| \\
&= \frac{1}{1 - \tau} \sup_{r \in \mathbb{R}} \frac{1}{N} \left| \sum_{i=1}^{N} \big( \max\{0, \xi_i - r\} - \max\{0, \hat{\xi}_i - r\} \big) \right| \\
&\le \frac{1}{1 - \tau} \times \frac{1}{N} \sum_{i=1}^{N} |\xi_i - \hat{\xi}_i|,
\end{aligned}
\]
where the last inequality is due to the fact that $|\max\{0, x\} - \max\{0, y\}| \le |x - y|$ holds for all $x, y \in \mathbb{R}$. Let $T_N := \mathrm{CVaR}_\tau(\hat{G}_N)$, where $\hat{G}_N$ is the empirical distribution of $G$. Then for any $N \in \mathbb{N}$ and any $P, Q \in \mathcal{M}$,
\[
\mathsf{dl}_K\big(P^{\otimes N} \circ T_N^{-1},\, Q^{\otimes N} \circ T_N^{-1}\big) \le \frac{1}{1 - \tau}\,\mathsf{dl}_K(P, Q) < +\infty, \tag{5.4}
\]
and the index of quantitative robustness is $\mathrm{iqr}(\mathrm{CVaR}_\tau) = 1$ for $\tau \in (0, 1)$.

Example 5.3

The upper semi-deviation $\mathrm{sd}_+(G)$ of a measure $G \in \mathscr{P}(\mathbb{R})$, which is defined by
\[
\mathrm{sd}_+(G) := \int_{\mathbb{R}} \max\left\{ 0,\ \xi - \int_{\mathbb{R}} u\,dG(u) \right\} dG(\xi),
\]
satisfies
\[
\begin{aligned}
|\mathrm{sd}_+(P_N) - \mathrm{sd}_+(Q_N)|
&= \left| \frac{1}{N} \sum_{j=1}^{N} \max\left\{ 0,\ \xi_j - \frac{1}{N} \sum_{i=1}^{N} \xi_i \right\} - \frac{1}{N} \sum_{j=1}^{N} \max\left\{ 0,\ \hat{\xi}_j - \frac{1}{N} \sum_{i=1}^{N} \hat{\xi}_i \right\} \right| \\
&\le \frac{1}{N} \sum_{j=1}^{N} \left| \max\left\{ 0,\ \xi_j - \frac{1}{N} \sum_{i=1}^{N} \xi_i \right\} - \max\left\{ 0,\ \hat{\xi}_j - \frac{1}{N} \sum_{i=1}^{N} \hat{\xi}_i \right\} \right| \\
&\le \frac{1}{N} \sum_{j=1}^{N} \left| \left( \xi_j - \frac{1}{N} \sum_{i=1}^{N} \xi_i \right) - \left( \hat{\xi}_j - \frac{1}{N} \sum_{i=1}^{N} \hat{\xi}_i \right) \right| \\
&\le \frac{1}{N} \sum_{j=1}^{N} \left( |\xi_j - \hat{\xi}_j| + \frac{1}{N} \sum_{i=1}^{N} |\xi_i - \hat{\xi}_i| \right)
= \frac{2}{N} \sum_{i=1}^{N} |\xi_i - \hat{\xi}_i|.
\end{aligned}
\]
Let $T_N := \mathrm{sd}_+(\hat{G}_N)$, where $\hat{G}_N$ is the empirical distribution of $G$. Then for any $N \in \mathbb{N}$ and any $P, Q \in \mathcal{M}$,
\[
\mathsf{dl}_K\big(P^{\otimes N} \circ T_N^{-1},\, Q^{\otimes N} \circ T_N^{-1}\big) \le 2\,\mathsf{dl}_K(P, Q) < +\infty, \tag{5.5}
\]
and the index of quantitative robustness is $\mathrm{iqr}(\mathrm{sd}_+) = 1$.

Example 5.4 The

Optimized Certainty Equivalent (OCE) [31] of $G \in \mathscr{P}(\mathbb{R})$ is given by
\[
S_u(G) := \sup_{\eta \in \mathbb{R}} \left\{ \eta + \int_{\mathbb{R}} u(\xi - \eta)\,dG(\xi) \right\},
\]
where $u : \mathbb{R} \to [-\infty, \infty)$ is a proper concave and non-decreasing utility function satisfying the normalization property $u(0) = 0$ and $1 \in \partial u(0)$, with $\partial u(\cdot)$ denoting the subdifferential map of $u$. In essence, by [31, Proposition 2.1], we have
\[
S_u(P_N) = \sup_{\eta \in \mathbb{R}} \left\{ \eta + \frac{1}{N} \sum_{i=1}^{N} u(\xi_i - \eta) \right\} = \sup_{\eta \in [\xi_{\min}, \xi_{\max}]} \left\{ \eta + \frac{1}{N} \sum_{i=1}^{N} u(\xi_i - \eta) \right\},
\]
where $\xi_{\min} = \min\{\xi_1, \dots, \xi_N; \hat{\xi}_1, \dots, \hat{\xi}_N\}$ and $\xi_{\max} = \max\{\xi_1, \dots, \xi_N; \hat{\xi}_1, \dots, \hat{\xi}_N\}$. Let $\rho(G) := -S_u(G)$. Then $\rho(\cdot)$ is a convex risk measure [31] and
\[
\begin{aligned}
|\rho(P_N) - \rho(Q_N)|
&\le \sup_{\eta \in [\xi_{\min}, \xi_{\max}]} \left| \left( \eta + \int_{\mathbb{R}} u(\xi - \eta)\,dP_N(\xi) \right) - \left( \eta + \int_{\mathbb{R}} u(\xi - \eta)\,dQ_N(\xi) \right) \right| \\
&= \sup_{\eta \in [\xi_{\min}, \xi_{\max}]} \left| \frac{1}{N} \sum_{i=1}^{N} u(\xi_i - \eta) - \frac{1}{N} \sum_{i=1}^{N} u(\hat{\xi}_i - \eta) \right| \\
&\le \sup_{\eta \in [\xi_{\min}, \xi_{\max}]} \frac{1}{N} \sum_{i=1}^{N} \big| u(\xi_i - \eta) - u(\hat{\xi}_i - \eta) \big| \\
&\le \frac{1}{N} \sum_{i=1}^{N} u'_-(\xi_{\min})\,|\xi_i - \hat{\xi}_i|,
\end{aligned}
\]
where $u'_-(t)$ denotes the left derivative of $u$ at $t$, and the last inequality is due to the fact that $u$ is non-decreasing and concave, so that $u'_-(t)$ is non-increasing.

Let $T_N := -S_u(\hat{G}_N)$, where $\hat{G}_N$ is the empirical distribution of $G$. We consider two interesting cases. One is that $\sup_{\eta \in \mathbb{R}} u'_-(\eta) < +\infty$, in which case
\[
\mathsf{dl}_K\big(P^{\otimes N} \circ T_N^{-1},\, Q^{\otimes N} \circ T_N^{-1}\big) \le \sup_{\eta \in \mathbb{R}} u'_-(\eta)\,\mathsf{dl}_K(P, Q) < +\infty, \tag{5.6}
\]
for any $N \in \mathbb{N}$ and any $P, Q \in \mathcal{M}$, and the index of quantitative robustness for this case is 1. The other is that there exist some positive number $p > 1$ and positive constant $L$ such that $u'_-(\xi_{\min}) \le L\,c_p(\xi_i, \hat{\xi}_i)$, where $c_p(\xi_i, \hat{\xi}_i) = \max\{1, |\xi_i|, |\hat{\xi}_i|\}^{p-1}$.
In that case, we have
\[
\mathsf{dl}_K\big(P^{\otimes N} \circ T_N^{-1},\, Q^{\otimes N} \circ T_N^{-1}\big) \le L\,\mathsf{dl}_{FM,p}(P, Q) < +\infty, \tag{5.7}
\]
and the index of quantitative robustness for this case is $\frac{1}{p}$.

To see how (5.6) and (5.7) could possibly be satisfied, we consider two specific utility functions, a piecewise linear utility function and a quadratic utility function, both of which are taken from [31].

(a) Piecewise linear utility function with $u(t) := \gamma_1 [t]_+ - \gamma_2 [-t]_+$, where $0 \le \gamma_1 < 1 < \gamma_2$ and $[z]_+ = \max\{0, z\}$. A simple calculation yields
\[
|\rho(P_N) - \rho(Q_N)| \le \frac{\gamma_2}{N} \sum_{i=1}^{N} |\xi_i - \hat{\xi}_i|.
\]
Thus for any $N \in \mathbb{N}$ and any $P, Q \in \mathcal{M}$,
\[
\mathsf{dl}_K\big(P^{\otimes N} \circ T_N^{-1},\, Q^{\otimes N} \circ T_N^{-1}\big) \le \gamma_2\,\mathsf{dl}_K(P, Q) < +\infty, \tag{5.8}
\]
and the index of quantitative robustness is $\mathrm{iqr}(-S_u) = 1$.

(b) Quadratic utility with $u(t) := \big(t - \tfrac{1}{2} t^2\big) \mathbb{1}_{(-\infty, 1)}(t) + \tfrac{1}{2} \mathbb{1}_{[1, +\infty)}(t)$. It is easy to observe that the function is locally Lipschitz continuous over $[\xi_{\min}, \xi_{\max}]$ with modulus bounded by $|1 - \xi_{\min}|$. Thus
\[
|\rho(P_N) - \rho(Q_N)| \le \sup_{\eta \in [\xi_{\min}, \xi_{\max}]} \frac{1}{N} \sum_{i=1}^{N} \big| u(\xi_i - \eta) - u(\hat{\xi}_i - \eta) \big| \le \frac{1}{N} \sum_{i=1}^{N} |1 - \xi_{\min}|\,|\xi_i - \hat{\xi}_i|.
\]
Moreover, if $\xi_{\min} \le -1$, then $|1 - \xi_{\min}| \le 2|\xi_{\min}|$. Subsequently,
\[
|\rho(P_N) - \rho(Q_N)| \le \frac{2}{N} \sum_{i=1}^{N} c_2(\xi_i, \hat{\xi}_i)\,|\xi_i - \hat{\xi}_i|,
\]
where $c_2(\xi_i, \hat{\xi}_i) = \max\{1, |\xi_i|, |\hat{\xi}_i|\}$. Thus for any $N \in \mathbb{N}$ and any $P, Q \in \mathcal{M}$,
\[
\mathsf{dl}_K\big(P^{\otimes N} \circ T_N^{-1},\, Q^{\otimes N} \circ T_N^{-1}\big) \le 2\,\mathsf{dl}_{FM,2}(P, Q) \tag{5.9}
\]
provided that $\xi_{\min} \le -1$, and the index of quantitative robustness is $\mathrm{iqr}(-S_u) = \frac{1}{2}$.

Example 5.5

Suppose that $l : \mathbb{R} \to \mathbb{R}$ is an increasing convex loss function which is not identically constant. Let $x_0$ be an interior point in the range of $l$. The Shortfall Risk Measure [18] of $G \in \mathscr{P}(\mathbb{R})$ is defined by
\[
\rho_l(G) := \inf\left\{ m \in \mathbb{R} : \int_{\mathbb{R}} l(\xi - m)\,dG(\xi) \le x_0 \right\}. \tag{5.10}
\]
Following an analysis similar to that of Guo and Xu [32], we can recast the formulation above as
\[
\rho_l(G) = \inf_{m \in \mathbb{R}} \sup_{\lambda \ge 0} \left\{ m + \lambda \left( \int_{\mathbb{R}} l(\xi - m)\,dG(\xi) - x_0 \right) \right\}. \tag{5.11}
\]
Swapping the inf and sup operations, we obtain the Lagrange dual of the problem. Moreover, if we assume that the inequality constraint in (5.10) satisfies the well-known Slater condition, i.e., there exists $m_0$ such that $\int_{\mathbb{R}} l(\xi - m_0)\,dG(\xi) - x_0 < 0$, then the set of Lagrange multipliers of (5.10) is bounded and strong duality holds. Consequently, we can rewrite (5.11) as
\[
\rho_l(G) = \inf_{m \in \mathbb{R}} \sup_{\lambda \in [a, b]} \left\{ m + \lambda \left( \int_{\mathbb{R}} l(\xi - m)\,dG(\xi) - x_0 \right) \right\}, \tag{5.12}
\]
where $a, b$ are some positive numbers. In essence, by [31, Proposition 2.1], we have
\[
\rho_l(P_N) = \sup_{\lambda \in [a, b]} \inf_{m \in \mathbb{R}} \left\{ m + \lambda \left( \frac{1}{N} \sum_{i=1}^{N} l(\xi_i - m) - x_0 \right) \right\}
= \sup_{\lambda \in [a, b]} \inf_{m \in [\xi_{\min}, \xi_{\max}]} \left\{ m + \lambda \left( \frac{1}{N} \sum_{i=1}^{N} l(\xi_i - m) - x_0 \right) \right\},
\]
where $\xi_{\min} = \min\{\xi_1, \dots, \xi_N; \hat{\xi}_1, \dots, \hat{\xi}_N\}$ and $\xi_{\max} = \max\{\xi_1, \dots, \xi_N; \hat{\xi}_1, \dots, \hat{\xi}_N\}$.
Subsequently,
\[
\begin{aligned}
|\rho_l(P_N) - \rho_l(Q_N)|
&\le b \sup_{m \in [\xi_{\min}, \xi_{\max}]} \left| \frac{1}{N} \sum_{i=1}^{N} l(\xi_i - m) - \frac{1}{N} \sum_{i=1}^{N} l(\hat{\xi}_i - m) \right| \\
&\le b \sup_{m \in [\xi_{\min}, \xi_{\max}]} \frac{1}{N} \sum_{i=1}^{N} \big| l(\xi_i - m) - l(\hat{\xi}_i - m) \big| \\
&\le b \sup_{m \in [\xi_{\min}, \xi_{\max}]} \frac{1}{N} \sum_{i=1}^{N} \big[ l'_+(\xi_i - m) \vee l'_+(\hat{\xi}_i - m) \big]\,|\xi_i - \hat{\xi}_i| \\
&\le \frac{b}{N} \sum_{i=1}^{N} \big[ l'_+(\xi_i - \xi_{\min}) \vee l'_+(\hat{\xi}_i - \xi_{\min}) \big]\,|\xi_i - \hat{\xi}_i| \\
&\le \frac{b}{N} \sum_{i=1}^{N} \sup_{m \in \mathbb{R}} l'_+(m)\,|\xi_i - \hat{\xi}_i|,
\end{aligned}
\]
where $l'_+(t)$ denotes the right derivative of $l$ at $t$, and the last three inequalities are due to the fact that $l$ is non-decreasing and convex, so that $l'_+(t)$ is non-decreasing.

Let $T_N := \rho_l(\hat{G}_N)$, where $\hat{G}_N$ is the empirical distribution of $G$. If $\sup_{m \in \mathbb{R}} l'_+(m) < +\infty$, then for any $N \in \mathbb{N}$ and any $P, Q \in \mathcal{M}$,
\[
\mathsf{dl}_K\big(P^{\otimes N} \circ T_N^{-1},\, Q^{\otimes N} \circ T_N^{-1}\big) \le \sup_{m \in \mathbb{R}} l'_+(m)\,\mathsf{dl}_K(P, Q) < +\infty. \tag{5.13}
\]
If there exist some positive number $p > 1$ and positive constant $L$ such that $l'_+(\xi_i - \xi_{\min}) \vee l'_+(\hat{\xi}_i - \xi_{\min}) \le L\,c_p(\xi_i, \hat{\xi}_i)$, where $c_p(\xi_i, \hat{\xi}_i) = \max\{1, |\xi_i|, |\hat{\xi}_i|\}^{p-1}$, then
\[
\mathsf{dl}_K\big(P^{\otimes N} \circ T_N^{-1},\, Q^{\otimes N} \circ T_N^{-1}\big) \le L\,\mathsf{dl}_{FM,p}(P, Q) < +\infty. \tag{5.14}
\]
In what follows, we illustrate the above two inequalities with two specific loss functions: the deposit insurance loss function [33] and the $p$-th power loss function [18].

(a) Deposit insurance loss function, $l(x) = [x]_+$, where $[x]_+ = \max\{x, 0\}$. Then $\sup_{m \in \mathbb{R}} l'_+(m) < +\infty$. Thus, for any $N \in \mathbb{N}$ and any $P, Q \in \mathcal{M}^{\phi}$,
\[
\mathsf{dl}_K\big(P^{\otimes N} \circ T_N^{-1},\, Q^{\otimes N} \circ T_N^{-1}\big) \le \sup_{m \in \mathbb{R}} l'_+(m)\,\mathsf{dl}_K(P, Q) < +\infty, \tag{5.15}
\]
and the index of quantitative robustness is 1.

(b) For $x_0 > 0$, we consider the $p$-th power loss function,
\[
l(x) = \begin{cases} \frac{1}{p} x^p, & \text{if } x \ge 0, \\ 0, & \text{otherwise,} \end{cases}
\]
where $p > 1$. We have $l'_+(x) = x^{p-1}$ for $x \ge 0$ and $l'_+(x) = 0$ for $x < 0$.
Then, if $\xi_{\min} \ge 0$, we have $0 \le \xi_i - \xi_{\min} \le |\xi_i|$ and subsequently $l'_+(\xi_i - \xi_{\min}) \vee l'_+(\hat{\xi}_i - \xi_{\min}) \le c_p(\xi_i, \hat{\xi}_i)$. Thus for any $N \in \mathbb{N}$ and any $P, Q \in \mathcal{M}^p$,
\[
\mathsf{dl}_K\big(P^{\otimes N} \circ T_N^{-1},\, Q^{\otimes N} \circ T_N^{-1}\big) \le \mathsf{dl}_{FM,p}(P, Q) < +\infty \tag{5.16}
\]
provided that $\xi_{\min} \ge 0$, and the index of quantitative robustness is $\frac{1}{p}$.

In all of the above examples, the risk measures can either be represented explicitly in the form $\int_{\mathbb{R}} f(x)\,dP(x)$ (such as the expectation) or be obtained by solving an optimization problem where the underlying functions are represented in expected utility form (CVaR, the optimized certainty equivalent and the shortfall risk measure). The approach works because the utility (disutility) functions are assumed to be concave (convex) and hence locally Lipschitz continuous. When the growth of the Lipschitz modulus is controlled by $c_p(\xi, \xi')$, these risk measures satisfy inequality (4.5), as we have shown. This may not work for spectral risk measures [34] with unbounded risk spectrum, because the latter distort the probability distribution $P(x)$. However, when the risk spectrum is bounded (as for CVaR, which is a special case of a spectral risk measure), inequality (4.5) can still be established. This explains why we have not included spectral risk measures in the examples.

References

[1] V. Krätschmer, A. Schied, and H. Zähle, “Comparative and qualitative robustness for law-invariant risk measures,” Finance and Stochastics, vol. 18, no. 2, pp. 271–295, 2014.
[2] R. Cont, R. Deguest, and G. Scandolo, “Robustness and sensitivity analysis of risk measurement procedures,” Quantitative Finance, vol. 10, no. 6, pp. 593–606, 2010.
[3] H. Zähle, “Rates of almost sure convergence of plug-in estimates for distortion risk measures,” Metrika, vol. 74, no. 2, pp. 267–285, 2011.
[4] D. Belomestny and V. Krätschmer, “Central limit theorems for law-invariant coherent risk measures,” Journal of Applied Probability, vol. 49, no. 1, pp. 1–21, 2012.
[5] E. Beutner and H. Zähle, “A modified functional delta method and its application to the estimation of risk functionals,” Journal of Multivariate Analysis, vol. 101, no. 10, pp. 2452–2463, 2010.
[6] F. R. Hampel, “A general qualitative definition of robustness,” The Annals of Mathematical Statistics, pp. 1887–1896, 1971.
[7] V. Krätschmer, A. Schied, and H. Zähle, “Qualitative and infinitesimal robustness of tail-dependent statistical functionals,” Journal of Multivariate Analysis, vol. 103, no. 1, pp. 35–47, 2012.
[8] H. Zähle, “Qualitative robustness of von Mises statistics based on strongly mixing data,” Statistical Papers, vol. 55, no. 1, pp. 157–167, 2014.
[9] H. Zähle et al., “Qualitative robustness of statistical functionals under strong mixing,” Bernoulli, vol. 21, no. 3, pp. 1412–1434, 2015.
[10] G. Boente, R. Fraiman, V. J. Yohai, et al., “Qualitative robustness for stochastic processes,” The Annals of Statistics, vol. 15, no. 3, pp. 1293–1312, 1987.
[11] K. Strohriegl and R. Hable, “Qualitative robustness of estimators on stochastic processes,” Metrika, vol. 79, no. 8, pp. 895–917, 2016.
[12] P. J. Huber and E. M. Ronchetti, Robust statistics. Springer, 2011.
[13] H. Zähle, “A definition of qualitative robustness for general point estimators, and examples,” Journal of Multivariate Analysis, vol. 143, pp. 12–31, 2016.
[14] F. R. Hampel, E. M. Ronchetti, P. J. Rousseeuw, and W. A. Stahel, Robust statistics: the approach based on influence functions, vol. 196. John Wiley & Sons, 2011.
[15] R. A. Maronna, R. D. Martin, V. J. Yohai, and M. Salibián-Barrera, Robust statistics: theory and methods (with R). John Wiley & Sons, 2019.
[16] S. Guo and H. Xu, “Statistical robustness in utility preference robust optimization models,” Submitted to Mathematical Programming, 2020.
[17] D. Filipović and G. Svindland, “The canonical model space for law-invariant convex risk measures is L1,” Mathematical Finance: An International Journal of Mathematics, Statistics and Financial Economics, vol. 22, no. 3, pp. 585–589, 2012.
[18] H. Föllmer and A. Schied, “Convex measures of risk and trading constraints,” Finance and Stochastics, vol. 6, no. 4, pp. 429–447, 2002.
[19] P. Artzner, F. Delbaen, J.-M. Eber, and D. Heath, “Coherent measures of risk,” Mathematical Finance, vol. 9, no. 3, pp. 203–228, 1999.
[20] H. Föllmer and S. Weber, “The axiomatic approach to risk measures for capital determination,” Annual Review of Financial Economics, vol. 7, pp. 301–337, 2015.
[21] E. Delage, D. Kuhn, and W. Wiesemann, “Dice-sion-making under uncertainty: When can a random decision reduce risk?,” Management Science, vol. 65, no. 7, pp. 3282–3301, 2019.
[22] M. Frittelli, M. Maggis, and I. Peri, “Risk measures on and value at risk with probability/loss function,” Mathematical Finance, vol. 24, no. 3, pp. 442–463, 2014.
[23] D. Dentcheva and A. Ruszczyński, “Risk preferences on the space of quantile functions,” Mathematical Programming, vol. 148, no. 1-2, pp. 181–200, 2014.
[24] W. B. Haskell, W. Huang, and H. Xu, “Preference elicitation and robust optimization with multi-attribute quasi-concave choice functions,” arXiv preprint arXiv:1805.06632, 2018.
[25] A. L. Gibbs and F. E. Su, “On choosing and bounding probability metrics,” International Statistical Review, vol. 70, no. 3, pp. 419–435, 2002.
[26] I. Mizera et al., “Qualitative robustness and weak continuity: the extreme unction,” Nonparametrics and robustness in modern statistical inference and time series analysis: a Festschrift in honor of Professor Jana Jurecková, vol. 1, p. 169, 2010.
[27] W. Römisch, “Stability of stochastic programming problems,” Handbooks in Operations Research and Management Science, vol. 10, pp. 483–554, 2003.
[28] G. C. Pflug and A. Pichler, “Approximations for probability distributions and stochastic optimization problems,” in Stochastic optimization methods in finance and energy, pp. 343–387, Springer, 2011.
[29] S. T. Rachev, Probability metrics and the stability of stochastic models, vol. 269. John Wiley & Son Ltd, 1991.
[30] P. Mattila, Geometry of sets and measures in Euclidean spaces: fractals and rectifiability. No. 44, Cambridge University Press, 1999.
[31] A. Ben-Tal and M. Teboulle, “An old-new concept of convex risk measures: The optimized certainty equivalent,” Mathematical Finance, vol. 17, no. 3, pp. 449–476, 2007.
[32] S. Guo, H. Xu, and L. Zhang, “Convergence analysis for mathematical programs with distributionally robust chance constraint,” SIAM Journal on Optimization, vol. 27, no. 2, pp. 784–816, 2017.
[33] C. Chen, G. Iyengar, and C. C. Moallemi, “An axiomatic approach to systemic risk,” Management Science, vol. 59, no. 6, pp. 1373–1388, 2013.
[34] C. Acerbi, “Spectral measures of risk: A coherent representation of subjective risk aversion,” Journal of Banking & Finance, vol. 26, no. 7, pp. 1505–1518, 2002.
[35] S. Kusuoka, “On law invariant coherent risk measures,” in Advances in mathematical economics, pp. 83–95, Springer, 2001.

Appendix A

Lemma A.1

Let $\{a_i\}_{i=1}^{N}$ and $\{b_i\}_{i=1}^{N}$ be two non-decreasing sequences. Then for any permutation $\{k_1, k_2, \dots, k_N\}$ of $\{1, 2, \dots, N\}$, we have

$$\sum_{i=1}^{N} |a_i - b_i| \le \sum_{i=1}^{N} |a_i - b_{k_i}|.$$

Proof. The result is perhaps well known. We include a proof as we cannot find a reference. We proceed by induction. For $N = 1$, the statement is trivial, and for $N = 2$,

$$|a_1 - b_1| + |a_2 - b_2| \le |a_1 - b_2| + |a_2 - b_1|$$

for any $a_1 \le a_2$ and $b_1 \le b_2$. Assume that the conclusion holds for $N \le n$. For $N = n+1$, consider any non-decreasing sequences $\{a_i\}_{i=1}^{n+1}$ and $\{b_i\}_{i=1}^{n+1}$ and any permutation $\{k_1, \dots, k_{n+1}\}$ of $\{1, \dots, n+1\}$. There exists a $j \in \{1, \dots, n+1\}$ such that $b_{k_j} = b_{n+1}$. If $j = n+1$, then from the induction hypothesis for $N = n$, we have

$$\sum_{i=1}^{n+1} |a_i - b_i| = \sum_{i=1}^{n} |a_i - b_i| + |a_{n+1} - b_{k_{n+1}}| \le \sum_{i=1}^{n+1} |a_i - b_{k_i}|.$$

If $j < n+1$, then we have

$$
\begin{aligned}
\sum_{i=1}^{n+1} |a_i - b_{k_i}|
&= \sum_{i=1}^{j-1} |a_i - b_{k_i}| + \sum_{i=j+1}^{n} |a_i - b_{k_i}| + |a_j - b_{k_j}| + |a_{n+1} - b_{k_{n+1}}| \\
&\ge \sum_{i=1}^{j-1} |a_i - b_{k_i}| + \sum_{i=j+1}^{n} |a_i - b_{k_i}| + |a_j - b_{k_{n+1}}| + |a_{n+1} - b_{k_j}| \\
&= \sum_{i=1}^{j-1} |a_i - b_{k_i}| + \sum_{i=j+1}^{n} |a_i - b_{k_i}| + |a_j - b_{k_{n+1}}| + |a_{n+1} - b_{n+1}| \\
&\ge \sum_{i=1}^{n} |a_i - b_i| + |a_{n+1} - b_{n+1}| = \sum_{i=1}^{n+1} |a_i - b_i|,
\end{aligned}
$$

where the first inequality follows from the induction hypothesis for $N = 2$ applied to the non-decreasing sequences $\{a_j, a_{n+1}\}$ and $\{b_{k_{n+1}}, b_{k_j}\}$, and the second inequality follows from the induction hypothesis for $N = n$ applied to the non-decreasing sequences $\{a_1, \dots, a_n\}$ and $\{b_1, \dots, b_n\}$.

Proposition A.1

Let $\{a_i\}_{i=1}^{N}$ be a sequence of numbers and $\{b_i\}_{i=1}^{N}$ be a sequence of non-negative numbers. If $a_{i_1} \le a_{i_2} \le \cdots \le a_{i_N}$, $b_{i_1} \le b_{i_2} \le \cdots \le b_{i_N}$, then

$$\sum_{k=1}^{N} a_k b_k \le \sum_{k=1}^{N} a_{i_k} b_{i_k}.$$

See e.g. [35, Proposition 12].
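The inequality in Lemma A.1 can be checked numerically on small instances. The following is a minimal sketch, not part of the original argument: the function names, the random test data and the tolerance are assumptions made for illustration. It compares the cost of pairing two sorted sequences in order against a brute-force minimum over all permutations.

```python
import itertools
import random


def sorted_pairing_cost(a, b):
    """Sum of |a_i - b_i| after sorting both sequences in non-decreasing order."""
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b)))


def min_cost_over_permutations(a, b):
    """Brute-force minimum of sum |a_i - b_{k_i}| over all permutations of b."""
    a_sorted = sorted(a)
    return min(
        sum(abs(x - y) for x, y in zip(a_sorted, perm))
        for perm in itertools.permutations(b)
    )


def check_lemma(trials=200, max_n=6, seed=0, tol=1e-9):
    """Return True if the sorted pairing is never beaten on random instances."""
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(1, max_n)
        a = [rng.uniform(-10.0, 10.0) for _ in range(n)]
        b = [rng.uniform(-10.0, 10.0) for _ in range(n)]
        # tol only absorbs floating-point rounding in the sums.
        if sorted_pairing_cost(a, b) > min_cost_over_permutations(a, b) + tol:
            return False
    return True


if __name__ == "__main__":
    print(check_lemma())
```

The brute-force search is factorial in the sequence length, so the sketch is only meant for short sequences. It is a sanity check, not a substitute for the induction proof.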

Appendix B

Example B.1

In this example, we show that both inclusions in (3.12) are strict. We first show that $\mathcal{M}^{\phi} \neq \mathscr{P}_{(\phi)}(\mathbb{R})$, i.e., there exists a $P \in \mathscr{P}_{(\phi)}(\mathbb{R})$ such that $P \notin \mathcal{M}^{\phi}$. Let $\phi$ be an unbounded u-shaped function. Then by the continuity of $\phi$, there exist $a < 0$ and $b > 0$ with $\phi(a) = 2 = \phi(b)$. Let

$$
P(x) =
\begin{cases}
\dfrac{1}{\phi(x)}, & \text{for } x \le a, \\[1ex]
\dfrac{1}{2}, & \text{for } a \le x \le b, \\[1ex]
1 - \dfrac{1}{\phi(x)}, & \text{for } x \ge b.
\end{cases}
$$

Since $\phi(x) \ge 2$ for all $x$ outside $[a, b]$, $P(x)$ is well-defined on $\mathbb{R}$. By the monotonicity and unboundedness of $\phi$, we have $P \in \mathscr{P}(\mathbb{R})$. Moreover, since

$$\sup_{x \le 0} |P(x)\phi(x)| + \sup_{x > 0} |(1 - P(x))\phi(x)| = 2,$$

we have $P \in \mathscr{P}_{(\phi)}(\mathbb{R})$. However, by a change of variables in the integration, we have

$$
\int_{\mathbb{R}} \phi(x)\, dP(x)
= \int_{-\infty}^{a} \phi(x)\, d\!\left(\frac{1}{\phi(x)}\right) + \int_{b}^{+\infty} \phi(x)\, d\!\left(1 - \frac{1}{\phi(x)}\right)
= \int_{0}^{1/2} \frac{1}{t}\, dt + \int_{1/2}^{1} \frac{1}{1 - t}\, dt
= 2 \int_{0}^{1/2} \frac{1}{t}\, dt = +\infty,
$$

which means $P \notin \mathcal{M}^{\phi}$.

Now we show that $\mathscr{P}_{(\phi)}(\mathbb{R}) \neq \bigcap_{\epsilon > 0} \mathcal{M}^{\phi^{1-\epsilon}}$, i.e., there exists a $P \in \bigcap_{\epsilon > 0} \mathcal{M}^{\phi^{1-\epsilon}}$ such that $P \notin \mathscr{P}_{(\phi)}(\mathbb{R})$. Let $\phi$ be an unbounded u-shaped function. Then there exists an unbounded u-shaped function $\psi$ such that $\lim_{|x| \to +\infty} \psi(x)/\phi(x) = 0$. More precisely, for any $\epsilon \in (0, 1)$, there exists an unbounded u-shaped function $\psi$ such that

$$\lim_{|x| \to +\infty} \frac{\psi(x)}{\phi(x)^{1-\epsilon}} = 0. \tag{B.1}$$

We construct such a $\psi$ as follows: since $\phi$ is an unbounded u-shaped function, there exist $a < 0$ and $b > 0$ with $\phi(a) = e^{2} = \phi(b)$. Let

$$
\psi(x) =
\begin{cases}
\ln(\phi(x)), & \text{for } x \le a, \\
2, & \text{for } a \le x \le b, \\
\ln(\phi(x)), & \text{for } x \ge b.
\end{cases}
$$

Then $\psi$ is an unbounded u-shaped function and satisfies (B.1). Let

$$
P(x) =
\begin{cases}
\dfrac{1}{\psi(x)}, & \text{for } x \le a, \\[1ex]
\dfrac{1}{2}, & \text{for } a \le x \le b, \\[1ex]
1 - \dfrac{1}{\psi(x)}, & \text{for } x \ge b.
\end{cases}
$$

Since $\psi(x) \ge 2$ for all $x$, $P(x)$ is well-defined on $\mathbb{R}$. By the monotonicity and unboundedness of $\psi$, we have $P \in \mathscr{P}(\mathbb{R})$. For fixed $\epsilon \in (0, 1)$, by a change of variables in the integration, we have

$$
\begin{aligned}
\int_{\mathbb{R}} \phi(x)^{1-\epsilon}\, dP(x)
&= \int_{-\infty}^{a} \phi(x)^{1-\epsilon}\, d\!\left(\frac{1}{\psi(x)}\right) + \int_{b}^{+\infty} \phi(x)^{1-\epsilon}\, d\!\left(1 - \frac{1}{\psi(x)}\right) \\
&= \int_{-\infty}^{a} \phi(x)^{1-\epsilon}\, d\!\left(\frac{1}{\ln(\phi(x))}\right) + \int_{b}^{+\infty} \phi(x)^{1-\epsilon}\, d\!\left(1 - \frac{1}{\ln(\phi(x))}\right) \\
&= \int_{0}^{1/2} e^{\frac{1-\epsilon}{t}}\, dt + \int_{1/2}^{1} e^{\frac{1-\epsilon}{1-t}}\, dt < +\infty.
\end{aligned}
$$

Since for $\epsilon \ge 1$, $\phi^{1-\epsilon}$ is bounded on $\mathbb{R}$, we have $\mathcal{M}^{\phi^{1-\epsilon}} = \mathscr{P}(\mathbb{R})$. Thus, $P \in \bigcap_{\epsilon > 0} \mathcal{M}^{\phi^{1-\epsilon}}$. However,

$$
\sup_{x \le 0} |P(x)\phi(x)| + \sup_{x > 0} |(1 - P(x))\phi(x)|
\ge \sup_{x \le a} \left| \frac{\phi(x)}{\ln(\phi(x))} \right| + \sup_{x \ge b} \left| \frac{\phi(x)}{\ln(\phi(x))} \right| = +\infty,
$$

which means $P \notin \mathscr{P}_{(\phi)}(\mathbb{R})$.
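The first construction in Example B.1 (bounded weighted tails but a divergent $\phi$-moment) can be illustrated numerically. The sketch below assumes the specific weight $\phi(x) = 1 + |x|$, which is a hypothetical choice made only for illustration and gives $a = -1$, $b = 1$. The function names and the grid are also assumptions. For this choice the truncated moment over $[-R, R]$ has the closed form $2\ln((1+R)/2)$, which grows without bound as $R$ increases.

```python
import math

# Hypothetical u-shaped weight chosen for illustration: phi(-1) = phi(1) = 2.
A, B = -1.0, 1.0


def phi(x):
    return 1.0 + abs(x)


def cdf(x):
    """Distribution function from the first construction, specialised to phi above."""
    if x <= A:
        return 1.0 / phi(x)
    if x >= B:
        return 1.0 - 1.0 / phi(x)
    return 0.5


def weighted_tail(x):
    """P(x) * phi(x) for x <= 0 and (1 - P(x)) * phi(x) for x > 0."""
    return cdf(x) * phi(x) if x <= 0 else (1.0 - cdf(x)) * phi(x)


def truncated_moment(R, n=2000):
    """Stieltjes sum approximating the integral of phi dP over [-R, R].

    P is flat on [A, B] and symmetric here, so the sum over [B, R] is doubled.
    The grid is uniform in log(1 + x), which suits this particular phi.
    """
    u_lo, u_hi = math.log(phi(B)), math.log(phi(R))
    h = (u_hi - u_lo) / n
    total = 0.0
    for i in range(n):
        x_lo = math.exp(u_lo + i * h) - 1.0
        x_hi = math.exp(u_lo + (i + 1) * h) - 1.0
        x_mid = math.exp(u_lo + (i + 0.5) * h) - 1.0
        total += phi(x_mid) * (cdf(x_hi) - cdf(x_lo))
    return 2.0 * total


if __name__ == "__main__":
    for R in (1e1, 1e3, 1e5):
        closed_form = 2.0 * math.log((1.0 + R) / 2.0)
        print(R, truncated_moment(R), closed_form, weighted_tail(-R), weighted_tail(R))
```

The weighted tails stay at 1 on each side for this choice of $\phi$, while the truncated moment keeps increasing with $R$, in line with the first half of the example. The sketch does not cover the second construction.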