Optimizing distortion riskmetrics with distributional uncertainty
arXiv preprint [math.OC]
Silvana M. Pesenti ∗ Qiuqi Wang † Ruodu Wang ‡ November 11, 2020
Abstract
Optimization of distortion riskmetrics with distributional uncertainty has wide applications in finance and operations research. Distortion riskmetrics include many commonly applied risk measures and deviation measures, which are not necessarily monotone or convex. One of our central findings is a unifying result that allows us to convert an optimization of a non-convex distortion riskmetric with distributional uncertainty to a convex one, leading to great tractability. The key to the unifying equivalence result is the novel notion of closedness under concentration of sets of distributions. Our results include many special cases that are well studied in the optimization literature, including but not limited to optimizing probabilities, Value-at-Risk, Expected Shortfall, and Yaari's dual utility under various forms of distributional uncertainty. We illustrate our theoretical results via applications to portfolio optimization, optimization under moment constraints, and preference robust optimization.
Keywords: risk measures; deviation measures; distributionally robust optimization; convexification; conditional expectation
∗ Department of Statistical Sciences, University of Toronto, Canada. [email protected]
† Department of Statistics and Actuarial Science, University of Waterloo, Canada. [email protected]
‡ Department of Statistics and Actuarial Science, University of Waterloo, Canada. [email protected]

1 Introduction

Riskmetrics, such as measures of risk and variability, are common tools to represent preferences, model decisions under risk, and quantify different types of risks. To fix terms, we refer to riskmetrics as any mapping from a set of random variables to the real line, and to risk measures as riskmetrics that are monotone in the sense of Artzner et al. (1999).

In this paper, we focus on distortion riskmetrics, a large class of commonly used measures of risk and variability; see Wang et al. (2020a) for the terminology "distortion riskmetrics". Distortion riskmetrics include L-functionals (Huber and Ronchetti (2009)) in statistics, Yaari's dual utilities (Yaari (1987)) in decision theory, distorted premium principles (Wang et al. (1997)) in insurance, and spectral risk measures (Acerbi (2002)) in finance; see Wang et al. (2020a) for further examples. An important subclass is that of distortion risk measures, which include, in particular, the two most important risk measures used in current banking and insurance regulation, the Value-at-Risk (VaR) and the Expected Shortfall (ES). Moreover, convex distortion riskmetrics are the building blocks (via taking a supremum) of all convex risk functionals (Liu et al. (2020)), including classic risk measures (Artzner et al. (1999) and Föllmer and Schied (2002)) and deviation measures (Rockafellar et al. (2006)).

When riskmetrics are evaluated on distributions that are subject to uncertainty, decisions should be taken with respect to the worst (or best) possible values a riskmetric attains over a set of alternative distributions, giving rise to the active subfield of distributionally robust optimization. The set of alternative distributions, the uncertainty set, may be characterized by moment constraints (e.g., Popescu (2007)), parameter uncertainty (e.g., Delage and Ye (2010)), probability constraints (e.g., Wiesemann et al. (2014)), and distributional distances (e.g., Blanchet and Murthy (2019)), amongst others. Popular distortion risk measures such as VaR and ES are studied extensively in this context; see e.g., Natarajan et al. (2008) and Zhu and Fukushima (2009).

Optimization of convex distortion risk measures, i.e., distortion riskmetrics with an increasing and concave distortion function, is relatively well understood under distributional uncertainty; see Cornilly et al. (2018), Li (2018), and Liu et al. (2020) for some recent work. Nevertheless, many distortion riskmetrics are not convex or monotone. For example, in the Cumulative Prospect Theory of Tversky and Kahneman (1992), the distortion function is typically assumed to be inverse-S-shaped; in financial risk management, the popular risk measure VaR has a non-concave distortion function, and the inter-quantile difference (Wang et al., 2020b) has a distortion function that is neither concave nor monotone. Another example is the difference between two distortion risk measures, which is clearly not increasing or convex in general. Optimizing non-convex distortion riskmetrics under distributional uncertainty is difficult, and results are available only for special cases; see Li et al. (2018), Cai et al. (2018), Zhu and Shao (2018), Wang et al. (2019), and Bernard et al. (2020), all with an increasing distortion function.

There is, however, a notable common feature in the above-mentioned literature when a non-convex distortion riskmetric is involved. For numerous special cases, one often obtains an equivalence between the optimization problem with a non-convex distortion riskmetric and that with a convex one. Inspired by this observation, the aim of this paper is to address:

What conditions provide equivalence between a non-convex riskmetric and a convex one, in the setting of distributional uncertainty?

An answer to this question is still missing in the literature. In this sense, we offer a novel perspective on distributionally robust optimization problems by converting non-convex optimization problems to their convex counterparts. Transforming a non-convex to a convex optimization problem through approximation and via a direct equivalence has been studied by Zymler et al. (2013) and Cai et al. (2020). Both contributions, however, consider uncertainty sets described by moment constraints. A unifying framework applicable to numerous uncertainty sets and to the entire class of distortion riskmetrics is, however, missing and at the core of this paper.

The main novelty of our results is three-fold: first, we obtain a unifying result (Theorem 1) that allows, under distributional uncertainty, to convert an optimization problem with a non-convex distortion riskmetric into an optimization problem with a convex one. The result covers, to the authors' best knowledge, all known equivalences between optimization problems of non-convex and convex riskmetrics with distributional uncertainty. The proof requires techniques beyond the ones used in the existing literature, as we do not make assumptions such as monotonicity, positivity, and continuity. Second, we introduce the concept of closedness under concentration as a sufficient condition to establish the equivalence.
We show how the property of closedness under concentration can easily be verified and provide numerous examples. Third, the classes of distortion riskmetrics and uncertainty formulations considered in this paper include all special cases studied in the literature:

1. Our class of riskmetrics includes (see Section 2):

(a) all practically used risk measures (some via taking a sup): VaR (quantile), ES, spectral risk measures, Wang's premium principles, law-invariant coherent, as well as convex, risk measures;

(b) all practically used variability measures (some via taking a sup): mean-variance functionals, the mean-median deviation, inter-quantile ranges, inter-ES ranges, the Gini deviation, and deviation measures including the standard deviation and the variance;

(c) distortion riskmetrics with an inverse-S-shaped distortion function of Tversky and Kahneman (1992), widely used in decision theory;

(d) differences between two distortion riskmetrics, which are not convex or monotone in general;

(e) distortion riskmetrics with non-continuous and non-monotone distortion functions, which usually cause substantial difficulties in the analysis of optimization problems.

2. Our uncertainty formulations include (see Section 3):

(a) both supremum and infimum problems, thus providing a universal treatment of worst-case and best-case risk values;

(b) moment constraints such as mean and variance, a popular setting in the literature;

(c) convex order/risk measure constraints, thus relating to stochastic dominance and regulatory constraints;

(d) some marginal constraints, representing dependence uncertainty in risk aggregation (e.g., Embrechts et al. (2015));

(e) preference robust optimization (e.g., Armbruster and Delage (2015) and Guo and Xu (2020)).

This great generality distinguishes our work from the large literature on distributionally robust optimization cited above.
Our work is of an analytical and probabilistic nature, and we focus on a theoretical equivalence result which we then illustrate via numerical implementations. The optimization problems are formally introduced in Section 2. Section 3 is devoted to our main contribution, the equivalence of non-convex and convex optimization problems with distributional uncertainty. In Section 4, we show how our results can be used to solve optimization problems with uncertainty sets defined via moment constraints. In particular, we generalize well-known results in the literature on optimization of risk measures under moment constraints. Sections 5 and 6 contain numerical illustrations of optimizing differences between two distortion riskmetrics, portfolio optimization, and preference robust optimization.
Throughout, we work with an atomless probability space (Ω, F, P). For d, n ∈ N, A ⊂ R^d represents a set of actions, ρ is an objective functional, f : R^{d+n} → R is a loss function, and X is an n-dimensional random vector with distributional uncertainty. Many problems in distributionally robust optimization have the form

    min_{a ∈ A} sup_{F_X ∈ M̃} ρ(f(a, X)),    (1)

where F_X denotes the distribution of X and M̃ is a set of plausible distributions for X. We will first focus on the inner problem

    sup_{F_X ∈ M̃} ρ(f(a, X)),    (2)

which we may rewrite as

    sup_{F_Y ∈ M} ρ(Y),    (3)

where F_Y denotes the distribution of Y and M is a set of distributions on R. We suppress the dependence on a as it remains constant in the inner problem (2). The supremum in (3) is typically referred to as the worst-case risk measure in the literature if ρ is monotone; a risk measure ρ : L^p → R is monotone if ρ(X) ≤ ρ(Y) for all X, Y ∈ L^p with X ≤ Y.

We denote by L^p, p ∈ [1, ∞), the space of random variables with finite p-th moment. Let L^∞ represent the set of bounded random variables and let L represent the space of all random variables. Denote by H the set of functions h : [0, 1] → R of bounded variation satisfying h(0) = 0. For p ∈ [1, ∞] and h ∈ H, a distortion riskmetric ρ_h : L^p → R is defined as

    ρ_h(Y) = ∫_0^∞ h(P(Y > x)) dx + ∫_{−∞}^0 (h(P(Y > x)) − h(1)) dx,    Y ∈ L^p,    (4)

whenever the above integrals are finite; see Proposition 5 below for a sufficient condition. The function h ∈ H is called a distortion function. Note that we allow h to be non-monotone; if h is increasing and h(1) = 1, then ρ_h is a distortion risk measure. The distortion riskmetric ρ_h is convex
if and only if h is concave; see Wang et al. (2020b) for this and other properties of ρ_h.

In this paper, we consider the objective functional ρ in (1) to be a distortion riskmetric ρ_h for some h ∈ H, as the class of distortion riskmetrics includes a large class of objective functionals of interest. Note that a general analysis of (3) also covers the infimum problem inf_{F_Y ∈ M} ρ_h(Y), since −ρ_h = ρ_{−h} is again a distortion riskmetric. This illustrates an advantage of studying distortion riskmetrics over monotone ones, as our analysis unifies best- and worst-case risk evaluations. We note that best-case risk measures are also of practical importance. For instance, there may be some external requirements on what statistical models a risk analyst may use in a risk assessment procedure, and the analyst may choose to report the model representing the best-case values of a risk measure to minimize the risk level, as long as the model satisfies the external specifications.

If ρ_h is not convex, or equivalently, h is not concave, optimization problems of the type (3) are often highly nontrivial. However, the optimization problem

    sup_{F_Y ∈ M} ρ_{h^*}(Y),    (5)

where h^* is the smallest concave distortion function dominating h, is convex and can often be solved relatively easily, either analytically or through numerical methods. Clearly, since ρ_{h^*} ≥ ρ_h, we have

    sup_{F_Y ∈ M} ρ_h(Y) ≤ sup_{F_Y ∈ M} ρ_{h^*}(Y),

and one naturally wonders when the above inequality holds as an equality, that is, under what conditions

    sup_{F_Y ∈ M} ρ_h(Y) = sup_{F_Y ∈ M} ρ_{h^*}(Y).    (6)

The main contribution of this paper is a condition on the uncertainty set M that guarantees the equivalence of these optimization problems, that is, (6). If (6) holds, then the non-convex problem (the left-hand side of (6)) is converted into the convex problem (the right-hand side of (6)), providing great tractability, which in turn helps to solve the minimax problem (1). The equivalence (6) has
The equivalence (6) hasbeen established for some special cases of h ∗ and M and we refer to the literature review in theintroduction. For p > n ∈ N , we denote by M np the set of all distributions on R n with finite p -th moment. Let M n ∞ be the set of n -dimensional distributions of bounded random variables. For p ∈ [1 , ∞ ], write M p = M p for simplicity. For a distribution F ∈ M , let its left- and right-quantilefunctions be given respectively by F − ( α ) = inf { x ∈ R : F ( x ) > α } and F − ( α ) = inf { x ∈ R : F ( x ) > α } , α ∈ [0 , , ∅ ) = ∞ . For x, y ∈ R , we write x ∨ y = max { x, y } and x ∧ y = min { x, y } .Since h ∈ H is of bounded variation, its discontinuity points are at most countable and the left-and right-limits exist at each of these points. We write h ( t + ) = ( lim x ↓ t h ( x ) , t ∈ [0 , ,h (1) , t = 1 , and h ( t − ) = ( lim x ↑ t h ( x ) , t ∈ (0 , ,h (0) , t = 0 , and we denote the set of discontinuity points of h (excluding 0 and 1) by J h = { t ∈ (0 ,
1) : h ( t ) = h ( t + ) or h ( t ) = h ( t − ) } . (7)For h ∈ H and t ∈ [0 , h ∗ and h ∗ respectively by h ∗ ( t ) = inf { g ( t ) : g ∈ H , g > h, g is concave on [0 , } ,h ∗ ( t ) = sup { g ( t ) : g ∈ H , g h, g is convex on [0 , } . Both h ∗ and h ∗ are continuous functions on (0 ,
1) for all h ∈ H , and if h is continuous at 0 and 1,then so are h ∗ and h ∗ (see Figure 3 below for an illustration of h and h ∗ ). Denote by H ∗ (resp. H ∗ )the set of concave (resp. convex) functions in H . Note that for all h ∈ H , we have h ∗ ∈ H ∗ and h ∗ ∈ H ∗ . As a well-known property of the convex and concave envelopes of a continuous h (e.g.,Brighi and Chipot (1994)), h ∗ (resp. h ∗ ) differs from h on a union of disjoint open intervals, and h ∗ (resp. h ∗ ) is linear on these intervals. Below, we provide a new result on convex envelopes ofdistortion functions h that are not necessarily monotone or continuous. For this, we define theupper semicontinuous modification of h byˆ h ( t ) = ( h ( t + ) ∨ h ( t − ) ∨ h ( t ) , t ∈ J h ,h ( t ) , otherwise . (8)Note that we do not make any modification at the points 0 and 1 which are excluded from J h , evenif h has a jump at these points. The functions h and ˆ h and the set J h are illustrated in Figure 1.While in general ρ h and ρ ˆ h are different functionals, one has ρ h ( Y ) = ρ ˆ h ( Y ) for any randomvariable Y with continuous quantile function; see Lemma 1 of Wang et al. (2020a). Moreover, h ∗ = (ˆ h ) ∗ > ˆ h > h and the four functions are all equal if h is concave. Proposition 1 may be ofindependent interest, and its proof is in Appendix A.1. Proposition 1.
For any h ∈ H, we have h^* = (ĥ)^*, and the set {t ∈ [0, 1] : ĥ(t) ≠ h^*(t)} is the union of some disjoint open intervals. Moreover, h^* is linear on each of the above intervals.

In the sequel, we mainly focus on h^*, which will be useful when optimizing ρ_h in (3). A similar result to Proposition 1 holds for h_*, useful in the corresponding infimum problem, where the upper semicontinuous modification of h is replaced by the lower semicontinuous one. This follows directly from Proposition 1 by setting g = −h, which gives ρ_g = −ρ_h and h_* = −g^*.

Figure 1: An example of h (left) and ĥ (right) with J_h = {t_1, t_2, t_3, t_4, t_5}; the dashed lines represent h^* and (ĥ)^*, respectively.

For all distortion functions h ∈ H, from Proposition 1, there exist (countably many) disjoint open intervals on which ĥ ≠ h^*. Using a similar notation to Wang et al. (2019), we define the set

    I_h = {(1 − b, 1 − a) : ĥ ≠ h^* on (a, b)}.

Next, we provide a few examples of distortion riskmetrics commonly used in decision theory and finance. The Value-at-Risk (VaR) of Y ∈ L, using the sign convention of McNeil et al. (2015), is defined as the left-quantile of F_Y:

    VaR_α(Y) = F_Y^{-1}(α),    α ∈ (0, 1].

Similarly, we define the upper Value-at-Risk (VaR^+) of Y ∈ L as the right-quantile of F_Y:

    VaR^+_α(Y) = F_Y^{-1+}(α),    α ∈ [0, 1).

We also define the Expected Shortfall (ES) as

    ES_α(Y) = (1/(1 − α)) ∫_α^1 VaR_t(Y) dt,    α ∈ (0, 1), Y ∈ L^1.

Both VaR_α and ES_α belong to the class of distortion riskmetrics. The examples below give the distortion functions of some distortion riskmetrics that we are interested in and the corresponding sets I_h.

Example 1 (VaR and ES). Take α ∈ (0, 1) and h(t) = 1_{(1−α, 1]}(t), t ∈ [0, 1]. Then h ∈ H, ĥ(t) = 1_{[1−α, 1]}(t), t ∈ [0, 1], and ρ_h = VaR_α. Moreover, h^*(t) = (t/(1 − α)) ∧ 1, t ∈ [0, 1], and ρ_{h^*} = ES_α. Since h^* and ĥ differ on (0, 1 − α), we have I_h = {(α, 1)}.

Example 2 (TK distortion riskmetrics). We consider the following inverse-S-shaped distortion function (see also Figure 3):

    h(t) = t^γ / (t^γ + (1 − t)^γ)^{1/γ},    t ∈ [0, 1], γ ∈ (0, 1].    (9)

Distortion riskmetrics with distortion function (9) are commonly used in behavioural economics and finance; see e.g., Tversky and Kahneman (1992), who estimate γ ≈ 0.61 for gains and γ ≈ 0.69 for losses. For simplicity, we call such distortion riskmetrics TK distortion riskmetrics. For h in (9), it is clear that h = ĥ on [0, 1] by continuity of h. We have h^* = h on [0, t_0] for some t_0 ∈ (0, 1), and h^* is linear on [t_0, 1], so that I_h = {(0, 1 − t_0)}. An example of h in (9) and its concave envelope h^* are plotted in Figure 2 (left).

For h_1, h_2 ∈ H, we write h = h_1 − h_2 ∈ H and consider the difference between two distortion riskmetrics, that is,

    ρ_h = ρ_{h_1} − ρ_{h_2}.    (10)

Distortion riskmetrics of this type measure the difference or disagreement between two utilities, risk attitudes, or capital requirements. Determining the upper and lower bounds, or the largest absolute values, of such measures of disagreement is of interest in practice but rarely studied in the literature. Note that h_1 − h_2 is in general not monotone or concave even when h_1 and h_2 themselves have these properties. Below we show some examples of distortion riskmetrics taking the form of (10).

Example 3 (Inter-quantile range and inter-ES range). For α ∈ [1/2, 1), take h_1(t) = 1_{[1−α, 1]}(t) and h_2(t) = 1_{(α, 1]}(t), t ∈ [0, 1]. Then h(t) = h_1(t) − h_2(t) = 1_{[1−α, α]}(t), t ∈ [0, 1], h = ĥ, and

    ρ_h(X) = F_X^{-1+}(α) − F_X^{-1}(1 − α),    X ∈ L.

Correspondingly, we have

    h^*(t) = (t/(1 − α)) ∧ 1 ∧ ((1 − t)/(1 − α)),    t ∈ [0, 1], and ρ_{h^*}(X) = ES_α(X) + ES_α(−X),    X ∈ L^1.

This distortion riskmetric ρ_h is called an inter-quantile range and ρ_{h^*} is called an inter-ES range. As the distortion functions h^* and ĥ differ on the open intervals (0, 1 − α) and (α, 1), we have I_h = {(α, 1), (0, 1 − α)}. The distortion functions h and h^* are displayed in Figure 2 (right).

Example 4 (Difference of two inverse-S-shaped distortion functions). We take h_1 and h_2 to be inverse-S-shaped distortion functions in (9) with two different parameters, γ_1 and γ_2 = 0.7, respectively. By calculation, the function h = h_1 − h_2 is convex on an initial interval; h^* is linear on [0, b] and h^* = h on [b, 1] for some b ∈ (0, 1), so that I_h = {(1 − b, 1)}. The graphs of the distortion functions h_1, h_2, h, and h^* are displayed in Figure 3.

Figure 2: Left panel: h and h^* for the TK distortion riskmetric in Example 2; right panel: h and h^* for the inter-quantile range in Example 3.

Figure 3: Left panel: inverse-S-shaped distortion functions h_1 and h_2 in Example 4; right panel: h = h_1 − h_2 and h^* of the same example.

The functions in H are a.e. differentiable, and for an absolutely continuous function h ∈ H, let h′ be a (representative) function on [0, 1] that is a.e. equal to the derivative of h. If h ∈ H is left-continuous or if VaR_t(Y) is continuous with respect to t ∈ (0, 1), then ρ_h in (4) has the representation

    ρ_h(Y) = ∫_0^1 VaR_{1−t}(Y) dh(t),    Y ∈ L^p;    (11)

see Lemma 1 of Wang et al. (2020a). If h ∈ H is absolutely continuous, it holds that

    ρ_h(Y) = ∫_0^1 VaR_{1−t}(Y) h′(t) dt,    Y ∈ L^p.    (12)

3 Equivalence between non-convex and convex riskmetrics
In this section, we introduce the concept of concentration and present some examples. This concept will be used to establish our main equivalence result, Theorem 1.

For a distribution F ∈ M_1 and an interval C ⊂ [0, 1], we define the C-concentration of F, denoted by F^C, as the distribution of the random variable

    F^{-1}(U) 1_{U ∉ C} + E[F^{-1}(U) | U ∈ C] 1_{U ∈ C},    (13)

where U ∼ U[0, 1] is a standard uniform random variable. In other words, F^C is obtained by concentrating the probability mass of F^{-1}(U) on {U ∈ C} at its conditional expectation, whereas the rest of the distribution remains unchanged. The following result gives the left-quantile function of (13); the proof follows directly from the definition of C-concentration.

Proposition 2.
For F ∈ M_1 and a < b, the left-quantile function of F^{(a,b)} is given by

    F^{-1}(t) 1_{t ∉ (a, b]} + ((1/(b − a)) ∫_a^b F^{-1}(u) du) 1_{t ∈ (a, b]},    t ∈ [0, 1].

For a set of distributions M ⊂ M_1, we say that M is closed under concentration if, for all F ∈ M, we have F^C ∈ M for all intervals C ⊂ [0, 1]. Similarly, M is closed under conditional expectation if, for all F_X ∈ M, the distribution of any conditional expectation of X is in M. Below is a technical proposition clarifying the relationship between closedness under concentration and closedness under conditional expectation.

Proposition 3.
Closedness under conditional expectation implies closedness under concentration, but the converse is not true.

Proof. We first prove that closedness under conditional expectation implies closedness under concentration. For a random variable Y ∈ L and an interval C ⊂ [0, 1], let

    X = F_Y^{-1}(U) 1_{U ∉ C} + E[F_Y^{-1}(U) | U ∈ C] 1_{U ∈ C},

where U ∼ U[0, 1], so that the distribution of X is the concentration F_Y^C. Every σ(X)-measurable random variable Z is constant on {U ∈ C}. Hence,

    E[XZ] = E[Z F_Y^{-1}(U) 1_{U ∉ C} + Z E[F_Y^{-1}(U) | U ∈ C] 1_{U ∈ C}]
          = E[Z F_Y^{-1}(U) 1_{U ∉ C}] + E[E[Z F_Y^{-1}(U) | U ∈ C] 1_{U ∈ C}]
          = E[Z F_Y^{-1}(U) 1_{U ∉ C}] + E[Z F_Y^{-1}(U) | U ∈ C] P(U ∈ C)
          = E[Z F_Y^{-1}(U) 1_{U ∉ C}] + E[Z F_Y^{-1}(U) 1_{U ∈ C}] = E[Z F_Y^{-1}(U)].

It follows that E[F_Y^{-1}(U) | X] = X, P-almost surely. If a set of distributions M is closed under conditional expectation and F_Y ∈ M, then, since F_Y^{-1}(U) ∼ F_Y, we obtain F_{E[F_Y^{-1}(U)|X]} ∈ M, which implies that F_Y^C = F_X ∈ M. Thus, M is also closed under concentration.

For a counter-example showing that the converse statement does not hold in general, see Remark 2 below.

Remark 1.
Using Strassen's Theorem (e.g., Theorem 3.A.4 of Shaked and Shanthikumar (2007)), closedness under conditional expectation can equivalently be expressed using convex order. A set M ⊂ M_1 is closed under conditional expectation if and only if, for all F ∈ M and G ≼_cx F, we have G ∈ M, where ≼_cx denotes inequality in convex order.

Example 5.
We give some examples of sets M which are closed under concentration and conditional expectation. These examples are extensively studied in the literature of finance, optimization, and risk management.

1. (Moment conditions) For p > 1, m ∈ R, and v > 0, let

    M_p(m, v) = {F_Y ∈ M_p : E[Y] = m, E[|Y − m|^p] ≤ v^p}.

Let random variables Y and X be such that E[Y] = m, E[|Y − m|^p] ≤ v^p, and X = E[Y | G] for some σ-algebra G. We have E[X] = E[Y] = m and E[|X − m|^p] ≤ E[|Y − m|^p] ≤ v^p by Jensen's inequality. Therefore, M_p(m, v) is closed under conditional expectation, and hence also closed under concentration. The set M_p(m, v) corresponds to distributional uncertainty with moment information, and the setting p = 2 (mean and variance constraints) is the most commonly studied; see the references in the introduction.

2. (Mean-covariance conditions) For n ∈ N, a ∈ [0, 1]^n, µ ∈ R^n, and Σ ∈ R^{n×n} positive semidefinite, let

    M_a(µ, Σ) = {F_{a⊤X} ∈ M_2 : F_X ∈ M^n_2, E[X] = µ, var(X) ≼ Σ},

where X = (X_1, ..., X_n), E[X] = (E[X_1], ..., E[X_n]), var(X) is the covariance matrix of X, and B′ ≼ B means that the matrix B − B′ is positive semidefinite for two positive semidefinite symmetric matrices B and B′. A similar form of the set M_a(µ, Σ) is studied in Cai et al. (2020) as a special case of a general ambiguity set proposed in Delage et al. (2014). Cai et al. (2020) focus on the distributional uncertainty of the random vector X with constrained mean and variance, while the set M_a(µ, Σ) only incorporates the distributional robustness of the weighted sum a⊤X. We note that the set M_a(µ, Σ) is equivalent to

    {F_S ∈ M_2 : E[S] = a⊤µ, var(S) ≤ a⊤Σa} = M_2(a⊤µ, (a⊤Σa)^{1/2}).

For a proof of the equivalence between the sets with fixed mean and covariance matrix, see Popescu (2007). Indeed, it is clear that M_a(µ, Σ) ⊂ M_2(a⊤µ, (a⊤Σa)^{1/2}). On the other hand, for F_S ∈ M_2(a⊤µ, (a⊤Σa)^{1/2}), we write a = (a_1, ..., a_n) and µ = (µ_1, ..., µ_n), and take X = (X_1, ..., X_n) such that X_i = (S − a⊤µ)/(n a_i) + µ_i for i = 1, ..., n. It follows that F_S = F_{a⊤X} ∈ M_a(µ, Σ). Therefore, we have M_a(µ, Σ) = M_2(a⊤µ, (a⊤Σa)^{1/2}), which is closed under concentration for all a by a similar argument as in the moment condition 1.

3. (Convex function conditions) For K ⊂ N, a collection f = (f_k)_{k∈K} of convex functions on R, and a vector x = (x_k)_{k∈K} ∈ R^{|K|}, let

    M_f(x) = {F_Y ∈ M_1 : E[f_k(Y)] ≤ x_k for all k ∈ K}.

Again, by Jensen's inequality, M_f(x) is closed under conditional expectation, and hence it is also closed under concentration. Here, K can be a finite or an infinite set. The set M_f corresponds to distributional uncertainty with constraints on expected losses or test functions. Note that M_f includes M_p(m, v) as a special case by choosing x = (x_1, x_2, x_3) and f = (f_1, f_2, f_3), where x_1 = m ∈ R, x_2 = −m, x_3 = v^p > 0, f_1 : y ↦ y, f_2 : y ↦ −y, and f_3 : y ↦ |y − m|^p.

4. (Distortion conditions) For a collection h = (h_k)_{k∈K} ∈ (H^*)^{|K|} and a vector x = (x_k)_{k∈K} ∈ R^{|K|}, let

    M_h(x) = {F_Y ∈ M_1 : ρ_{h_k}(Y) ≤ x_k for all k ∈ K}.

Similarly to the above, M_h(x) is closed under conditional expectation, and hence it is also closed under concentration. The set M_h corresponds to distributional uncertainty with constraints on preferences modeled by convex dual utilities.

5. (Convex order conditions) For a collection of random variables Z = (Z_k)_{k∈K} ∈ (L^1)^{|K|}, let

    M_cx(Z) = {F_Y ∈ M_1 : Y ≼_cx Z_k for all k ∈ K}.

Similarly to the above two examples, M_cx(Z) is closed under conditional expectation (cf. Remark 1), and thus is closed under concentration. (Precisely, we write Y ≼_cx Z if ∫ φ dF_Y ≤ ∫ φ dF_Z for all convex functions φ such that the two integrals are well defined.)

6. (Marginal conditions) For given univariate distributions F_1, ..., F_n ∈ M_1, let

    M_S(F_1, ..., F_n) = {F_{X_1+···+X_n} ∈ M_1 : X_i ∼ F_i, i = 1, ..., n}.

In other words, M_S is the set of all possible aggregate risks X_1 + ··· + X_n with given marginal distributions of X_1, ..., X_n; see Embrechts et al. (2015) for some results on M_S. Generally, M_S is not closed under concentration or conditional expectation, since closedness under concentration is stronger than joint mixability (Wang and Wang, 2016). In the special case where F_1 = ··· = F_n = U[0, 1], M_S is closed under conditional expectation if and only if n ≥ 2.

For F ∈ M_1 and a collection I of disjoint intervals in [0, 1], let F^I be the distribution corresponding to the left-quantile function given by the left-continuous version of

    F^{-1}(t) 1_{t ∉ ∪_{C∈I} C} + Σ_{C∈I} ((1/λ(C)) ∫_C F^{-1}(u) du) 1_{t ∈ C},    (14)

where λ is the Lebesgue measure. We have the following result regarding the distribution F^I. The proof of Proposition 4 is in Appendix A.2.

Proposition 4.
Let I be a collection of disjoint intervals in [0, 1] and let M be a set of distributions. If M is closed under concentration and I is finite, or if M is closed under conditional expectation, then F^I ∈ M for all F ∈ M.

Remark 2.
In Proposition 4, if M is closed under conditional expectation, then I can be taken to be infinite; however, Proposition 4 may fail when M is closed under concentration and I is infinite. Indeed, if we take M as the set of distributions obtained from some F ∈ M_1 by finitely many concentrations, then clearly M is closed under concentration. However, F^I ∉ M when I is an infinite collection of disjoint intervals. This also serves as a counter-example to the converse statement of Proposition 3, since M is closed under concentration but not closed under conditional expectation.

Our main equivalence result is summarized in the following theorem. For a set of distributions M ⊂ M_1 and a collection I of intervals in [0, 1], we say that M is closed under concentration within I if {F^C : C ∈ I, F ∈ M} ⊂ M; closedness under concentration in Section 3.1 corresponds to I being the collection of all intervals in [0, 1].

Theorem 1. For M ⊂ M_1 and h ∈ H, the following hold.

(i) If M is closed under concentration, then

    sup_{F_Y ∈ M} ρ_h(Y) = sup_{F_Y ∈ M} ρ_{h^*}(Y).    (15)

(ii) If h = ĥ and M is closed under concentration within I_h, then (15) holds.

(iii) If h = ĥ, M is closed under conditional expectation, and the second supremum in (15) is attained by some F ∈ M, then F^{I_h} attains both supremums.

The proof of Theorem 1 is much more involved than those of similar results in the literature because of the challenges arising from non-monotonicity, non-positivity, and discontinuity of h; see Figure 1 for a sample of possible complications. The proof is given in Appendix A.1.

For distortion functions h such that I_h = {(p, 1)} (resp. I_h = {(0, p)}) for some p ∈ (0, 1), it suffices for M to be closed under concentration within {(p, 1)} (resp. {(0, p)}). Such distortion functions include the inverse-S-shaped distortion functions in (9), and those of VaR_p and VaR^+_p.

Example 6. Let p ∈ (0, 1) and M = {U[0, 1], pδ_{p/2} + (1 − p)U[p, 1]}, where δ_{p/2} is the point-mass at p/2. We can check that M is closed under concentration within {(0, p)}, but M is not closed under concentration. Indeed, any set that is closed under concentration and contains U[0, 1] has infinitely many elements. Another example that is closed under concentration within {(0, p)} is the set of all possible distributions of the sum of several Pareto risks; see Example 5.1 of Wang et al. (2019). Hence, the condition on M in Theorem 1 (ii) is considerably weaker than that in (i).

Example 7.
As we see from Example 1, if ρ_h = VaR^+_α for some α ∈ (0, 1) (that is, h = ĥ = 1_{[1−α, 1]}), then ρ_{h^*} is ES_α and I_h = {(α, 1)}. Theorem 1 (ii) implies that if M is closed under concentration within {(α, 1)}, then

    sup_{F_Y ∈ M} VaR^+_α(Y) = sup_{F_Y ∈ M} ES_α(Y).

This observation leads to (with some modifications) the main results in Wang et al. (2015) and Li et al. (2018) on the equivalence between VaR and ES.
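This equivalence is easy to see numerically. The following sketch (not from the paper; the empirical sample, the level α, and the helper functions `var_plus` and `es` are illustrative choices) builds a two-element uncertainty set {F, F^{(α,1)}}, which is closed under concentration within {(α, 1)}. Concentrating the quantile function on (α, 1) at its average turns VaR^+_α into ES_α of the original distribution, so the two suprema coincide:

```python
# Sketch: sup of VaR^+_alpha equals sup of ES_alpha over a set closed under
# concentration within {(alpha, 1)}. Sample and alpha are hypothetical.
def var_plus(ys, alpha):
    """Empirical right-quantile at level alpha (ys sorted ascending)."""
    return ys[int(alpha * len(ys))]

def es(ys, alpha):
    """Empirical Expected Shortfall: mean of the worst (1 - alpha) tail."""
    k = int(round((1 - alpha) * len(ys)))
    return sum(ys[-k:]) / k

alpha = 0.9
ys = sorted(range(1, 101))            # losses 1, 2, ..., 100
tail = ys[int(alpha * len(ys)):]      # the (alpha, 1)-part of the quantile function
# (alpha, 1)-concentration: replace the tail by its conditional mean
conc = ys[:int(alpha * len(ys))] + [sum(tail) / len(tail)] * len(tail)

sup_var_plus = max(var_plus(ys, alpha), var_plus(conc, alpha))
sup_es = max(es(ys, alpha), es(conc, alpha))
assert sup_var_plus == sup_es == 95.5
```

Replacing the tail by its conditional mean leaves ES_α unchanged while raising VaR^+_α up to ES_α of the original distribution; this mirrors the mechanism behind Theorem 1 (ii) in this special case.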
Example 8.
If we take h to be an inverse-S-shaped distortion function in (9), then I h = { (0 , − t ) } for some t ∈ (0 , ρ h is the TK distortion riskmetric. As a direct consequence of Theorem 1(ii), if M is closed under concentration within { (0 , − t ) } , thensup F Y ∈M ρ h ( Y ) = sup F Y ∈M ρ h ∗ ( Y ) . This result implies Theorem 4.11 of Wang et al. (2019) on the robust risk aggregation problembased on dual utilities with inverse-S-shaped distortion functions.
Remark 3.
Using Theorem 1, if for some a ∈ A the set M := {F_{f(a,X)} : F_X ∈ M̃} is closed under concentration and sup{ρ_{h^∗}(f(a, X)) : F_X ∈ M̃} = ∞, then sup{ρ_h(f(a, X)) : F_X ∈ M̃} = ∞. Thus, both objectives in the inner optimization of (1) are infinite for this a, which can then be excluded from the outer optimization over A. It is easier to verify sup{ρ_{h^∗}(f(a, X)) : F_X ∈ M̃} = ∞ than sup{ρ_h(f(a, X)) : F_X ∈ M̃} = ∞, since ρ_h is generally smaller than ρ_{h^∗}.

Remark 4.
The uncertainty set M_p(m, v) of the moment condition in Example 5 can be restricted to the set

∂M_p(m, v) = {F_Y ∈ M : E[Y] = m, E[|Y − m|^p] = v^p},

which is the "boundary" of M_p(m, v), where the supremums of the distortion riskmetrics on both sides of (15) are attained. As a direct consequence, we get

sup_{F_Y ∈ ∂M_p(m,v)} ρ_{h^∗}(Y) = sup_{F_Y ∈ M_p(m,v)} ρ_{h^∗}(Y) = sup_{F_Y ∈ M_p(m,v)} ρ_h(Y),

even though ∂M_p(m, v) is not closed under concentration. This example also suggests that it may not be easy to obtain a necessary condition for (15) to hold in general. One of the reasons is that the structure of the set M can be arbitrary as long as it includes the distributions at which the supremums in (15) are attained.

Uncertainty set with moment constraints
A popular example of an uncertainty set closed under concentration is that of distributions with specified moment constraints, as in Example 5. We investigate this uncertainty set in detail and offer in this section some general results, which generalize several existing results in the literature; none of the results in the literature include non-monotone and non-convex distortion functions. Non-monotone distortion functions create difficulties because of possible complications at their discontinuity points.

For p > 1, m ∈ R and v > 0, we recall the set of interest in Example 5:

M_p(m, v) = {F_Y ∈ M : E[Y] = m, E[|Y − m|^p] ≤ v^p}.

Let q ∈ [1, ∞] be the Hölder conjugate of p, namely q = (1 − 1/p)^{−1}, or equivalently, 1/p + 1/q = 1. For all h ∈ H^∗ or h ∈ H_∗, we denote by

||h′ − x||_q = (∫_0^1 |h′(t) − x|^q dt)^{1/q}, q < ∞, and ||h′ − x||_∞ = max_{t ∈ [0,1]} |h′(t) − x|, x ∈ R. (16)

We introduce the following quantities:

c_{h,q} = argmin_{x ∈ R} ||h′ − x||_q and [h]_q = min_{x ∈ R} ||h′ − x||_q = ||h′ − c_{h,q}||_q.

We set [h]_q = ∞ if h is not continuous. It is easy to verify that c_{h,q} is unique for q >
1. Thequantity [ h ] q may be interpreted as a q -central norm of the function h and c h,q as its q -center.Note that for q = 2 and h continuous, [ h ] = || h ′ − h (1) || and c h, = h (1). We also note that theoptimization problem is trivial if [ h ] q = 0, which corresponds to the case that h ′ = h (1) [0 , and ρ h is a linear functional, thus a multiple of the expectation. In this case, the supremum and infimumare attained by all random variables whose distributions are in M p ( m, v ), and they are equal to mh (1). Furthermore, for h ∈ H ∗ or h ∈ H ∗ , and q >
1, we define a function on [0 ,
1] by φ qh ( t ) = | h ′ (1 − t ) − c h,q | q h ′ (1 − t ) − c h,q [ h ] − qq if h ′ (1 − t ) − c h,q = 0 , and φ qh ( t ) = 0 otherwise . In case q = 2, for t ∈ [0 , φ h ( t ) = ( h ′ (1 − t ) − h (1)) || h ′ − h (1) || − if || h ′ − h (1) || > Theorem 2.
For any h ∈ H, m ∈ R, v > 0 and p > 1, we have

sup_{F_Y ∈ M_p(m,v)} ρ_h(Y) = mh(1) + v[h^∗]_q and inf_{F_Y ∈ M_p(m,v)} ρ_h(Y) = mh(1) − v[h_∗]_q. (17)

Moreover, if h = ĥ, 0 < [h^∗]_q < ∞ and 0 < [h_∗]_q < ∞, then the supremum and infimum in (17) are attained by random variables X with F_X ∈ M_p(m, v) whose quantile functions are uniquely specified, a.e. equal to m + vφ^q_{h^∗} and m − vφ^q_{h_∗}, respectively.

For h ∈ H^∗ (resp. h ∈ H_∗) and q > 1, φ^q_h is increasing (resp. decreasing) on [0, 1], so that φ^q_{h^∗} (resp. −φ^q_{h_∗}) in Theorem 2 indeed determines a quantile function. The following proposition concerns the finiteness of ρ_h on L^p.

Proposition 5.
For any h ∈ H and p ∈ [1, ∞], ρ_h is finite on L^p if [h^∗]_q < ∞ and [h_∗]_q < ∞.

Proof. Note that ρ_h ≤ ρ_{h^∗}, which is implied by h ≤ h^∗ and (4). By Hölder's inequality, for any Y ∈ L^p, using (12), we have

∫_0^1 h^∗′(t) VaR_{1−t}(Y) dt = ∫_0^1 (h^∗′(t) − c_{h^∗,q}) VaR_{1−t}(Y) dt + c_{h^∗,q} E[Y] ≤ [h^∗]_q ||Y||_p + c_{h^∗,q} E[Y] < ∞.

The other half of the statement is analogous.

As a special case of Proposition 5, ρ_h is always finite on L^1 if h is convex or concave with bounded h′, because [h^∗]_∞ < ∞ and [h_∗]_∞ < ∞. As a common example of the general result in Theorem 2, below we collect our findings for the case of VaR. The proof of Corollary 1 is in Appendix A.5.

Corollary 1.
For α ∈ (0, 1), p > 1, m ∈ R and v > 0, we have

sup_{F_Y ∈ M_p(m,v)} VaR_α(Y) = max_{F_Y ∈ M_p(m,v)} ES_α(Y) = m + vα (α^p(1 − α) + (1 − α)^p α)^{−1/p},

and

inf_{F_Y ∈ M_p(m,v)} VaR_α(Y) = min_{F_Y ∈ M_p(m,v)} ES^L_α(Y) = m − v(1 − α) (α^p(1 − α) + (1 − α)^p α)^{−1/p},

where

ES^L_α(Y) = (1/α) ∫_0^α VaR_t(Y) dt, Y ∈ L^1.

We see from Theorem 2 that if h = ĥ, then the supremum and the infimum of ρ_h(Y) over F_Y ∈ M_p(m, v) are always attainable. However, in case h ≠ ĥ, the supremum or infimum may no longer be attainable as a maximum or minimum. We illustrate this in Example 9 below.

Example 9 (VaR and ES, p = 2). Take α ∈ (0, 1), p = 2 and ρ_h = VaR_α, which implies ρ_{h^∗} = ES_α. We calculate

[h^∗]_2 = (∫_0^1 |(1/(1 − α)) 1_{[0,1−α]}(t) − 1|^2 dt)^{1/2} = (α/(1 − α))^{1/2}.

Corollary 1 gives

sup_{F_Y ∈ M_2(m,v)} VaR_α(Y) = sup_{F_Y ∈ M_2(m,v)} ES_α(Y) = m + v (α/(1 − α))^{1/2}. (18)

This is the well-known Cantelli-type formula for ES. By Lemma A.1, the unique left-quantile function of the random variable Z that attains the supremum of ES_α in (18) is given by

F_Z^{−1}(t) = m + vφ^2_{h^∗}(t) = m + v ((1/(1 − α)) 1_{(α,1]}(t) − 1) ((1 − α)/α)^{1/2}, t ∈ [0, 1] a.e.

We thus have VaR_α(Z) = m − v ((1 − α)/α)^{1/2}, and hence Z does not attain sup_{F_Y ∈ M_2(m,v)} VaR_α(Y). It follows by the uniqueness of F_Z that the supremum of VaR_α(Y) over F_Y ∈ M_2(m, v) cannot be attained. However, the supremum of VaR^+_α is attained by Z, since VaR^+_α(Z) = m + v (α/(1 − α))^{1/2}.

Example 10 (Difference of two TK distortion riskmetrics). Take p = 2 and h = h_1 − h_2, the difference between two inverse-S-shaped functions in (9) with the same parameters as in Example 4, and compute

[h^∗]_2 = (∫_0^1 |h^∗′(t) − h^∗(1)|^2 dt)^{1/2}

numerically. By Theorem 2, the worst-case distortion riskmetrics under the uncertainty set M_2(m, v) are given by

sup_{F_Y ∈ M_2(m,v)} ρ_h(Y) = sup_{F_Y ∈ M_2(m,v)} ρ_{h^∗}(Y) = mh(1) + v[h^∗]_2 = v[h^∗]_2, (19)

and the unique left-quantile function of the random variable Z attaining both supremums above is given by

F_Z^{−1}(t) = m + vφ^2_{h^∗}(t) = m + v h^∗′(1 − t)/[h^∗]_2 a.e.

We note that the worst-case distortion riskmetrics obtained in (19) do not depend on the mean m, as h(1) = h_1(1) − h_2(1) = 0; this is sensible, since ρ_h and ρ_{h^∗} only incorporate the disagreement between the two distortion riskmetrics. Similarly, we can calculate the infimum of ρ_h(Y) over F_Y ∈ M_2(m, v), and thus obtain the largest absolute difference between the two preferences numerically represented by ρ_{h_1} and ρ_{h_2}.

In this section, we discuss applications of our main results to some related optimization problems commonly investigated in the literature, by including the outer problem of (1).
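Before turning to the applications, the two-point worst-case distribution of Example 9 can be checked numerically. The sketch below (an illustration of mine, not the paper's code) verifies that the distribution with quantile function F_Z^{−1} has mean m, second central moment v^2, and that its ES_α attains the Cantelli-type bound (18):

```python
import math

# Two-point worst-case distribution from Example 9 (p = 2):
#   Z = m - v*sqrt((1-alpha)/alpha)  with probability alpha,
#   Z = m + v*sqrt(alpha/(1-alpha))  with probability 1 - alpha.
m, v, alpha = 0.0, 1.0, 0.95
lo = m - v * math.sqrt((1 - alpha) / alpha)
hi = m + v * math.sqrt(alpha / (1 - alpha))

mean = alpha * lo + (1 - alpha) * hi
var = alpha * (lo - m) ** 2 + (1 - alpha) * (hi - m) ** 2
es = hi  # ES_alpha averages the upper (1-alpha)-tail, which is the point {hi}
bound = m + v * math.sqrt(alpha / (1 - alpha))  # Cantelli-type bound (18)

assert abs(mean - m) < 1e-12    # E[Z] = m
assert abs(var - v**2) < 1e-12  # E[(Z-m)^2] = v^2, so F_Z lies in M_2(m, v)
assert abs(es - bound) < 1e-12  # ES_alpha(Z) attains the worst-case bound
```

The same computation also shows VaR_α(Z) = lo, strictly below the supremum in (18), which is the non-attainment phenomenon discussed above.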
Our equivalence results can be applied to robust portfolio optimization problems. For an uncertainty set M̃ ⊂ M^n_p with p ∈ [1, ∞], let the random vector X = (X_1, ..., X_n) ∼ F_X ∈ M̃ represent the random losses from n risky assets. For A ⊂ R^n, denote by a vector a ∈ A the amounts invested in each of the n risky assets. For a distortion function h ∈ H and a distortion riskmetric ρ_h : L^p → R, we aim to solve the robust portfolio optimization problem

min_{a ∈ A} ( sup_{F_X ∈ M̃} ρ_h(a⊤X) + β(a) ), (20)

where β : R^n → R is a penalty function of risk concentration. Note that β is irrelevant for the inner problem of (20). For a general non-concave h, there is no known algorithm to solve the inner problem of (20), and the outer optimization problem is also nontrivial in general. Therefore, we usually cannot obtain closed-form solutions of (20) using classical results on optimization of non-convex risk measures. However, as a direct consequence of our main result in Theorem 1, the following proposition converts (20) into an equivalent convex optimization problem that is much easier to solve. The proof of Proposition 6 follows directly from Theorem 1.

Proposition 6.
For h ∈ H, n ∈ N, A ⊂ R^n, and M̃ ⊂ M^n, if the set {F_{a⊤X} ∈ M : F_X ∈ M̃} is closed under concentration for all a ∈ A, then we have

min_{a ∈ A} ( sup_{F_X ∈ M̃} ρ_h(a⊤X) + β(a) ) = min_{a ∈ A} ( sup_{F_X ∈ M̃} ρ_{h^∗}(a⊤X) + β(a) ). (21)

We are also able to solve the preference robust optimization problem with distributional uncertainty. For n ∈ N, an action set A ⊂ R^n, a set of plausible distributions M̃ ⊂ M^n, and a set of possible probability perceptions G ⊂ H, the problem is formulated as follows:

min_{a ∈ A} sup_{F_X ∈ M̃} sup_{h ∈ G} ρ_h(f(a, X)). (22)

Preference robust optimization refers to the situation where the objective is not completely known; e.g., h lies in the set G but is not identified. Therefore, optimization is performed under the worst-case preference in G. Also note that the form sup_{h ∈ G} ρ_h includes (but is not limited to) all coherent risk measures via the representation of Kusuoka (2001). For the problem (22) without distributional uncertainty (thus, only the minimum and the second supremum), see Delage and Li (2018). We have the following result, whose proof follows from Theorem 1.

Proposition 7.
For all M̃ ⊂ M^n and A ⊂ R^n with n ∈ N, if the set

M := {F_{f(a,X)} ∈ M : F_X ∈ M̃} (23)

is closed under concentration for all a ∈ A, then for all G ⊂ H,

min_{a ∈ A} sup_{F_X ∈ M̃} sup_{h ∈ G} ρ_h(f(a, X)) = min_{a ∈ A} sup_{F_X ∈ M̃} sup_{h ∈ G} ρ_{h^∗}(f(a, X)). (24)

The preference robust optimization problem without distributional uncertainty (i.e., problem (22) with only the minimum and the second supremum) is generally difficult to solve when the distortion function h is not concave. However, when the distribution of the random variable is not completely known, we can transfer the original non-convex problem to its convex counterpart using (24), provided that the set of plausible distributions is closed under concentration.

Following the discussion in Section 5, we provide several applications of our theoretical results to portfolio management for specific sets of plausible distributions. None of the optimization problems considered in this section are convex, and we provide numerical calculations or approximations of the solutions to these optimization problems.
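To illustrate how a convex counterpart of this kind is solved in practice, here is a minimal sketch (my own illustration with made-up inputs; the ρ_{h^∗} values and penalty weight are hypothetical, not data from the paper) of minimizing a⊤ρ + c‖a‖_2 over the simplex, the form taken by the right-hand side of (21) under comonotonicity:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical convex-counterpart inputs: rho[i] stands in for rho_{h*}(X_i).
rho = np.array([3.2, 3.7, 4.1])   # assumed worst-case riskmetric values
c = 0.5                            # weight of the L2 penalty beta(a) = c*||a||_2
n = len(rho)

obj = lambda a: rho @ a + c * np.linalg.norm(a)   # linear term + convex norm
cons = ({"type": "eq", "fun": lambda a: a.sum() - 1.0},)  # full investment
bnds = [(0.0, 1.0)] * n                                    # no short selling
res = minimize(obj, np.full(n, 1.0 / n), bounds=bnds, constraints=cons,
               method="SLSQP")
a_star = res.x
assert res.success and abs(a_star.sum() - 1.0) < 1e-8
```

Because the objective is convex (linear plus a norm) over a polytope, any local solver such as SLSQP returns the global optimum; this is exactly the tractability gained by passing from ρ_h to ρ_{h^∗}.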
We demonstrate a price competition problem as an application of optimizing the difference between two risk measures, as shown in Example 10. Similarly to the portfolio management problem discussed in Section 5.1, we consider n risky assets with random losses X_1, ..., X_n ∈ L^2 that are only known to have a fixed mean and a constrained covariance. That is, we choose the set

M̃ = M(µ, Σ) = {F_X ∈ M^n : E[X] = µ, var(X) ⪯ Σ},

for µ ∈ R^n and Σ ∈ R^{n×n} positive semidefinite. For a ∈ [0, 1]^n, the set of all possible distributions of aggregate portfolio losses

{F_{a⊤X} ∈ M : F_X ∈ M̃} = M_a(µ, Σ) = M_2(a⊤µ, (a⊤Σa)^{1/2}) (25)

is closed under concentration, as shown in Example 5. Let ρ_{h_1} : L^2 → R be an investor's own price of the portfolio, while ρ_{h_2} : L^2 → R is her opponent's price of the same portfolio. We choose h_1 and h_2 to be the inverse-S-shaped distortion functions in (9), with the same parameters as in Example 10, and let h = h_1 − h_2. For the action set A = {(a_1, ..., a_n) ∈ [0, 1]^n : Σ_{i=1}^n a_i = 1}, the investor chooses the optimal a^∗ ∈ A such that the worst-case overpricing by her opponent is minimized. From the calculation of (19), we get

D(Σ) := min_{a ∈ A} sup_{F_X ∈ M̃} (ρ_{h_1}(a⊤X) − ρ_{h_2}(a⊤X)) = min_{a ∈ A} sup_{F_Y ∈ M_a(µ,Σ)} ρ_{h^∗}(Y) = [h^∗]_2 × min_{a ∈ A} (a⊤Σa)^{1/2}, (26)

where [h^∗]_2 is the constant computed in Example 10. We note that optimizing ρ_{h_1} − ρ_{h_2} is generally nontrivial, since the difference h_1 − h_2 between two distortion functions is not necessarily monotone, concave, or continuous, even if h_1 and h_2 themselves have these properties. The generality of our equivalence result allows us to convert the original problem to the much simpler form (26), which can be solved efficiently. Table 1 reports the optimal values of a^∗ and D for different choices of Σ.

Table 1: Optimal results in (26) for the difference between two TK distortion riskmetrics. [Columns: n, Σ, a^∗, D; numerical entries omitted.]

Next, we discuss an example of preference robust optimization with distributional uncertainty using the results in Section 4. Similarly to Section 6.1, we consider the set of plausible aggregate portfolio loss distributions

M_a(µ, Σ) = {F_{a⊤X} ∈ M : F_X ∈ M^n, E[X] = µ, var(X) ⪯ Σ}

and the action set A = {(a_1, ..., a_n) ∈ [0, 1]^n : Σ_{i=1}^n a_i = 1}, representing the weights the investor assigns to each random loss. The investor considers TK distortion riskmetrics; however, she is not certain about the parameter γ of the distortion function h. Thus, the investor considers the set of TK distortion riskmetrics with distortion functions in G = {h ∈ H : h = h_γ, γ ∈ [γ_L, γ_U]}, where [γ_L, γ_U] is a range of plausible values of γ based on Wu and Gonzalez (1996). Therefore, the investor aims to find an optimal portfolio given the uncertainty in the riskmetric. To penalize deviations from the benchmark parameter γ = 0.71, the aggregate least-squares estimate in Section 5 of Wu and Gonzalez (1996), the investor uses the penalty term e^{c(γ−0.71)^2} for some c > 0. Since the set M_a(µ, Σ) is closed under concentration for all a ∈ A, Proposition 7, equation (25), and Theorem 2 lead to

V(µ, Σ) := min_{a ∈ A} sup_{F_Y ∈ M_a(µ,Σ)} sup_{γ ∈ [γ_L,γ_U]} ( ρ_{h_γ}(Y) − e^{c(γ−0.71)^2} )
= min_{a ∈ A} sup_{F_Y ∈ M_2(a⊤µ, (a⊤Σa)^{1/2})} sup_{γ ∈ [γ_L,γ_U]} ( ρ_{(h_γ)^∗}(Y) − e^{c(γ−0.71)^2} )
= min_{a ∈ A} sup_{γ ∈ [γ_L,γ_U]} ( a⊤µ + (a⊤Σa)^{1/2} [(h_γ)^∗]_2 − e^{c(γ−0.71)^2} ). (27)

We calculate the optimal values V for different choices of the parameters (n, c, µ and Σ) and report them in Table 2, where a^∗ and γ̂ represent the optimal weights and the optimal parameter of the inverse-S-shaped distortion function, respectively. Note that the last optimization problem in (27) can be solved numerically.

Table 2: Optimal values in (27) for TK distortion riskmetrics. [Columns: n, c, µ, Σ, a^∗, γ̂, V; numerical entries omitted.]
A special case of the portfolio optimization problem introduced in Section 5.1, which is of interest in robust risk aggregation (see e.g., Blanchet et al. (2020)), is to take M̃ to be the Fréchet class

M(F_1, ..., F_n) = {F_X ∈ M^n : X_i ∼ F_i, i = 1, ..., n}, (28)

for some known marginal distributions F_1, ..., F_n ∈ M. In this case, although the left-hand side of (21) is generally difficult to solve, for A ⊂ R^n_+, the right-hand side of (21) can be rewritten, using convexity and comonotonicity, as

min_{a ∈ A} ( a⊤(ρ_{h^∗}(X_1), ..., ρ_{h^∗}(X_n)) + β(a) ), (29)

where X_i ∼ F_i, i = 1, ..., n. We see that (29) is a linear optimization problem with a penalty β, which often admits closed-form solutions when β is properly chosen. For any given a ∈ A, we define

M_a(F_1, ..., F_n) = {F_{a⊤X} ∈ M : X_i ∼ F_i, i = 1, ..., n}. (30)

The set M_a(F_1, ..., F_n) is the weighted version of M_S(F_1, ..., F_n) in Example 5. Note that M_a(F_1, ..., F_n) is generally neither closed under concentration nor closed under conditional expectation. However, M_a(F_1, ..., F_n) is asymptotically (for large n) similar to a set of distributions closed under concentration; see Theorem 3.5 of Mao and Wang (2015) for a precise statement in the case of equal weights and identical marginal distributions. Therefore, even though M_a(F_1, ..., F_n) is not closed under concentration for some a ∈ A, the solution of problem (29) is a good approximation of the original problem for large n. Such asymptotic equivalence between worst-case riskmetrics of aggregate risks with equal weights has been well studied in the literature; see e.g., Theorem 3.3 of Embrechts et al. (2015) for the VaR/ES pair and Theorem 3.5 of Cai et al. (2018) for distortion risk measures.

We conduct numerical calculations to illustrate the equivalence between the two sides of (21). We choose the action set A_{a,b} = {(x_1, ..., x_n) ∈ [a, b]^n : Σ_{i=1}^n x_i = 1}, for 0 ≤ a < 1/n < b ≤ 1, and β to be the L^2-norm multiplied by a scalar c > 0, namely c||·||_2, where the scalar c is a tuning parameter of the L^2 penalty. We first solve the optimization problems separately for the well-known VaR/ES pair at the level 0.95. Specifically, the two problems are given by

V_VaR(a, b, F_1, ..., F_n) = min_{a ∈ A_{a,b}} ( sup_{F_X ∈ M(F_1,...,F_n)} VaR_{0.95}(a⊤X) + c||a||_2 ), (31)

V_ES(a, b, F_1, ..., F_n) = min_{a ∈ A_{a,b}} ( sup_{F_X ∈ M(F_1,...,F_n)} ES_{0.95}(a⊤X) + c||a||_2 ) = min_{a ∈ A_{a,b}} ( a⊤(ES_{0.95}(F_1), ..., ES_{0.95}(F_n)) + c||a||_2 ), (32)

where the true value of the original VaR problem is approximated by the rearrangement algorithm (RA) of Puccetti and Rüschendorf (2012) and Embrechts et al. (2013), whereas the optimal value of the ES problem is obtained by simultaneously minimizing the sum of a linear combination of ES and the L^2-norm of the vector a, which can be done efficiently. In particular, if the marginals of the random losses are identical (i.e., F_1 = ··· = F_n = F), the optimal solution is a^∗ = (1/n, ..., 1/n) and V_ES(a, b, F_1, ..., F_n) = ES_{0.95}(F) + c/√n. We consider the following marginal distributions:

(i) F_i follows a Pareto distribution with scale parameter 1 and shape parameter 3 + (i − 1)/(n − 1), i = 1, ..., n;
(ii) F_i is normally distributed, N(1, 1 + (i − 1)/(n − 1)), i = 1, ..., n;
(iii) F_i follows an exponential distribution with parameter 1 + (i − 1)/(n − 1), i = 1, ..., n.

We choose n to be 3, 10, and 20. For comparison, we calculate the value n||∆a^∗||_2, where ∆a^∗ is the difference between the optimal weights of the non-convex problem and the convex problem. In addition, we calculate the absolute differences between the optimal values obtained by the two problems, ∆V = V_ES − V_VaR >
0, and the percentage differences ∆V/V_VaR. Tables 3 and 4 show the numerical results that compare both optimization problems with two choices of the action set A_{a,b}; the computation time is reported in seconds. We observe that the optimal values obtained in the two problems get closer and become approximately the same as n gets larger. As explained before, this is because the set of plausible distributions M(F_1, ..., F_n) is asymptotically equal to a set closed under concentration.

Next, we consider a TK distortion riskmetric with parameter γ = 0.7. Due to the non-concavity of h, there is no known way of directly solving the non-convex optimization problem

min_{a ∈ A_{a,b}} ( sup_{F_X ∈ M(F_1,...,F_n)} ρ_h(a⊤X) + c||a||_2 ). (33)

We may get an approximation of (33) using a lower bound of ρ_h in (33) produced with the dependence structure created by the rearrangement algorithm (RA); for simplicity, we denote this lower bound by V_h. On the other hand, by (21), the convex counterpart of (33) can be written (using Theorem 1) as

V_{h^∗}(a, b, F_1, ..., F_n) = min_{a ∈ A_{a,b}} ( sup_{F_X ∈ M(F_1,...,F_n)} ρ_{h^∗}(a⊤X) + c||a||_2 ) = min_{a ∈ A_{a,b}} ( a⊤(ρ_{h^∗}(X_1), ..., ρ_{h^∗}(X_n)) + c||a||_2 ), (34)

where X_i ∼ F_i for i = 1, ..., n. We calculate the absolute differences between the optimal values of the convex and non-convex problems, ∆V = V_{h^∗} − V_h > 0, and the percentage differences ∆V/V_h. Tables 5 and 6 compare the numerical results of the two optimization problems with different choices of A_{a,b}. We observe that the percentage differences between the RA lower bound V_h for the non-convex problem (33) and the minimum value V_{h^∗} of the convex problem (34) are roughly between 10% and 20%. Note that the RA lower bound is not expected to be very close to the true worst-case value: the RA dependence structure provides a lower bound for the worst-case value in (33), but it is not, in theory, an optimal dependence structure for (33). In our numerical results, this lower bound is very close to an upper bound only for the case of VaR and ES, not for TK distortion riskmetrics. (The reported computation times were obtained on Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz, 2 processors.)

Table 3: Comparison of the numerical results of the two optimization problems (31) and (32) for VaR_{0.95} and ES_{0.95} with a = 0 and b = 1. [Columns: c, V_VaR, time, V_ES, time, n||∆a^∗||_2, ∆V, ∆V/V_VaR (%), for marginals (i)–(iii) and n = 3, 10, 20; numerical entries omitted.]

Table 4: Comparison of the numerical results of the two optimization problems (31) and (32) for VaR_{0.95} and ES_{0.95} with a = 1/(2n) and b = 2/n. [Numerical entries omitted.]

Table 5: Comparison of the numerical results of the two optimization problems (33) and (34) for TK distortion riskmetrics with a = 0 and b = 1. [Columns: c, V_h, time, V_{h^∗}, time, n||∆a^∗||_2, ∆V, ∆V/V_h (%), for marginals (i)–(iii) and n = 3, 10, 20; numerical entries omitted.]

Table 6: Comparison of the numerical results of the two optimization problems (33) and (34) for TK distortion riskmetrics with a = 1/(2n) and b = 2/n. [Numerical entries omitted.]

We introduced the concept of closedness under concentration, which is, in the context of distributional uncertainty, a sufficient condition to transform an optimization problem with a non-convex distortion riskmetric into its convex counterpart. Many sets of plausible distributions commonly used in the literature on finance, optimization, and risk management are closed under concentration. Moreover, by focusing on distortion riskmetrics whose distortion functions are not necessarily monotone, concave, or continuous, we are able to solve optimization problems for a class of functionals larger than classical risk measures or deviation measures. In particular, we are able to obtain bounds on differences between two distortion riskmetrics, which may represent measures of disagreement between two utilities or risk attitudes. Our results can also be applied to the popular problem of optimizing risk measures under moment constraints; in particular, we obtain the worst- and best-case distortion riskmetrics when the underlying random variable has a fixed mean and a bounded p-th moment.

We demonstrated the applicability of our results by numerically calculating the solutions to optimizing the difference between two risk measures, preference robust optimization, and portfolio optimization under marginal constraints. In all numerical examples, the original non-convex problem is converted to, or well approximated by, a convex one that can be solved efficiently.

Our condition of closedness under concentration in Theorem 1 is a sufficient but not necessary condition for the equivalence of a non-convex and a convex optimization problem under distributional uncertainty. A question that remains unanswered is a necessary and sufficient condition such that the desired equivalence holds. Pinning down such a condition would facilitate many more applications in decision theory, finance, game theory, and operations research.

Acknowledgments
SMP would like to acknowledge the support of the Natural Sciences and Engineering Research Council of Canada (funding reference numbers DGECR-2020-00333 and RGPIN-2020-04289). RW acknowledges financial support from the Natural Sciences and Engineering Research Council of Canada (RGPIN-2018-03823, RGPAS-2018-522590).
References
Acerbi, C. (2002). Spectral measures of risk: A coherent representation of subjective risk aversion. Journal of Banking and Finance, (7), 1505–1518.
Armbruster, B. and Delage, E. (2015). Decision making under uncertainty when preference information is incomplete. Management Science, (1), 111–128.
Artzner, P., Delbaen, F., Eber, J.-M. and Heath, D. (1999). Coherent measures of risk. Mathematical Finance, (3), 203–228.
Bernard, C., Pesenti, S. M. and Vanduffel, S. (2020). Robust distortion risk measures. SSRN.
Blanchet, J., Lam, H., Liu, Y. and Wang, R. (2020). Convolution bounds on quantile aggregation. arXiv.
Blanchet, J. and Murthy, K. (2019). Quantifying distributional model risk via optimal transport. Mathematics of Operations Research, (2), 565–600.
Brighi, B. and Chipot, M. (1994). Approximated convex envelope of a function. SIAM Journal on Numerical Analysis, 128–148.
Cai, J., Li, J. and Mao, T. (2020). Distributionally robust optimization under distorted expectations. SSRN.
Cai, J., Liu, H. and Wang, R. (2018). Asymptotic equivalence of risk measures under dependence uncertainty. Mathematical Finance, (1), 29–49.
Cornilly, D., Rüschendorf, L. and Vanduffel, S. (2018). Upper bounds for strictly concave distortion risk measures on moment spaces. Insurance: Mathematics and Economics, 141–151.
Delage, E., Arroyo, S. and Ye, Y. (2014). The value of stochastic modeling in two-stage stochastic programs with cost uncertainty. Operations Research, (6), 1377–1393.
Delage, E. and Li, Y. (2018). Minimizing risk exposure when the choice of a risk measure is ambiguous. Management Science, (1), 327–344.
Delage, E. and Ye, Y. (2010). Distributionally robust optimization under moment uncertainty with application to data-driven problems. Operations Research, (3), 595–612.
Embrechts, P., Puccetti, G. and Rüschendorf, L. (2013). Model uncertainty and VaR aggregation. Journal of Banking and Finance, (8), 2750–2764.
Embrechts, P., Wang, B. and Wang, R. (2015). Aggregation-robustness and model uncertainty of regulatory risk measures. Finance and Stochastics, (4), 763–790.
Föllmer, H. and Schied, A. (2002). Convex measures of risk and trading constraints. Finance and Stochastics, (4), 429–447.
Guo, S. and Xu, H. (2020). Statistical robustness in utility preference robust optimization models. Mathematical Programming, Series A, published online.
Huber, P. J. and Ronchetti, E. M. (2009). Robust Statistics. Second Edition, Wiley Series in Probability and Statistics. Wiley, New Jersey.
Kusuoka, S. (2001). On law invariant coherent risk measures. Advances in Mathematical Economics, 83–95.
Li, L., Shao, H., Wang, R. and Yang, J. (2018). Worst-case Range Value-at-Risk with partial information. SIAM Journal on Financial Mathematics, (1), 190–218.
Li, Y. (2018). Closed-form solutions for worst-case law invariant risk measures with application to robust portfolio optimization. Operations Research, (6), 1457–1759.
Liu, F., Cai, J., Lemieux, C. and Wang, R. (2020). Convex risk functionals: Representation and applications. Insurance: Mathematics and Economics, 66–79.
Mao, T., Wang, B. and Wang, R. (2019). Sums of uniform random variables. Journal of Applied Probability, (3), 918–936.
Mao, T. and Wang, R. (2015). On aggregation sets and lower-convex sets. Journal of Multivariate Analysis, 170–181.
McNeil, A. J., Frey, R. and Embrechts, P. (2015). Quantitative Risk Management: Concepts, Techniques and Tools. Revised Edition. Princeton, NJ: Princeton University Press.
Natarajan, K., Pachamanova, D. and Sim, M. (2008). Incorporating asymmetric distributional information in robust value-at-risk optimization. Management Science, (3), 573–585.
Popescu, I. (2007). Robust mean-covariance solutions for stochastic optimization. Operations Research, (1), 98–112.
Puccetti, G. and Rüschendorf, L. (2012). Computation of sharp bounds on the distribution of a function of dependent risks. Journal of Computational and Applied Mathematics, (7), 1833–1840.
Rockafellar, R. T., Uryasev, S. and Zabarankin, M. (2006). Generalized deviations in risk analysis. Finance and Stochastics, 51–74.
Shaked, M. and Shanthikumar, J. G. (2007). Stochastic Orders. Springer Series in Statistics.
Tversky, A. and Kahneman, D. (1992). Advances in prospect theory: Cumulative representation of uncertainty. Journal of Risk and Uncertainty, (4), 297–323.
Wang, B. and Wang, R. (2016). Joint mixability. Mathematics of Operations Research, (3), 808–826.
Wang, Q., Wang, R. and Wei, Y. (2020a). Distortion riskmetrics on general spaces. ASTIN Bulletin, (4), 827–851.
Wang, R., Bignozzi, V. and Tsanakas, A. (2015). How superadditive can a risk measure be? SIAM Journal on Financial Mathematics, 776–803.
Wang, R., Xu, Z. Q. and Zhou, X. Y. (2019). Dual utilities on risk aggregation under dependence uncertainty. Finance and Stochastics, (4), 1025–1048.
Wang, R., Wei, Y. and Willmot, G. E. (2020b). Characterization, robustness and aggregation of signed Choquet integrals. Mathematics of Operations Research, (3), 993–1015.
Wang, S., Young, V. R. and Panjer, H. H. (1997). Axiomatic characterization of insurance prices. Insurance: Mathematics and Economics, (2), 173–183.
Wiesemann, W., Kuhn, D. and Sim, M. (2014). Distributionally robust convex optimization. Operations Research, (6), 1203–1466.
Wu, G. and Gonzalez, R. (1996). Curvature of the probability weighting function. Management Science, (12), 1676–1690.
Yaari, M. E. (1987). The dual theory of choice under risk. Econometrica, (1), 95–115.
Zhu, W. and Shao, H. (2018). Closed-form solutions for extreme-case distortion risk measures and applications to robust portfolio management. SSRN.
Zhu, S. and Fukushima, M. (2009). Worst-case conditional value-at-risk with application to robust portfolio management. Operations Research, (5), 1155–1168.
Zymler, S., Kuhn, D. and Rustem, B. (2013). Distributionally robust joint chance constraints with second-order moment information. Mathematical Programming, (1–2), 167–198.

A Proofs
A.1 Proof of Proposition 1
Proof of Proposition 1.
Note that $(\hat h)^* = h^* = \hat h = h$ at $0$ and $1$. Since $(\hat h)^*(t) \geq \hat h(t) \geq h(t)$ for all $t \in (0,1)$, we have $(\hat h)^* \geq h^*$. On the other hand, we have $h^*(t) \geq h(t^+)$ for $t \in (0,1)$: if $h^*(t) < h(t^+)$ for some $t \in (0,1)$, then $h^*(t+\epsilon) < h(t+\epsilon)$ for some $\epsilon > 0$, a contradiction. Similarly, $h^*(t) \geq h(t^-)$ for $t \in (0,1)$. Hence $h^* \geq \hat h$ on $(0,1)$, which implies $h^* \geq (\hat h)^*$ on $(0,1)$. Therefore $(\hat h)^* = h^*$ on $[0,1]$.

Next, we show that the set $\{t \in [0,1] : \hat h(t) \neq h^*(t)\}$ is a union of disjoint sets that are not singletons. To show this assertion, assume that the converse is true. Then there exists $x \in (0,1)$ with $\hat h(x) < h^*(x)$ and $\hat h(t) = h^*(t)$ for $t \in (x-\epsilon, x) \cup (x, x+\epsilon)$ for some $0 < \epsilon < x \wedge (1-x)$. It is clear that $x \in J_h$. Since $h^*$ is continuous on $(x-\epsilon, x+\epsilon)$, we have
$$\hat h(x) < h^*(x) = h^*(x^+) = \hat h(x^+).$$
This contradicts (8). Therefore, the set $\{t \in [0,1] : \hat h(t) \neq h^*(t)\}$ is the union of some disjoint intervals, denoted by $\cup_{l \in L} A_l$ for some $L \subset \mathbb{N}$. For all $l \in L$, we denote the left and right endpoints of $A_l$ by $a_l$ and $b_l$, respectively, with $a_l < b_l$. Define a function via linear interpolation:
$$h_c(t) = \begin{cases} \hat h(a_l) + \dfrac{\hat h(b_l) - \hat h(a_l)}{b_l - a_l}\,(t - a_l), & t \in A_l,\ l \in L, \\[2mm] \hat h(t), & \text{otherwise}. \end{cases}$$
It is clear that $h_c \leq h^*$ and that $h_c$ is continuous on $(0,1)$. We claim that $h_c = h^*$ on $\cup_{l \in L} A_l$. Suppose for the purpose of contradiction that $h_c \neq h^*$ on $\cup_{l \in L} A_l$. Since $h_c < h^*$ at some point of $\cup_{l \in L} A_l$, there exists $x_0 \in A_l$ for some $l \in L$ such that $h_c(x_0) < \hat h(x_0)$. Thus we can take a point $(x_0, \hat h(x_0)) \in (0,1) \times \mathbb{R}$ with $\hat h(x_0) > h_c(x_0)$ which has the largest perpendicular distance to the straight line $t \mapsto \hat h(a_l) + \frac{\hat h(b_l) - \hat h(a_l)}{b_l - a_l}(t - a_l)$, namely
$$x_0 = \operatorname*{arg\,max}_{x \in A_l,\ \hat h(x) > h_c(x)} \frac{(b_l - a_l)\hat h(x) - (\hat h(b_l) - \hat h(a_l))x - (b_l - a_l)\hat h(a_l) + (\hat h(b_l) - \hat h(a_l))a_l}{\big((\hat h(b_l) - \hat h(a_l))^2 + (b_l - a_l)^2\big)^{1/2}}.$$
The existence of the maximizer $x_0$ is due to the upper semicontinuity of $\hat h$. There exists a function $g$ with $g = h^*$ on $[0,1] \setminus A_l$ and $g(x_0) = \hat h(x_0)$, such that $g$ is concave and $\hat h \leq g \leq h^*$ on $[0,1]$. Since $h^* > \hat h$ on $A_l$, we have $h^*(x_0) > \hat h(x_0) = g(x_0)$. Thus $h^*$ cannot be the concave envelope of $\hat h$, which leads to a contradiction. Thus, $h^* = h_c$ on $\cup_{l \in L} A_l$. Since $h^* = \hat h = h_c$ on $(0,1) \setminus (\cup_{l \in L} A_l)$, we have $h^* = h_c$. Therefore, $\{t \in [0,1] : \hat h(t) \neq h^*(t)\}$ is a union of disjoint open intervals, and $h^*$ is linear on each of the intervals.

A.2 Proof of Proposition 4
Proof of Proposition 4. (i) Suppose that $\mathcal{M}$ is closed under concentration and $\mathcal{I}$ is finite. Using Proposition 2, we see that $F^{\mathcal{I}}$ is the distribution obtained by sequentially applying finitely many $C$-concentrations to $F$ over all $C \in \mathcal{I}$. We thus have $F^{\mathcal{I}} \in \mathcal{M}$ for all $F \in \mathcal{M}$.

(ii) Suppose that $\mathcal{M}$ is closed under conditional expectation and $F \in \mathcal{M}$. We define
$$X = F^{-1}(U)\,\mathbb{1}_{\{U \notin \cup_{C \in \mathcal{I}} C\}} + \sum_{C \in \mathcal{I}} \mathbb{E}\big[F^{-1}(U) \mid U \in C\big]\,\mathbb{1}_{\{U \in C\}},$$
whose left-quantile function is given by (14) according to Proposition 2. Following a similar argument to the proof of Proposition 3, for all $\sigma(X)$-measurable random variables $Z$, we have
$$\mathbb{E}[XZ] = \mathbb{E}\Big[ZF^{-1}(U)\,\mathbb{1}_{\{U \notin \cup_{C \in \mathcal{I}} C\}} + \sum_{C \in \mathcal{I}} Z\,\mathbb{E}\big[F^{-1}(U) \mid U \in C\big]\,\mathbb{1}_{\{U \in C\}}\Big] = \mathbb{E}\big[ZF^{-1}(U)\,\mathbb{1}_{\{U \notin \cup_{C \in \mathcal{I}} C\}}\big] + \sum_{C \in \mathcal{I}} \mathbb{E}\Big[\mathbb{E}\big[ZF^{-1}(U) \mid U \in C\big]\,\mathbb{1}_{\{U \in C\}}\Big] = \mathbb{E}\big[ZF^{-1}(U)\,\mathbb{1}_{\{U \notin \cup_{C \in \mathcal{I}} C\}}\big] + \sum_{C \in \mathcal{I}} \mathbb{E}\big[ZF^{-1}(U)\,\mathbb{1}_{\{U \in C\}}\big] = \mathbb{E}[ZF^{-1}(U)].$$
Thus $\mathbb{E}[F^{-1}(U) \mid X] = X$, $\mathbb{P}$-almost surely, which implies that $F^{\mathcal{I}} = F_X \in \mathcal{M}$.

A.3 Proof of Theorem 1
Proof of Theorem 1.
We prove Theorem 1 in two main steps. First, we show that the theorem holds if $\mathcal{I}_h$ is finite and $h$ has finitely many discontinuity points. Then we discuss general $h$ with a possibly infinite $\mathcal{I}_h$.

Finite case:
Here we prove (15) in the case where $\mathcal{I}_h$ is finite and $h$ has finitely many discontinuity points (i.e., $J_h$ in (7) is a finite set). We first show that, assuming that $\mathcal{M}$ is closed under concentration within $\mathcal{I}_h$, we have
$$\sup_{F_X \in \mathcal{M}} \rho_{\hat h}(X) = \sup_{F_X \in \mathcal{M}} \rho_{h^*}(X). \tag{A.1}$$
After proving (A.1), we show the three statements in Theorem 1 in the order (ii), (i), and (iii).

For $h \in \mathcal{H}$, suppose that $\mathcal{M}$ is closed under concentration within $\mathcal{I}_h$. Take an arbitrary random variable $Y$ with $F_Y \in \mathcal{M}$, and let $G = F_Y^{\mathcal{I}_h}$. Write $g(t) = 1 - \hat h(1-t)$ and $g^*(t) = 1 - h^*(1-t)$ for $t \in [0,1]$; by the definition of $\mathcal{I}_h$, $g \neq g^*$ on each set in $\mathcal{I}_h$ and $g = g^*$ on other sets. For any $(a,b) \in \mathcal{I}_h$, we have $G^{-1}(t) = \frac{1}{b-a}\int_a^b F_Y^{-1}(u)\,\mathrm{d}u$ for all $t \in (a,b]$ and $G^{-1+}(t) = \frac{1}{b-a}\int_a^b F_Y^{-1}(u)\,\mathrm{d}u$ for all $t \in [a,b)$. Using the fact that $g^*$ is linear on $(a,b)$ and $g(t) = g^*(t)$ for $t = a, b$, we have
$$\int_{(a,b)} F_Y^{-1}(t)\,\mathrm{d}g^*(t) = (g^*(b) - g^*(a))\,\frac{\int_a^b F_Y^{-1}(t)\,\mathrm{d}t}{b-a} = (g(b) - g(a))\,\frac{\int_a^b F_Y^{-1}(t)\,\mathrm{d}t}{b-a} = \int_{(a,b]} G^{-1}(t)\,\mathrm{d}g(t) + G^{-1+}(a)(g(a^+) - g(a)). \tag{A.2}$$
Define the sets
$$J^+ = \{t \in J_h : \hat h(t^+) = \hat h(t) \neq \hat h(t^-)\},\qquad J^- = \{t \in J_h : \hat h(t^-) = \hat h(t) \neq \hat h(t^+)\},$$
and
$$J^0 = \{t \in J_h : \hat h(t^+) \neq \hat h(t) \neq \hat h(t^-)\}.$$
To better understand these sets, we recall Figure 1 (without concave envelopes) as Figure A.1, which shows an example of a distortion function $h$, the corresponding $\hat h$, the sets $J_h$, $J^+$, $J^-$, and $J^0$, and the sets $\hat J$, $\hat J^+$, $\hat J^-$, $\hat J^0$, and $\hat J^{-0}$ (defined in the proof of (i) below).

Figure A.1: An example of $h$ (left) and $\hat h$ (right); the figure marks the five points $t_1, \ldots, t_5$ of $J_h$ and indicates their membership in $J^+$, $J^-$, $J^0$, as well as in the sets $\hat J$, $\hat J^+$, $\hat J^-$, $\hat J^0$, and $\hat J^{-0}$ used in the proof of (i) below.

Let $Z^{\mathcal{I}_h} \sim F_Y^{\mathcal{I}_h}$. We have
$$\rho_{\hat h}(Z^{\mathcal{I}_h}) = \int_{(0,1) \setminus (J^+ \cup J^0)} G^{-1}(t)\,\mathrm{d}g(t) + \sum_{t \in J^+ \cup J^0 \cup \{0\}} G^{-1+}(t)(g(t^+) - g(t)).$$
Hence, using (A.2) and (14), we get
$$\rho_{h^*}(Y) - \rho_{\hat h}(Z^{\mathcal{I}_h}) = \int_{(0,1)} F_Y^{-1}(t)\,\mathrm{d}g^*(t) + F_Y^{-1+}(0)(g^*(0^+) - g^*(0)) - \int_{(0,1) \setminus (J^+ \cup J^0)} G^{-1}(t)\,\mathrm{d}g(t) - \sum_{t \in J^+ \cup J^0 \cup \{0\}} G^{-1+}(t)(g(t^+) - g(t)) = \sum_{(a,b) \in \mathcal{I}_h} \left( \int_{(a,b)} F_Y^{-1}(t)\,\mathrm{d}g^*(t) - \int_{(a,b]} G^{-1}(t)\,\mathrm{d}g(t) - G^{-1+}(a)(g(a^+) - g(a)) \right) = 0. \tag{A.3}$$
Since $\mathcal{I}_h$ is finite, we have $F_Y^{\mathcal{I}_h} \in \mathcal{M}$ by Proposition 4. Thus
$$\rho_{h^*}(Y) = \rho_{\hat h}(Z^{\mathcal{I}_h}) \leq \sup_{F_X \in \mathcal{M}} \rho_{\hat h}(X),$$
which gives the desired equality (A.1) since $\rho_{h^*} = \rho_{(\hat h)^*} \geq \rho_{\hat h}$.

Proof of (ii): Using $h = \hat h$ and (A.1), we have $\sup_{F_X \in \mathcal{M}} \rho_h(X) = \sup_{F_X \in \mathcal{M}} \rho_{h^*}(X)$.

Proof of (i): Suppose that $\mathcal{M}$ is closed under concentration. This directly implies that $\mathcal{M}$ is closed under concentration within $\mathcal{I}_h$, so (A.1) holds for all $h \in \mathcal{H}$. It remains to show that
$$\sup_{F_X \in \mathcal{M}} \rho_h(X) = \sup_{F_X \in \mathcal{M}} \rho_{\hat h}(X).$$
Define
$$\hat J = \{t \in J_h : \hat h(t) \neq h(t)\},\qquad \hat J^+ = \{t \in \hat J : \hat h(t) = \hat h(t^+)\},\qquad \hat J^- = \hat J \setminus \hat J^+.$$
For $n >$
$0$, write the intervals
$$A_s^n = \begin{cases} (1 - s - 1/\sqrt{n},\ 1 - s + 1/n), & s \in \hat J^-, \\ (1 - s - 1/n,\ 1 - s + 1/\sqrt{n}), & s \in \hat J^+. \end{cases}$$
Let $\mathcal{I}_n = \{A_s^n : s \in \hat J\}$. Note that $h \in \mathcal{H}$ has finitely many discontinuity points; thus the intervals in $\mathcal{I}_n$ are disjoint when $n$ is large enough. For all $F_Y \in \mathcal{M}$ and $Y \sim F_Y$, we define
$$Z^{\mathcal{I}_n} = F_Y^{-1}(U)\,\mathbb{1}_{\{U \notin \cup_{s \in \hat J} A_s^n\}} + \sum_{s \in \hat J} \mathbb{E}\big[F_Y^{-1}(U) \mid U \in A_s^n\big]\,\mathbb{1}_{\{U \in A_s^n\}}.$$
It follows that $Z^{\mathcal{I}_n} \sim F_Y^{\mathcal{I}_n}$, and the right-quantile function of $Z^{\mathcal{I}_n}$, denoted by $G_n^{-1+}$, is given by the right-continuous adjusted version of
$$F_Y^{-1}(t)\,\mathbb{1}_{\{t \notin \cup_{s \in \hat J} A_s^n\}} + \sum_{s \in \hat J} \frac{\int_{A_s^n} F_Y^{-1}(u)\,\mathrm{d}u}{\lambda(A_s^n)}\,\mathbb{1}_{\{t \in A_s^n\}},\quad t \in (0,1).$$
Thus we get
$$\lim_{n \to \infty} G_n^{-1+}(1-t) = \begin{cases} F_Y^{-1}(1-t), & t \in \hat J^-, \\ F_Y^{-1+}(1-t), & \text{otherwise}. \end{cases}$$
Similarly, if we denote the left-quantile function of $Z^{\mathcal{I}_n}$ by $G_n^{-1}$, then $G_n^{-1}$ is given by the left-continuous version of
$$F_Y^{-1}(t)\,\mathbb{1}_{\{t \notin \cup_{s \in \hat J} A_s^n\}} + \sum_{s \in \hat J} \frac{\int_{A_s^n} F_Y^{-1}(u)\,\mathrm{d}u}{\lambda(A_s^n)}\,\mathbb{1}_{\{t \in A_s^n\}}.$$
It follows that
$$\lim_{n \to \infty} G_n^{-1}(1-t) = \begin{cases} F_Y^{-1+}(1-t), & t \in \hat J^+, \\ F_Y^{-1}(1-t), & \text{otherwise}. \end{cases}$$
Define, further, the sets
$$\hat J^0 = \{t \in \hat J^+ : h(t) \neq h(t^-)\}\qquad \text{and} \qquad \hat J^{-0} = \{t \in \hat J^- : h(t) \neq h(t^+)\}.$$
For $u \in [0,1]$, define
$$h_-(u) = \sum_{t \in \hat J^-} (h(t) - h(t^-))\,\mathbb{1}_{\{u \geq t\}},\qquad h_{-0}(u) = \sum_{t \in \hat J^{-0}} (h(t^+) - h(t))\,\mathbb{1}_{\{u > t\}},$$
$$h_+(u) = \sum_{t \in \hat J^+} (h(t^+) - h(t))\,\mathbb{1}_{\{u > t\}},\qquad h_0(u) = \sum_{t \in \hat J^0} (h(t) - h(t^-))\,\mathbb{1}_{\{u \geq t\}},$$
$$\hat h_-(u) = \sum_{t \in \hat J^-} (h(t^+) - h(t^-))\,\mathbb{1}_{\{u > t\}},\qquad \hat h_+(u) = \sum_{t \in \hat J^+} (h(t^+) - h(t^-))\,\mathbb{1}_{\{u \geq t\}},$$
and
$$\tilde h(u) = h(u) - h_+(u) - h_-(u) - h_0(u) - h_{-0}(u) = \hat h(u) - \hat h_+(u) - \hat h_-(u).$$
Note that $|Z^{\mathcal{I}_n} - F_Y^{-1}(U)| = 0$ when $U \notin \cup_{s \in \hat J} A_s^n$, and $0, 1 \in [0,1] \setminus \cup_{s \in \hat J} A_s^n$ for $n$ large enough. We have $\sup_n |Z^{\mathcal{I}_n} - F_Y^{-1}(U)| < \infty$.
Therefore, by the dominated convergence theorem,
$$\lim_{n \to \infty} \big(\rho_{h_-}(Z^{\mathcal{I}_n}) + \rho_{h_{-0}}(Z^{\mathcal{I}_n})\big) = \lim_{n \to \infty} \int_0^1 G_n^{-1+}(1-u)\,\mathrm{d}h_-(u) + \lim_{n \to \infty} \int_0^1 G_n^{-1}(1-u)\,\mathrm{d}h_{-0}(u)$$
$$= \sum_{t \in \hat J^-} F_Y^{-1}(1-t)(h(t) - h(t^-)) + \sum_{t \in \hat J^{-0}} F_Y^{-1}(1-t)(h(t^+) - h(t))$$
$$= \sum_{t \in \hat J^- \setminus \hat J^{-0}} F_Y^{-1}(1-t)(h(t) - h(t^-)) + \sum_{t \in \hat J^{-0}} F_Y^{-1}(1-t)(h(t) - h(t^-) + h(t^+) - h(t))$$
$$= \sum_{t \in \hat J^- \setminus \hat J^{-0}} F_Y^{-1}(1-t)(h(t^+) - h(t^-)) + \sum_{t \in \hat J^{-0}} F_Y^{-1}(1-t)(h(t^+) - h(t^-)) = \rho_{\hat h_-}(Y).$$
Similarly, we get $\lim_{n \to \infty} (\rho_{h_+}(Z^{\mathcal{I}_n}) + \rho_{h_0}(Z^{\mathcal{I}_n})) = \rho_{\hat h_+}(Y)$. On the other hand, it is clear that $\lim_{n \to \infty} \rho_{\tilde h}(Z^{\mathcal{I}_n}) = \rho_{\tilde h}(Y)$. Therefore, we have
$$\lim_{n \to \infty} \rho_h(Z^{\mathcal{I}_n}) = \lim_{n \to \infty} \big(\rho_{h_-}(Z^{\mathcal{I}_n}) + \rho_{h_{-0}}(Z^{\mathcal{I}_n}) + \rho_{h_+}(Z^{\mathcal{I}_n}) + \rho_{h_0}(Z^{\mathcal{I}_n}) + \rho_{\tilde h}(Z^{\mathcal{I}_n})\big) = \rho_{\hat h_-}(Y) + \rho_{\hat h_+}(Y) + \rho_{\tilde h}(Y) = \rho_{\hat h}(Y).$$
Thus we have
$$\rho_{\hat h}(Y) = \lim_{n \to \infty} \rho_h(Z^{\mathcal{I}_n}) \leq \sup_{F_X \in \mathcal{M}} \rho_h(X). \tag{A.4}$$
Using (A.1) and (A.4), we get
$$\sup_{F_X \in \mathcal{M}} \rho_{h^*}(X) = \sup_{F_X \in \mathcal{M}} \rho_{\hat h}(X) \leq \sup_{F_X \in \mathcal{M}} \rho_h(X),$$
and the converse inequality holds since $\rho_h \leq \rho_{h^*}$ generally.

Proof of (iii): For all $h \in \mathcal{H}$ with $h = \hat h$, if $\mathcal{M}$ is closed under conditional expectation, then using Proposition 4 we have $F_Y^{\mathcal{I}_h} \in \mathcal{M}$. Since $Z^{\mathcal{I}_h} \sim F_Y^{\mathcal{I}_h}$, (A.3) gives
$$\rho_{h^*}(Y) = \rho_{\hat h}(Z^{\mathcal{I}_h}) = \rho_h(Z^{\mathcal{I}_h}).$$
Note that $\rho_h \leq \rho_{h^*}$ generally. Therefore, if $\max_{F_Y \in \mathcal{M}} \rho_{h^*}(Y)$ is attained by $F_Y$, then so is $\max_{F_Y \in \mathcal{M}} \rho_h(Y)$ by $F_Y^{\mathcal{I}_h}$. Obviously, these two quantities share the common maximizer $F_Y^{\mathcal{I}_h}$ because
$$\rho_{h^*}(Z^{\mathcal{I}_h}) \leq \max_{F_Y \in \mathcal{M}} \rho_{h^*}(Y) = \max_{F_Y \in \mathcal{M}} \rho_h(Y) = \rho_h(Z^{\mathcal{I}_h}) \leq \rho_{h^*}(Z^{\mathcal{I}_h}).$$

General case:
We now prove Theorem 1 for general $h \in \mathcal{H}$, where $\mathcal{I}_h$ or the set of discontinuity points of $h$ is countable.

(i) If $\mathcal{I}_h$ is countable, it suffices to prove (A.1). We write $\mathcal{I}_h$ as the collection of $(a_i, b_i)$ for $i \in \mathbb{N}$, and let $\mathcal{I}_n = \{(a_i, b_i) : i = 1, \ldots, n\}$ for all $n \in \mathbb{N}$. Define the function
$$h_n(t) = \begin{cases} h^*(t), & t \in (1 - b_i,\ 1 - a_i),\ i = 1, \ldots, n, \\ \hat h(t), & \text{otherwise}. \end{cases}$$
It is clear that for all $n \in \mathbb{N}$, the set $\{t \in [0,1] : h_n(t) \neq \hat h(t)\}$ is a finite union of disjoint open intervals and $h_n$ is linear on each of the intervals. For all random variables $Y$ with $F_Y \in \mathcal{M}$, let $Z^{\mathcal{I}_n} \sim F_Y^{\mathcal{I}_n}$. Similarly to (A.1), we have
$$\rho_{h_n}(Y) = \rho_{\hat h}(Z^{\mathcal{I}_n}) \leq \sup_{F_X \in \mathcal{M}} \rho_{\hat h}(X) \quad \text{for all } n \in \mathbb{N}.$$
Note that $h_n(t) \uparrow h^*(t)$ as $n \to \infty$ for all $t \in (0,1)$, and hence $\rho_{h_n}(Y) \to \rho_{h^*}(Y)$ as $n \to \infty$. It follows that
$$\sup_{F_X \in \mathcal{M}} \rho_{\hat h}(X) \geq \rho_{h_n}(Y) \xrightarrow{\ n \to \infty\ } \rho_{h^*}(Y).$$

(ii) If $h \in \mathcal{H}$ has countably many discontinuity points, it suffices to prove (A.4). There exists a sequence of finite sets $\{\hat J_m\}_{m \in \mathbb{N}} \subset \hat J$ such that $\hat J_m \to \hat J$ as $m \to \infty$. For all $m \in \mathbb{N}$, write
$$\hat h_m(t) = \begin{cases} \hat h(t), & t \in \hat J_m, \\ h(t), & \text{otherwise}, \end{cases}$$
and define $\hat J_m^+ = \{t \in \hat J_m : \hat h_m(t) = \hat h_m(t^+)\}$ and $\hat J_m^- = \hat J_m \setminus \hat J_m^+$. For $n >$
$0$, let $\mathcal{I}_{n,m} = \{B_s^{n,m} : s \in \hat J_m\}$ with
$$B_s^{n,m} = \begin{cases} (1 - s - 1/\sqrt{n},\ 1 - s + 1/n), & s \in \hat J_m^-, \\ (1 - s - 1/n,\ 1 - s + 1/\sqrt{n}), & s \in \hat J_m^+. \end{cases}$$
Following the same argument as for (A.4), for all random variables $Y$ with $F_Y \in \mathcal{M}$, we have
$$\sup_{F_X \in \mathcal{M}} \rho_h(X) \geq \rho_h(Z^{\mathcal{I}_{n,m}}) \xrightarrow{\ n \to \infty\ } \rho_{\hat h_m}(Y) \quad \text{for all } m \in \mathbb{N},$$
where $Z^{\mathcal{I}_{n,m}} \sim F_Y^{\mathcal{I}_{n,m}}$. Moreover, we have $\hat h_m(t) \uparrow \hat h(t)$ for all $t \in [0,$
$1]$ as $m \to \infty$. By the monotone convergence theorem, we have $\rho_{\hat h_m}(Y) \to \rho_{\hat h}(Y)$ as $m \to \infty$. Therefore, we have
$$\sup_{F_X \in \mathcal{M}} \rho_{\hat h}(X) \leq \sup_{F_X \in \mathcal{M}} \rho_h(X).$$
The proof is complete.

A.4 Proof of Theorem 2 and related lemmas
In the following, we write $q$ for the Hölder conjugate of $p$. The following lemma closely resembles Theorem 3.4 of Liu et al. (2020), with only an additional statement on the uniqueness of the quantile function of the maximizer.

Lemma A.1.
For $h \in \mathcal{H}^*$, $m \in \mathbb{R}$, $v > 0$ and $p > 1$, we have
$$\sup_{F_Y \in \mathcal{M}_p(m,v)} \rho_h(Y) = mh(1) + v[h]_q.$$
If $0 < [h]_q < \infty$, the above supremum is attained by a random variable $X$ such that $F_X \in \mathcal{M}_p(m,v)$ with its quantile function uniquely determined by
$$\mathrm{VaR}_t(X) = m + v\phi_h^q(t),\quad t \in (0,1)\ \text{a.e.} \tag{A.5}$$
If $[h]_q = 0$, the above maximum value is attained by any random variable $X$ such that $F_X \in \mathcal{M}_p(m,v)$.

Proof. The only statement beyond Theorem 3.4 of Liu et al. (2020) is the uniqueness of the quantile function in (A.5). Without loss of generality, assume $m = 0$ and $v = 1$. Using the Hölder inequality,
$$\sup_{F_Y \in \mathcal{M}_p(0,1)} \int_0^1 h'(t)\,\mathrm{VaR}_{1-t}(Y)\,\mathrm{d}t = \sup_{F_Y \in \mathcal{M}_p(0,1)} \int_0^1 (h'(t) - c_{h,q})\,\mathrm{VaR}_{1-t}(Y)\,\mathrm{d}t \leq \sup_{F_Y \in \mathcal{M}_p(0,1)} \|h' - c_{h,q}\|_q \left(\int_0^1 |\mathrm{VaR}_{1-t}(Y)|^p\,\mathrm{d}t\right)^{1/p} = [h]_q.$$
The maximum is attained by $F_X$ only if the above inequality is an equality, which is equivalent to the function $t$
$\mapsto |\mathrm{VaR}_{1-t}(X)|^p$ being a multiple of $|h' - c_{h,q}|^q$. Therefore,
$$\mathrm{VaR}_t(X) = |h'(1-t) - c_{h,q}|^{q-2}\,(h'(1-t) - c_{h,q})\,[h]_q^{1-q} = \phi_h^q(t),\quad t \in (0,1)\ \text{a.e.}$$
Hence, the quantile function of $X$ is uniquely determined by (A.5).

Lemma A.2.
For all $h \in \mathcal{H}$ with $h = \hat h$, $m \in \mathbb{R}$, $v > 0$ and $p > 1$, if $[h^*]_q < \infty$, we have
$$\sup_{F_Y \in \mathcal{M}_p(m,v)} \rho_h(Y) = \sup_{F_Y \in \mathcal{M}_p(m,v)} \rho_{h^*}(Y) = mh(1) + v[h^*]_q,$$
and the above suprema are simultaneously attained by a random variable $X$ such that $F_X \in \mathcal{M}_p(m,v)$ with
$$\mathrm{VaR}_t(X) = m + v\phi_{h^*}^q(t),\quad t \in (0,1)\ \text{a.e.} \tag{A.6}$$

Proof.
The statement directly follows from Theorem 1 and Lemma A.1.

Proof of Theorem 2.
Together with Theorem 1, Lemmas A.1 and A.2 give the statement in Theorem 2 on the supremum. The arguments for the infimum are symmetric. For instance, noting that $(-h)^* = -h_*$, where $h_*$ is the convex envelope of $h$, Theorem 1 yields
$$\inf_{F_Y \in \mathcal{M}_p(m,v)} \rho_h(Y) = -\sup_{F_Y \in \mathcal{M}_p(m,v)} \rho_{-h}(Y) = -\sup_{F_Y \in \mathcal{M}_p(m,v)} \rho_{(-h)^*}(Y) = -\sup_{F_Y \in \mathcal{M}_p(m,v)} \rho_{-h_*}(Y) = \inf_{F_Y \in \mathcal{M}_p(m,v)} \rho_{h_*}(Y).$$
We omit the detailed arguments for the infimum in Theorem 2.
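As a numerical sanity check of the moment-constrained bound in Lemma A.1, the following self-contained sketch discretizes the case $p = q = 2$ with the dual-power distortion $h(t) = 1 - (1-t)^2$, $m = 0$ and $v = 1$ (an illustrative test distortion of ours, not one singled out in the paper). It evaluates the discretized $\rho_h$ at the candidate maximizer from (A.5), whose quantile function is an affine transform of $h'(1-t)$, and at random feasible competitors; the grid construction, the choice $c_{h,2} = h(1)$ (the $L^2$-projection constant), and all variable names are assumptions of this sketch.

```python
import numpy as np

# Illustrative distortion: h(t) = 1 - (1 - t)^2, so h'(t) = 2(1 - t),
# with p = q = 2, m = 0, v = 1.  Lemma A.1 (as reconstructed) asserts
#   sup over M_p(m, v) of rho_h = m*h(1) + v*[h]_q,
# attained at VaR_t(X) = m + v*phi(t), phi(t) = (h'(1 - t) - c)/[h]_2.
n = 100_000
t = (np.arange(n) + 0.5) / n                 # midpoint grid on (0, 1)
hp = 2.0 * (1.0 - t)                         # h'(t)
c = 1.0                                      # c_{h,2} = h(1), L^2-projection constant
h_norm = np.sqrt(np.mean((hp - c) ** 2))     # [h]_2 = 1/sqrt(3)
bound = 0.0 * 1.0 + 1.0 * h_norm             # m*h(1) + v*[h]_2

def rho_h(quantile):
    """Discretized rho_h(Y) = int_0^1 h'(t) VaR_{1-t}(Y) dt; quantile[i] = VaR at t[i]."""
    return float(np.mean(hp * quantile[::-1]))

# Candidate maximizer (A.5): quantile proportional to h'(1 - t) - c (nondecreasing).
q_star = (hp[::-1] - c) / h_norm
assert abs(np.mean(q_star)) < 1e-9               # mean constraint (m = 0)
assert abs(np.mean(q_star ** 2) - 1.0) < 1e-6    # second-moment constraint (v = 1)
print(rho_h(q_star), bound)                      # both close to 1/sqrt(3)

# No feasible competitor (nondecreasing quantile, mean 0, variance 1) beats the
# bound; this is exactly the discrete Cauchy-Schwarz step of the proof.
rng = np.random.default_rng(0)
for _ in range(20):
    q = np.sort(rng.normal(size=n))
    q = (q - q.mean()) / q.std()
    assert rho_h(q) <= bound + 1e-9
```

On the discretized grid the candidate quantile attains the bound exactly, because the mean constraint removes the constant $c$ and the Cauchy-Schwarz inequality holds with equality for proportional vectors.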
A.5 Proof of Corollary 1
Proof of Corollary 1.
We prove the first half (the suprema); the second half is symmetric to the first half. Theorem 2 and Lemma A.2 give
$$\sup_{F_Y \in \mathcal{M}_p(m,v)} \mathrm{VaR}_\alpha(Y) = \sup_{F_Y \in \mathcal{M}_p(m,v)} \mathrm{ES}_\alpha(Y) = m + v[h^*]_q.$$
By Lemma A.1, the random variable $Z$ which attains $\mathrm{ES}_\alpha(Z) = m + v[h^*]_q$ has left-quantile function
$$F_Z^{-1}(t) = m + v\phi_{h^*}^q(t) = m + v\left|\frac{\mathbb{1}_{(\alpha,1]}(t)}{1-\alpha} - c_{h^*,q}\right|^{q-2}\left(\frac{\mathbb{1}_{(\alpha,1]}(t)}{1-\alpha} - c_{h^*,q}\right)[h^*]_q^{1-q},\quad t \in [0,1]\ \text{a.e.}$$
Note that $\phi_{h^*}^q(t)$ only takes two values, for $t > \alpha$ and $t < \alpha$, respectively. Thus $Z$ is a bi-atomic random variable, and using $\mathbb{E}[Z] = m$, we have, for some $k_p > 0$,
$$\mathbb{P}(Z = m + \alpha k_p) = 1 - \alpha \quad \text{and} \quad \mathbb{P}(Z = m - (1-\alpha)k_p) = \alpha.$$
The number $k_p$ can be determined from $\mathbb{E}[|Z - m|^p] = v^p$, that is,
$$k_p = v\left(\alpha^p(1-\alpha) + (1-\alpha)^p\alpha\right)^{-1/p},$$
leading to
$$\sup_{F_Y \in \mathcal{M}_p(m,v)} \mathrm{VaR}_\alpha(Y) = \sup_{F_Y \in \mathcal{M}_p(m,v)} \mathrm{ES}_\alpha(Y) = m + v\alpha\left(\alpha^p(1-\alpha) + (1-\alpha)^p\alpha\right)^{-1/p}.$$
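The bi-atomic worst-case distribution above is easy to verify numerically. The following sketch (with illustrative parameter values of our choosing) checks that $Z$ satisfies the moment constraints defining $\mathcal{M}_p(m,v)$, and that its upper atom, which carries probability $1-\alpha$ and hence equals the quantile of $Z$ at every level $t > \alpha$ as well as $\mathrm{ES}_\alpha(Z)$, coincides with the bound $m + v\alpha(\alpha^p(1-\alpha) + (1-\alpha)^p\alpha)^{-1/p}$.

```python
import numpy as np

# Illustrative parameters (our choice): mean m, scale v, moment order p, level alpha.
m, v, p, alpha = 1.0, 2.0, 3.0, 0.95

# k_p from the proof of Corollary 1: E[|Z - m|^p] = v^p pins down the spread.
k_p = v * (alpha ** p * (1 - alpha) + (1 - alpha) ** p * alpha) ** (-1 / p)
lo, hi = m - (1 - alpha) * k_p, m + alpha * k_p
probs = np.array([alpha, 1 - alpha])          # P(Z = lo), P(Z = hi)
vals = np.array([lo, hi])

# Feasibility: F_Z lies in M_p(m, v).
assert abs(probs @ vals - m) < 1e-12                        # E[Z] = m
assert abs(probs @ np.abs(vals - m) ** p - v ** p) < 1e-9   # E[|Z - m|^p] = v^p

# The upper atom has probability 1 - alpha, so the quantile of Z at every
# level t > alpha equals hi; hence ES_alpha(Z) = hi, matching the bound.
bound = m + v * alpha * (alpha ** p * (1 - alpha) + (1 - alpha) ** p * alpha) ** (-1 / p)
print(hi, bound)   # identical by construction
```

The two moment checks pass identically for any $\alpha \in (0,1)$ and $p > 1$, since $\alpha(1-\alpha)^p + (1-\alpha)\alpha^p$ equals the normalizing constant in $k_p$ term by term.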