Convex Risk Measures based on Divergence

Abstract

Risk measures connect probability theory or statistics to optimization, particularly to convex optimization. They are nowadays standard in applications of finance and in insurance involving risk aversion. This paper investigates a wide class of risk measures on Orlicz spaces. The characterizing function describes the decision maker's risk assessment towards increasing losses. We link the risk measures to a crucial formula developed by Rockafellar for the Average Value-at-Risk based on convex duality, which is fundamental in corresponding optimization problems. We characterize the dual and provide complementary representations.


arXiv [q-fin.RM]

Paul Dommel ∗ Alois Pichler ∗ †

March 26, 2020


Keywords: risk measures, Orlicz spaces, duality

MSC classification: 91G70, 94A17, 46E30, 49N1

Risk measures are of fundamental importance in assessing risk; they have numerous applications in finance and in actuarial mathematics. A cornerstone is the Average Value-at-Risk, which was first considered in insurance. Rockafellar and Uryasev (2002, 2000) develop its dual representation, which is an important tool when employing risk measures for concrete optimization. Moreover, the Average Value-at-Risk is the major building block in what is known as the Kusuoka representation. The duality relations are also elaborated in Ogryczak and Ruszczyński (2002, 1999).

Risk measures are most typically considered on Lebesgue spaces such as $L^1$ or $L^\infty$, although these are not the most general Banach spaces on which to consider them. An important reason for choosing this domain is that risk measures are Lipschitz continuous on $L^\infty$.

A wide class of risk measures can be properly defined on function spaces such as Orlicz spaces. These risk functionals get some attention in Bellini et al. (2014), while Bellini and Rosazza Gianin (2012) and Cheridito and Li (2009, 2008) elaborate their general properties. Delbaen and Owari (2019) investigate risk aversion on Orlicz spaces as well, but they consider a somewhat wider class of risk functionals, which is not necessarily law invariant. The notion of risk quadrangles has become essential in understanding risk as well (cf. Rockafellar and Royset (2016)).

∗ University of Technology, Chemnitz, Germany. Funded by Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Project-ID 416228727 – SFB 1410.
† Corresponding author: [email protected]

Outline of the paper.

The following section recalls essentials from generalized divergence and introduces the notation. Section 3 introduces the $\varphi$-divergence risk measure and Section 4 discusses its natural domain and the associated norm. In Section 5 we derive important representations, including the dual representation and the Kusuoka representation. We finally characterize the dual norm and exploit the convincing properties of the risk measure for concrete optimization problems. Section 7 concludes the paper with a closing discussion.

In what follows we repeat the definition of risk measures and divergence. The first subsection states the definition and interpretation of risk measures. We further provide some interpretations which explain their outstanding importance in economics.

A risk measure is a function $\rho$ mapping random variables from some space $L$ to the reals, $\rho\colon L \to \mathbb{R} \cup \{\infty\}$. The inherent interpretation is that the random variable $X$ with random outcomes is associated with the risk $\rho(X)$. In insurance, the number $\rho(X)$ is understood as the premium for the insurance policy $X$.

Axioms for risk measures have been introduced by Artzner et al. (1997, 1999). A risk measure is called coherent if it satisfies the following axioms (cf. also Rockafellar and Royset (2014)):

A1. Monotonicity: $\rho(X_1) \le \rho(X_2)$ provided that $X_1 \le X_2$ almost surely.
A2. Translation equivariance: $\rho(X + c) = \rho(X) + c$ for any $X \in L$ and $c \in \mathbb{R}$.
A3. Subadditivity: $\rho(X_1 + X_2) \le \rho(X_1) + \rho(X_2)$ for all $X_1, X_2 \in L$.
A4. Positive homogeneity: $\rho(\lambda X) = \lambda\,\rho(X)$ for all $X \in L$ and $\lambda > 0$.

The term coherent specifically refers to the Axiom A4. The domain $L$ of the risk functional is often not specified. In what follows we introduce $\varphi$-divergence and elaborate the natural domain, which is as large as possible, of the associated risk measures.

Divergence is a concept originating from statistics. The divergence quantifies how much a probability measure deviates from another measure. We define divergence functions first to introduce the general $\varphi$-divergence.

Definition 2.1 (Divergence function). A convex and lower semicontinuous (lsc.) function $\varphi\colon \mathbb{R} \to \mathbb{R} \cup \{\infty\}$ is a divergence function if $\varphi(1) = 0$, $\operatorname{dom}(\varphi) = [0, \infty)$ and
\[ \lim_{x \to \infty} \frac{\varphi(x)}{x} = \infty. \tag{1} \]

Remark 2.2 ($\varphi$-divergence). The term divergence function is inspired by $\varphi$-divergence. For a divergence function $\varphi$, the $\varphi$-divergence of a probability measure $Q$ from $P$ is given by
\[ D_\varphi(Q \,\|\, P) := \int_\Omega \varphi\!\left(\frac{dQ}{dP}\right) dP \]
if $Q \ll P$ and $\infty$ otherwise. This divergence is an important concept of a non-symmetric distance between probability measures. Kullback–Leibler is the divergence obtained for $\varphi(x) = x \log x$. For a detailed discussion of the general $\varphi$-divergence we refer to Breuer and Csiszár (2013a,b).

In what follows we assume that $\varphi$ is a divergence function satisfying all conditions of Definition 2.1. Associated with $\varphi$ is its convex conjugate $\psi$ defined by
\[ \psi(y) := \sup_{z \in \mathbb{R}}\; y z - \varphi(z). \]
These two functions satisfy the Fenchel–Young inequality
\[ x y \le \varphi(x) + \psi(y), \qquad x, y \in \mathbb{R}, \tag{2} \]
and further properties, as stated in the following proposition.

Proposition 2.3.

Let $\varphi$ be a divergence function and $\psi$ its convex conjugate. The following statements hold true:
(i) $\varphi$ and $\psi$ are continuous on $(0, \infty)$ and $(-\infty, \infty)$, respectively.
(ii) $\psi$ is non-decreasing.
(iii) It holds that $y \le \psi(y)$ for every $y \in \mathbb{R}$.

Proof. For the first assertion we recall Rockafellar (1970, Theorem 10.4), which states that a convex function is continuous on the interior of its domain. Therefore continuity of $\varphi$ is immediate. For continuity of $\psi$ it is sufficient to demonstrate that $\psi(y) < \infty$ holds for every $y \in \mathbb{R}$. Arguing by contradiction, we assume there is a point $y \in \mathbb{R}$ such that
\[ \infty = \psi(y) = \sup_{z \in \mathbb{R}}\; y z - \varphi(z) = \sup_{z \in \operatorname{dom}(\varphi)} y z - \varphi(z) = \sup_{z \ge 0}\; y z - \varphi(z). \]
The function $\varphi$ is finite on its domain and thus the supremum cannot be attained at some point $z^* \ge 0$. We thus have
\[ \infty = \psi(y) = \lim_{z \to \infty} y z - \varphi(z) = \lim_{z \to \infty} z \left( y - \frac{\varphi(z)}{z} \right) \]
and consequently $\lim_{z \to \infty} \left( y - \frac{\varphi(z)}{z} \right) \ge 0$. This contradicts assumption (1), i.e., that $\frac{\varphi(z)}{z}$ tends to $\infty$ for $z \to \infty$.

The second assertion (ii) follows from
\[ \psi(y_1) = \sup_{z \in \mathbb{R}}\; y_1 z - \varphi(z) = \sup_{z \ge 0}\; y_1 z - \varphi(z) \le \sup_{z \ge 0}\; y_2 z - \varphi(z) = \psi(y_2) \]
for $y_1 \le y_2$. We finally have that
\[ \psi(y) = \sup_{z \in \mathbb{R}}\; y z - \varphi(z) \ge y \cdot 1 - \varphi(1) = y, \qquad y \in \mathbb{R}, \]
which completes the proof. □

$\varphi$-divergence risk measures

Ahmadi-Javid (2012a,b) introduces the Entropic Value-at-Risk based on Kullback–Leibler divergence and briefly mentions a possible generalization. We pick up and advance this idea and demonstrate that $\varphi$-divergence risk measures are indeed coherent risk measures as specified by the Axioms A1–A4 above. In what follows we deduce further properties of these risk measures, which are of importance in subsequent investigations.

Definition 3.1 ($\varphi$-divergence risk measure). Let $\varphi$ be a divergence function with convex conjugate $\psi$. The $\varphi$-divergence risk measure $\rho_{\varphi,\beta}\colon L^1 \to \mathbb{R} \cup \{\infty\}$ is
\[ \rho_{\varphi,\beta}(X) := \inf_{\mu \in \mathbb{R},\; t > 0}\; t \left( \beta + \mu + \mathbb{E}\, \psi\!\left( \frac{X}{t} - \mu \right) \right), \tag{3} \]
where the coefficient $\beta > 0$ indicates risk aversion.

Remark 3.2. The divergence function $\varphi$ characterizes the shape of risk aversion for increasing risk, while the risk aversion coefficient $\beta$ describes the tendency of an investor to avoid risk.

The risk measure in (3) above is well defined for $X \in L^1$, as
\[ \mathbb{E} X \le \rho_{\varphi,\beta}(X) \tag{4} \]
by Proposition 2.3 (iii). Note, however, that the risk measure may be unbounded, i.e., $\rho_{\varphi,\beta}(X) = \infty$. Further observe that $\rho_{\varphi,\beta}$ depends on $X$ only through the expectation in (3), hence only on the distribution of $X$, and is therefore law invariant, i.e., the risk measure evaluates random variables $X$ and $X'$ equally, provided that $P(X \le x) = P(X' \le x)$ for all $x \in \mathbb{R}$.

The following proposition demonstrates that $\rho_{\varphi,\beta}$ is indeed a coherent risk measure.

Proposition 3.3.

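The functional $\rho_{\varphi,\beta}$ is a coherent risk measure; it satisfies all Axioms A1–A4 above.

Proof. To demonstrate translation equivariance let $c \in \mathbb{R}$ be given. Employing the substitution $\tilde\mu := \mu - \frac{c}{t}$ we have that
\[ \rho_{\varphi,\beta}(X + c) = \inf_{\mu \in \mathbb{R},\; t > 0} t \left( \beta + \mu + \mathbb{E}\, \psi\!\left( \frac{X + c}{t} - \mu \right) \right) = \inf_{\tilde\mu \in \mathbb{R},\; t > 0} t \left( \beta + \tilde\mu + \frac{c}{t} + \mathbb{E}\, \psi\!\left( \frac{X}{t} - \tilde\mu \right) \right) = \rho_{\varphi,\beta}(X) + c, \]
which is translation equivariance, A2. As for positive homogeneity observe that
\[ \rho_{\varphi,\beta}(\lambda X) = \inf_{\mu \in \mathbb{R},\; t > 0} t \left( \beta + \mu + \mathbb{E}\, \psi\!\left( \frac{\lambda X}{t} - \mu \right) \right) = \inf_{\mu \in \mathbb{R},\; \tilde t > 0} \lambda \tilde t \left( \beta + \mu + \mathbb{E}\, \psi\!\left( \frac{\lambda X}{\lambda \tilde t} - \mu \right) \right) = \lambda\, \rho_{\varphi,\beta}(X), \]
where we have substituted $\tilde t := \frac{t}{\lambda}$.

Monotonicity follows directly from monotonicity of $\psi$ (Proposition 2.3 (ii)). Indeed, provided that $X_1 \le X_2$ we have that
\[ \mathbb{E}\, \psi\!\left( \frac{X_1}{t} - \mu \right) \le \mathbb{E}\, \psi\!\left( \frac{X_2}{t} - \mu \right), \]
which implies $\rho_{\varphi,\beta}(X_1) \le \rho_{\varphi,\beta}(X_2)$.

As for subadditivity let $X, Y \in L^1$ be given. It holds that
\[ \rho_{\varphi,\beta}(X) + \rho_{\varphi,\beta}(Y) = \inf_{\mu_1 \in \mathbb{R},\; t_1 > 0} t_1 \left( \beta + \mu_1 + \mathbb{E}\, \psi\!\left( \frac{X}{t_1} - \mu_1 \right) \right) + \inf_{\mu_2 \in \mathbb{R},\; t_2 > 0} t_2 \left( \beta + \mu_2 + \mathbb{E}\, \psi\!\left( \frac{Y}{t_2} - \mu_2 \right) \right) \]
\[ \ge \inf_{\mu_1, \mu_2 \in \mathbb{R},\; t_1, t_2 > 0} (t_1 + t_2) \left( \beta + \frac{t_1 \mu_1 + t_2 \mu_2}{t_1 + t_2} + \mathbb{E} \left( \frac{t_1}{t_1 + t_2}\, \psi\!\left( \frac{X}{t_1} - \mu_1 \right) + \frac{t_2}{t_1 + t_2}\, \psi\!\left( \frac{Y}{t_2} - \mu_2 \right) \right) \right). \]
Convexity of $\psi$, applied with the weights $\frac{t_1}{t_1 + t_2}$ and $\frac{t_2}{t_1 + t_2}$, gives
\[ \rho_{\varphi,\beta}(X) + \rho_{\varphi,\beta}(Y) \ge \inf_{\mu_1, \mu_2 \in \mathbb{R},\; t_1, t_2 > 0} (t_1 + t_2) \left( \beta + \frac{t_1 \mu_1 + t_2 \mu_2}{t_1 + t_2} + \mathbb{E}\, \psi\!\left( \frac{X + Y}{t_1 + t_2} - \frac{t_1 \mu_1 + t_2 \mu_2}{t_1 + t_2} \right) \right) = \rho_{\varphi,\beta}(X + Y), \]
as $t_1 + t_2 > 0$ and $\frac{t_1 \mu_1 + t_2 \mu_2}{t_1 + t_2} \in \mathbb{R}$. This proves A3 (subadditivity). □

Remark 3.4. This proof of coherence of $\rho_{\varphi,\beta}$ does not involve all conditions imposed on $\varphi$ above. However, the particular condition (1) turns out to be of importance for the proper domain of these risk measures, as Section 4 outlines below.

Remark 3.5. The general inequality
\[ 0 \le \rho_{\varphi,\beta}(0) = \inf_{\mu \in \mathbb{R},\; t > 0} t \left( \mu + \psi(-\mu) + \beta \right) \le 0 \]
follows from (4) with $X = 0$ and from letting $t \to 0$. The general bounds
\[ \mathbb{E} X \le \rho_{\varphi,\beta}(X) \le \operatorname{ess\,sup}(X) \tag{5} \]
follow from translation equivariance.

The following proposition exposes the parameter of risk aversion $\beta$. We demonstrate that a larger parameter of risk aversion increases the risk assessment for every random variable.

Proposition 3.6.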

Suppose that $0 < \beta_1 \le \beta_2$. It holds that $\rho_{\varphi,\beta_1}(X) \le \rho_{\varphi,\beta_2}(X)$ for every $X \in L^1$. Conversely, for any non-negative random variable $X \ge 0$ we have that
\[ \rho_{\varphi,\beta_2}(X) \le \frac{\beta_2}{\beta_1}\, \rho_{\varphi,\beta_1}(X). \]

Proof. It is immediate that
\[ t \left( \beta_1 + \mu + \mathbb{E}\, \psi\!\left( \frac{X}{t} - \mu \right) \right) \le t \left( \beta_2 + \mu + \mathbb{E}\, \psi\!\left( \frac{X}{t} - \mu \right) \right), \qquad t > 0,\ \mu \in \mathbb{R}, \]
and hence the first assertion.

As for the second inequality assume that $X$ is non-negative. The inequality
\[ \rho_{\varphi,\beta_1}(X) = t^* \left( \beta_1 + \mu^* + \mathbb{E}\, \psi\!\left( \frac{X}{t^*} - \mu^* \right) \right) \ge t^* \beta_1 + \mathbb{E} X \ge t^* \beta_1 \]
holds, where $t^*$ denotes the optimal value inside of (3) (and $0$, if the infimum is not attained). In other words, the set of possible optimal values of $t$ is bounded by $\frac{\rho_{\varphi,\beta_1}(X)}{\beta_1}$. Consequently we have
\[ \frac{\beta_2}{\beta_1}\, \rho_{\varphi,\beta_1}(X) = \frac{\beta_2 - \beta_1}{\beta_1}\, \rho_{\varphi,\beta_1}(X) + \inf_{\mu \in \mathbb{R},\; 0 < t \le \rho_{\varphi,\beta_1}(X)/\beta_1} t \left( \beta_1 + \mu + \mathbb{E}\, \psi\!\left( \frac{X}{t} - \mu \right) \right) \]
\[ \ge \inf_{\mu \in \mathbb{R},\; 0 < t \le \rho_{\varphi,\beta_1}(X)/\beta_1} t (\beta_2 - \beta_1) + t \left( \beta_1 + \mu + \mathbb{E}\, \psi\!\left( \frac{X}{t} - \mu \right) \right) \ge \rho_{\varphi,\beta_2}(X), \]
the assertion. □

This section demonstrates that the largest vector spaces on which $\varphi$-divergence risk measures are finite are specific Orlicz spaces. We further show that $\varphi$-divergence norms, which are based on $\varphi$-divergence risk measures, are equivalent to certain Orlicz norms on these spaces.

Coherent risk measures induce semi-norms, cf. Pichler (2013, 2017); Kalmes and Pichler (2018). Following this setting we introduce $\varphi$-divergence norms by
\[ \|X\|_{\varphi,\beta} := \rho_{\varphi,\beta}(|X|). \tag{6} \]
This is indeed a norm, as $\|X\|_{\varphi,\beta} = 0$ if and only if $X = 0$, as follows from (5). It is a consequence of A1–A4 and the vector space axioms that $\|\cdot\|_{\varphi,\beta}$ is finite iff $\rho_{\varphi,\beta}(\cdot)$ is finite. We therefore consider the risk measure on the set
\[ \left\{ X \in L^1 \colon \|X\|_{\varphi,\beta} < \infty \right\}. \tag{7} \]

Remark 4.1. By Proposition 3.6 it follows for $\beta_1 < \beta_2$ that
\[ \|X\|_{\varphi,\beta_1} \le \|X\|_{\varphi,\beta_2} \le \frac{\beta_2}{\beta_1}\, \|X\|_{\varphi,\beta_1}. \tag{8} \]
The norms associated with risk functionals are thus equivalent for varying risk aversion parameters $\beta > 0$.

In what follows we discuss the spaces (7) endowed with norm (6). To this end we introduce the Orlicz classes with their associated norms first.

Definition 4.2 (Orlicz norms and spaces). A convex function $\Phi\colon [0, \infty) \to [0, \infty)$ with $\Phi(0) = 0$, $\lim_{x \to 0} \frac{\Phi(x)}{x} = 0$ and $\lim_{x \to \infty} \frac{\Phi(x)}{x} = \infty$ and its convex conjugate $\Psi$ are called a pair of complementary Young-functions. Given a pair of complementary Young-functions $\Phi$ and $\Psi$, the norms
\[ \|X\|_{\Phi} := \sup_{\mathbb{E}\, \Psi(|Z|) \le 1} \mathbb{E}\, X Z \quad \text{and} \tag{9} \]
\[ \|X\|_{(\Phi)} := \inf \left\{ \lambda > 0 \colon \mathbb{E}\, \Phi\!\left( \frac{|X|}{\lambda} \right) \le 1 \right\} \tag{10} \]
are called Orlicz norm and Luxemburg norm, respectively. Further, the spaces
\[ M^{\Phi} := \left\{ X \in L^1 \colon \mathbb{E}\, \Phi(t |X|) < \infty \text{ for all } t > 0 \right\} \quad \text{and} \tag{11} \]
\[ L^{\Phi} := \left\{ X \in L^1 \colon \mathbb{E}\, \Phi(t |X|) < \infty \text{ for some } t > 0 \right\} \tag{12} \]
are called Orlicz heart and Orlicz space, respectively.
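The Luxemburg norm (10) is defined through a one-dimensional infimum, so it can be approximated numerically. The following is a minimal sketch, assuming a random variable with finitely many outcomes; the function name `luxemburg_norm` and the sample data are illustrative assumptions and not part of the paper. It relies only on the fact that $\lambda \mapsto \mathbb{E}\,\Phi(|X|/\lambda)$ is non-increasing for a Young-function $\Phi$.

```python
import math

def luxemburg_norm(xs, ps, Phi, tol=1e-12):
    """Approximate inf{lam > 0 : E Phi(|X|/lam) <= 1} for a discrete X.

    xs are the outcomes, ps the corresponding probabilities (hypothetical inputs).
    """
    def expected(lam):
        return sum(p * Phi(abs(x) / lam) for x, p in zip(xs, ps))

    if all(x == 0 for x in xs):
        return 0.0
    lo, hi = 0.0, 1.0
    while expected(hi) > 1.0:      # enlarge hi until it is feasible
        hi *= 2.0
    while hi - lo > tol * hi:      # bisection on the monotone constraint
        mid = 0.5 * (lo + hi)
        if expected(mid) <= 1.0:
            hi = mid
        else:
            lo = mid
    return hi

# Illustration with Phi(x) = x^2, for which (10) reduces to sqrt(E X^2).
xs = [-2.0, 0.0, 1.0, 3.0]
ps = [0.25, 0.25, 0.25, 0.25]
norm = luxemburg_norm(xs, ps, lambda x: x * x)
print(norm, math.sqrt(sum(p * x * x for x, p in zip(xs, ps))))
```

For $\Phi(x) = x^2$ the constraint $\mathbb{E}\,(|X|/\lambda)^2 \le 1$ is equivalent to $\lambda \ge \sqrt{\mathbb{E} X^2}$, which gives a closed form to compare the bisection against.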

Remark 4.3. The Orlicz norm $\|\cdot\|_{\Phi}$ and the Luxemburg norm $\|\cdot\|_{(\Phi)}$ are topologically equivalent. More specifically, it holds that
\[ \|X\|_{(\Phi)} \le \|X\|_{\Phi} \le 2\, \|X\|_{(\Phi)} \]
on $L^{\Phi}$ (see Pick et al. (2013, Theorem 4.8.5)).

The next lemma relates divergence functions and Young-functions.

Lemma 4.4. Let $\varphi$ be a divergence function (cf. Definition 2.1). The function
\[ \Phi(x) := \begin{cases} 0 & \text{if } x \in [0, 1], \\ \max\{0, \varphi(x)\} & \text{else} \end{cases} \tag{13} \]
is a Young-function (cf. Definition 4.2) and a divergence function (Definition 2.1). Further, for every $X \in L^1$, it holds that $\|X\|_{\varphi,\beta} < \infty$ if and only if $\|X\|_{\Phi,\beta} < \infty$ and
\[ \frac{\beta}{\beta + d}\, \|X\|_{\varphi,\beta} \le \|X\|_{\Phi,\beta} \le \frac{\beta + d}{\beta}\, \|X\|_{\varphi,\beta}, \]
where $d := \|\varphi - \Phi\|_{L^\infty} = \sup_{x \ge 0} |\varphi(x) - \Phi(x)|$.

Proof. For the first assertion it is sufficient to show that $\Phi$ is convex, as the other properties are evident by the definition of $\varphi$ and $\Phi$. Let $0 \le x \le y$ and $\lambda \in (0, 1)$ be given. As $\max\{0, \varphi\}$ is still convex, we may assume $x \in [0, 1]$ and $y > 1$. By employing $\varphi(1) = 0$, $\max\{0, \varphi(x)\} \ge 0$ and the convexity of $\max\{0, \varphi\}$, it follows that $\Phi$ is non-decreasing on $[1, \infty)$ and thus on $[0, \infty)$. We therefore have
\[ \Phi(\lambda x + (1 - \lambda) y) \le \Phi(\lambda + (1 - \lambda) y) \le \lambda\, \Phi(1) + (1 - \lambda)\, \Phi(y) = \lambda\, \Phi(x) + (1 - \lambda)\, \Phi(y), \]
which is convexity of $\Phi$.

For the second assertion note that $d < \infty$ by (1) and convexity of $\varphi$. Employing the obvious inequality $\varphi(x) - d \le \Phi(x)$ we get that
\[ \Psi(y) = \sup_{z \in \mathbb{R}}\; y z - \Phi(z) \le \sup_{z \in \mathbb{R}}\; y z - (\varphi(z) - d) \le \sup_{z \in \mathbb{R}}\; y z - \varphi(z) + d = \psi(y) + d \]
for all $y \in \mathbb{R}$. Inserting this into (3) it follows that $\|X\|_{\Phi,\beta} < \infty$ if $\|X\|_{\varphi,\beta} < \infty$ and
\[ \|X\|_{\Phi,\beta} = \inf_{\mu \in \mathbb{R},\; t > 0} t \left( \beta + \mu + \mathbb{E}\, \Psi\!\left( \frac{|X|}{t} - \mu \right) \right) \le \inf_{\mu \in \mathbb{R},\; t > 0} t \left( \beta + d + \mu + \mathbb{E}\, \psi\!\left( \frac{|X|}{t} - \mu \right) \right) = \|X\|_{\varphi,\beta + d} \le \frac{\beta + d}{\beta}\, \|X\|_{\varphi,\beta} \]
by (8). The proof of the converse statement is analogous. □

The following two theorems, which are the main results of this section, establish that the domains of divergence risk measures are specific Orlicz spaces.

Theorem 4.5 (Equivalence of norms). Let $\varphi$ be a divergence function and the associated Young-function $\Phi$ be given from (13). It holds that $\|X\|_{\varphi,\beta} < \infty$ and $\|X\|_{\Phi,\beta} < \infty$ if and only if $X \in L^{\Psi}$. Furthermore, the norms $\|\cdot\|_{\varphi,\beta}$, $\|\cdot\|_{\Phi,\beta}$ and $\|\cdot\|_{\Phi}$ are equivalent on $L^{\Psi}$. In particular we have the inequality
\[ \frac{1}{\max\{1, \beta\}}\, \|X\|_{\Phi,\beta} \le \|X\|_{\Phi} \le \frac{\Psi(1) + 1}{\min\{1, \beta\}}\, \|X\|_{\Phi,\beta} \tag{14} \]
for all $X \in L^{\Psi}$.

Proof. Let $X \in L^{\Psi}$. By employing (8) to compare $\beta$ with $1$ we obtain
\[ \frac{1}{\max\{1, \beta\}}\, \|X\|_{\Phi,\beta} \le \|X\|_{\Phi,1} \le \frac{1}{\min\{1, \beta\}}\, \|X\|_{\Phi,\beta} \]
and it is thus sufficient to show (14) for $\beta = 1$. We have that
\[ \|X\|_{\Phi,1} = \inf_{\mu \in \mathbb{R},\; t > 0} t \left( 1 + \mu + \mathbb{E}\, \Psi\!\left( \frac{|X|}{t} - \mu \right) \right) \le \inf_{t > 0} t \left( 1 + \mathbb{E}\, \Psi\!\left( \frac{|X|}{t} \right) \right), \]
where the last term is an equivalent expression of the Orlicz norm in (9) (see Krasnosel'skii and Rutickii (1961, Theorem 10.5)). Therefore, the inequality
\[ \frac{1}{\max\{1, \beta\}}\, \|X\|_{\Phi,\beta} \le \inf_{t > 0} t \left( 1 + \mathbb{E}\, \Psi\!\left( \frac{|X|}{t} \right) \right) = \|X\|_{\Phi} < \infty \]
holds true.

To prove the converse inequality assume $\|X\|_{\Phi,1} < \infty$. By the definition of $\Psi$ and Proposition 2.3 (iii) we have that $\Psi(0) = -\inf_{z \in \mathbb{R}} \Phi(z) = 0$ and $-y + \Psi(y) \ge 0$ for all $y \in \mathbb{R}$. Therefore, as $-y + \Psi(y)$ is a non-negative, convex function which is $0$ in the origin, it is non-decreasing on $[0, \infty)$. Hence the infimum in (3) is not attained for $\mu < 0$ and
\[ \|X\|_{\Phi,1} = \inf_{\mu \in \mathbb{R},\; t > 0} t \left( 1 + \mu + \mathbb{E}\, \Psi\!\left( \frac{|X|}{t} - \mu \right) \right) = \inf_{\mu \ge 0,\; t > 0} t \left( 1 + \mu + \mathbb{E}\, \Psi\!\left( \frac{|X|}{t} - \mu \right) \right). \]
Moreover, as $t$ and $\Psi$ are non-negative, we get from $1 + \Psi(1) \ge 1$ that
\[ \inf_{\mu \ge 0,\; t > 0} t \left( 1 + \mu + \mathbb{E}\, \Psi\!\left( \frac{|X|}{t} - \mu \right) \right) \ge \frac{1}{1 + \Psi(1)} \inf_{\mu \ge 0,\; t > 0} t + (\mu t)(1 + \Psi(1)) + t\, \mathbb{E}\, \Psi\!\left( \frac{|X|}{t} - \mu \right) \]
\[ = \inf_{\mu \ge 0,\; t > 0} \frac{t + \mu t}{\Psi(1) + 1} \left( 1 + \frac{\mu t}{t + \mu t}\, \Psi(1) + \frac{t}{t + \mu t}\, \mathbb{E}\, \Psi\!\left( \frac{|X|}{t} - \mu \right) \right) \]
and therefore, by applying Jensen's inequality,
\[ \infty > \|X\|_{\Phi,1} \ge \inf_{\mu \ge 0,\; t > 0} \frac{t + \mu t}{\Psi(1) + 1} \left( 1 + \frac{\mu t}{t + \mu t}\, \Psi(1) + \frac{t}{t + \mu t}\, \mathbb{E}\, \Psi\!\left( \frac{|X|}{t} - \mu \right) \right) \]
\[ \ge \inf_{\mu \ge 0,\; t > 0} \frac{t + \mu t}{\Psi(1) + 1} \left( 1 + \mathbb{E}\, \Psi\!\left( \frac{|X|}{t + \mu t} \right) \right) \ge \frac{1}{\Psi(1) + 1}\, \|X\|_{\Phi}. \]
This establishes $\|X\|_{\Phi,\beta} < \infty \iff X \in L^{\Psi}$ as well as (14). The remaining statement is immediate by Lemma 4.4. This yields the claim. □

Theorem 4.6 (Equivalence of spaces). Let $\psi$ be the convex conjugate of a divergence function $\varphi$. It holds that $\|X\|_{\varphi,\beta} < \infty$ if and only if $X \in L^{\psi}$ and
\[ \left( M^{\psi}, \|\cdot\|_{\varphi,\beta} \right) \cong \left( M^{\Psi}, \|\cdot\|_{\Phi} \right) \quad \text{as well as} \quad \left( L^{\psi}, \|\cdot\|_{\Phi,\beta} \right) \cong \left( L^{\Psi}, \|\cdot\|_{\Phi} \right) \]
(here, $\cong$ indicates a continuous isomorphism).

Proof. We have $\psi(y) - d < \Psi(y) < \psi(y) + d$ as shown in the proof of Lemma 4.4 and hence the setwise identities $M^{\Psi} = M^{\psi}$ and $L^{\Psi} = L^{\psi}$. The remaining assertion follows from Theorem 4.5. □

To emphasize the strength of the previous result we provide some propositions which are consequences of Theorem 4.6 and general results on Orlicz space theory.

Proposition 4.7.

The pairs $\left( M^{\psi}, \|\cdot\|_{\varphi,\beta} \right)$ and $\left( L^{\psi}, \|\cdot\|_{\varphi,\beta} \right)$ are Banach spaces.

Proposition 4.8. The simple functions are dense in $\left( M^{\psi}, \|\cdot\|_{\varphi,\beta} \right)$.

Proof. Cf. Pick et al. (2013, Theorem 4.9.1, Theorem 4.12.8). □

Proposition 4.9. The following duality relations hold true, where the sets $M^{\psi}$ ($L^{\psi}$, resp.) are defined as in (11) (in (12), resp.).
(i) $\left( M^{\psi}, \|\cdot\|_{\varphi,\beta} \right)^* \cong \left( L^{\varphi}, \|\cdot\|^*_{\varphi,\beta} \right)$, where $^*$ indicates the dual space (the dual norm, resp.).
(ii) Assume $\varphi$ satisfies the $\Delta_2$-condition, i.e., there exist numbers $T, k \ge 0$ such that
\[ \varphi(2x) \le k\, \varphi(x) \quad \text{for all } T < x. \tag{15} \]
Then $\left( M^{\varphi}, \|\cdot\|_{\varphi,\beta} \right) = \left( L^{\varphi}, \|\cdot\|_{\varphi,\beta} \right)$ and $\left( L^{\psi}, \|\cdot\|_{\varphi,\beta} \right) \cong \left( M^{\psi}, \|\cdot\|_{\varphi,\beta} \right)^{**}$.
(iii) $\left( M^{\psi}, \|\cdot\|_{\varphi,\beta} \right)$ is reflexive if and only if $\varphi$ and $\psi$ satisfy the $\Delta_2$-condition.

Proof. Pick et al. (2013, Theorem 4.13.6, Remark 4.13.8 and Theorem 4.13.9). □
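The $\Delta_2$-condition (15) can be made concrete for the Kullback–Leibler case $\varphi(x) = x \log x$ with conjugate $\psi(y) = e^{y-1}$. The sketch below is an illustration only and not taken from the paper; the helper `doubling_ratio` and the grids are assumptions. It evaluates the doubling ratios $f(2x)/f(x)$: for $\varphi$ the ratio equals $2 + 2\log 2/\log x$, which is at most $3$ for $x \ge 4$, whereas for $\psi$ the ratio equals $e^{y}$ and is unbounded.

```python
import math

def phi(x):
    # Kullback-Leibler divergence function x log x (with 0 log 0 = 0)
    return x * math.log(x) if x > 0 else 0.0

def psi(y):
    # convex conjugate of x log x
    return math.exp(y - 1.0)

def doubling_ratio(f, x):
    """Ratio f(2x) / f(x) appearing in the Delta_2-condition (15)."""
    return f(2.0 * x) / f(x)

# phi: bounded ratios beyond T = 4 (here k = 3 works)
phi_ratios = [doubling_ratio(phi, 4.0 * 1.5 ** k) for k in range(40)]
# psi: ratios grow like exp(y), so no constant k can work
psi_ratios = [doubling_ratio(psi, float(y)) for y in range(1, 31)]

print(max(phi_ratios), psi_ratios[-1])
```

In view of Proposition 4.9 (iii), this suggests that $M^{\psi}$ is not reflexive in the Kullback–Leibler case, since $\psi$ fails the $\Delta_2$-condition.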

This section establishes the dual representation of $\varphi$-divergence risk measures. We further deduce a simple criterion to ensure that the infimum in (3) is attained. The Kusuoka representation relates the $\varphi$-divergence risk measures with distortion risk measures, which are of practical importance.

The subsequent theorem provides the exact shape of the dual representation of the $\varphi$-divergence risk measure. Ahmadi-Javid (2012a) gives a similar result for $L^\infty$, but this space is not dense in $L^{\psi}$ as Ahmadi-Javid and Pichler (2017, Theorem 3.2) elaborate for the Entropic Value-at-Risk.

Theorem 5.1 (Dual representation). For every $X \in L^{\psi}$, the $\varphi$-divergence risk measure has the representation
\[ \rho_{\varphi,\beta}(X) = \sup_{Z \in M_{\varphi,\beta}} \mathbb{E}\, X Z, \tag{16} \]
where
\[ M_{\varphi,\beta} := \left\{ Z \in L^1 \colon Z \ge 0,\ \mathbb{E} Z = 1,\ \mathbb{E}\, \varphi(Z) \le \beta \right\}. \tag{17} \]

In order to prove the dual representation we need to recall a result on so-called normal convex integrands. A function $g\colon \Omega \times \mathbb{R} \to (-\infty, \infty]$ is said to be a normal convex integrand if (i) $\omega \mapsto g(\omega, x)$ is measurable for every fixed $x$ and (ii) $x \mapsto g(\omega, x)$ is convex, lower semicontinuous and $\operatorname{int} \operatorname{dom}(g(\omega, \cdot)) \ne \emptyset$ for almost all $\omega \in \Omega$. The following theorem is a special case of Rockafellar (1976, p. 185, Theorem 3A). It states that the supremum and expectation can be interchanged for normal convex integrands, if certain conditions are satisfied (the space $L^1$ is notably decomposable).

Theorem 5.2 (Interchangeability principle). Let $(\Omega, \mathcal{F}, P)$ be a probability space and $g\colon \Omega \times \mathbb{R} \to \mathbb{R} \cup \{\infty\}$ a normal convex integrand. Then
\[ \sup_{X \in L^1(\Omega, \mathcal{F}, P)} \int_\Omega g(\omega, X(\omega))\, P(d\omega) = \int_\Omega \sup_{x \in \mathbb{R}} g(\omega, x)\, P(d\omega) \]
holds if the left supremum is finite.
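The dual representation (16) can be checked numerically in a simple case. The following is a minimal sketch under stated assumptions: a finite probability space, the Kullback–Leibler choice $\varphi(x) = x\log x$ with $\psi(y) = e^{y-1}$, and hypothetical helper names and sample data. For this $\psi$ the inner minimization over $\mu$ in (3) can be done in closed form ($\mu = \log \mathbb{E}\, e^{X/t} - 1$), which leaves the one-dimensional problem $\inf_{t>0} t\,(\beta + \log \mathbb{E}\, e^{X/t})$. The dual candidate is the exponentially tilted density $Z = e^{X/t}/\mathbb{E}\, e^{X/t}$.

```python
import math

def log_mgf(xs, ps, t):
    """log E exp(X/t), stabilised by factoring out the largest outcome."""
    m = max(xs)
    return m / t + math.log(sum(p * math.exp((x - m) / t) for x, p in zip(xs, ps)))

def primal_objective(xs, ps, beta, t):
    # objective of (3) for psi(y) = exp(y - 1), already minimised over mu
    return t * (beta + log_mgf(xs, ps, t))

def minimise_over_t(xs, ps, beta, lo=1e-3, hi=1e3, iters=200):
    """Ternary search in log t; assumes the minimiser lies in [lo, hi]."""
    a, b = math.log(lo), math.log(hi)
    for _ in range(iters):
        u, v = a + (b - a) / 3.0, b - (b - a) / 3.0
        if primal_objective(xs, ps, beta, math.exp(u)) < primal_objective(xs, ps, beta, math.exp(v)):
            b = v
        else:
            a = u
    return math.exp(0.5 * (a + b))

def tilted_density(xs, ps, t):
    """Dual candidate Z = exp(X/t) / E exp(X/t)."""
    m = max(xs)
    w = [math.exp((x - m) / t) for x in xs]
    total = sum(p * wi for p, wi in zip(ps, w))
    return [wi / total for wi in w]

xs = [-1.0, 0.0, 1.0, 3.0]          # hypothetical outcomes
ps = [0.25, 0.25, 0.25, 0.25]       # hypothetical probabilities
beta = 0.5

t_star = minimise_over_t(xs, ps, beta)
rho = primal_objective(xs, ps, beta, t_star)           # value of (3)
Z = tilted_density(xs, ps, t_star)
dual_value = sum(p * x * z for p, x, z in zip(ps, xs, Z))            # E X Z
kl = sum(p * z * math.log(z) for p, z in zip(ps, Z) if z > 0)        # E phi(Z)
print(rho, dual_value, kl)
```

In this example the tilted density is feasible for (17) with $\mathbb{E}\,\varphi(Z)$ numerically equal to $\beta$, and $\mathbb{E}\, XZ$ agrees with the primal value up to the accuracy of the search. The search interval for $t$ is an assumption that has to be adapted to the data.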

We now establish the dual representation (16) of the divergence risk measure.

Proof of Theorem 5.1. Let $X \in L^{\psi}$ and $Z \in M_{\varphi,\beta}$ be given. By applying the Fenchel–Young inequality (2) inside of the objective function in (3) we get for $Z \in M_{\varphi,\beta}$ that
\[ t \left( \beta + \mu + \mathbb{E}\, \psi\!\left( \frac{X}{t} - \mu \right) \right) \ge t \left( \beta + \mu + \mathbb{E} \left( \frac{X}{t} - \mu \right) Z - \varphi(Z) \right) \ge t \mu + t \beta - t \mu\, \mathbb{E} Z - t\, \mathbb{E}\, \varphi(Z) + \mathbb{E}\, X Z \ge \mathbb{E}\, X Z, \]
provided that $t > 0$ and $\mu \in \mathbb{R}$. Taking the infimum among all $t > 0$ and $\mu \in \mathbb{R}$ on the left hand side and the supremum for all $Z \in M_{\varphi,\beta}$ on the right hand side it follows that
\[ \infty > \rho_{\varphi,\beta}(X) \ge \sup_{Z \in M_{\varphi,\beta}} \mathbb{E}\, X Z. \tag{18} \]
This is the first inequality required for (16).

As for the converse observe that the constant random variable $Z \equiv 1$ satisfies $\mathbb{E}\, \varphi(Z) = \varphi(1) = 0 < \beta$. This is, as stated in Luenberger (1969, p. 236 Problem 7), a sufficient condition for strong duality for the right problem in (16), i.e., there exist Lagrange multipliers $\mu^* \in \mathbb{R}$ and $t^* \ge 0$ such that
\[ \sup_{Z \in M_{\varphi,\beta}} \mathbb{E}\, X Z = \sup_{Z \ge 0}\; \mathbb{E}\, X Z - \mu^* (\mathbb{E} Z - 1) - t^* (\mathbb{E}\, \varphi(Z) - \beta). \tag{19} \]
Further, by employing $\inf_{x \ge 0} \varphi(x) > -\infty$ and substituting $t \bar\mu = \mu$ we have that
\[ \sup_{Z \ge 0}\; \mathbb{E}\, X Z - \mu^* (\mathbb{E} Z - 1) - t^* (\mathbb{E}\, \varphi(Z) - \beta) \ge \inf_{\mu \in \mathbb{R},\; t > 0}\ \sup_{Z \ge 0}\; \mathbb{E}\, X Z - \mu (\mathbb{E} Z - 1) - t \inf_{x \ge 0} \varphi(x) + t \beta \]
\[ \ge \inf_{\mu \in \mathbb{R},\; t > 0}\ \sup_{Z \ge 0}\; \mathbb{E}\, X Z - \mu (\mathbb{E} Z - 1) - t (\mathbb{E}\, \varphi(Z) - \beta) \]
\[ = \inf_{\bar\mu \in \mathbb{R},\; t > 0} t \left( \bar\mu + \beta + \sup_{Z \ge 0} \mathbb{E} \left( \left( \frac{X}{t} - \bar\mu \right) Z - \varphi(Z) \right) \right) = \inf_{\bar\mu \in \mathbb{R},\; t > 0} t \left( \bar\mu + \beta + \sup_{Z \in L^1} \mathbb{E} \left( \left( \frac{X}{t} - \bar\mu \right) Z - \varphi(Z) \right) \right), \]
where the last equality follows from the condition $\varphi(z) = \infty$ for $z < 0$. Now observe that the inner function $f(\omega, z) := \left( \frac{X(\omega)}{t} - \bar\mu \right) z - \varphi(z)$ is a normal convex integrand, as $\varphi$ is lower semicontinuous and $\operatorname{int} \operatorname{dom}(\varphi) = (0, \infty) \ne \emptyset$. Moreover, as $X \in L^{\psi}$, it follows from (2) that
\[ \sup_{Z \in L^1} \mathbb{E} \left( \left( \frac{X}{t} - \mu \right) Z - \varphi(Z) \right) \le \mathbb{E}\, \psi\!\left( \frac{X}{t} - \mu \right) < \infty \]
for some $\mu \in \mathbb{R}$ and $t > 0$. Therefore, by inserting Theorem 5.2, we have that
\[ \sup_{Z \in M_{\varphi,\beta}} \mathbb{E}\, X Z \ge \inf_{\mu \in \mathbb{R},\; t > 0} t \left( \mu + \beta + \sup_{Z \in L^1} \mathbb{E} \left( \frac{X}{t} - \mu \right) Z - \varphi(Z) \right) = \inf_{\mu \in \mathbb{R},\; t > 0} t \left( \mu + \beta + \mathbb{E} \left( \sup_{z \in \mathbb{R}} \left( \frac{X}{t} - \mu \right) z - \varphi(z) \right) \right) \]
\[ = \inf_{\mu \in \mathbb{R},\; t > 0} t \left( \mu + \beta + \mathbb{E}\, \psi\!\left( \frac{X}{t} - \mu \right) \right) = \rho_{\varphi,\beta}(X), \]
which is the converse inequality. □

The $\varphi$-divergence risk measures derive their name from their relation to $\varphi$-divergence. We provide this relation now explicitly and investigate the dual representation. We further relate the dual representation (16) to Haezendonck risk measures.

Remark 5.3. Let $M_{\varphi,\beta}$ be as in (17) and $Z \in M_{\varphi,\beta}$. The random variable $Z$ satisfies $Z \ge 0$ and $\mathbb{E} Z = 1$. Therefore $Q_Z$ defined as
\[ Q_Z(B) := \mathbb{E}_P\, \mathbb{1}_B Z \]
is a probability measure. $Q_Z$ is absolutely continuous with respect to $P$ with Radon–Nikodym derivative $\frac{dQ_Z}{dP} = Z$. Hence we can reformulate the dual representation (16) as
\[ \rho_{\varphi,\beta}(X) = \sup_{Q \ll P} \left\{ \mathbb{E}_Q X \colon D_\varphi(Q \,\|\, P) \le \beta \right\}, \tag{20} \]
where $D_\varphi(Q \,\|\, P)$ is the $\varphi$-divergence defined in Remark 2.2. $\rho_{\varphi,\beta}(X)$ can therefore be interpreted as the largest expected value $\mathbb{E}_Q X$ over all probability measures $Q$ within a $\varphi$-divergence ball around $P$. The divergence function $\varphi$ characterizes the shape of the ball, while $\beta$ determines the radius.

Remark 5.4. Suppose $\varphi$ is a Young-function as in Definition 4.2. Then the dual representation in (16) rewrites as
\[ \rho_{\varphi,\beta}(X) = \sup \left\{ \mathbb{E}\, X Z \colon Z \ge 0,\ \mathbb{E} Z = 1,\ \|Z\|_{(\tilde\varphi)} \le 1 \right\}, \]
where $\tilde\varphi$ is the function $\tilde\varphi(\cdot) = \frac{1}{\beta} \varphi(\cdot)$ and $\|\cdot\|_{(\tilde\varphi)}$ the corresponding Luxemburg norm (10). The dual norm of $\|\cdot\|_{(\tilde\varphi)}$ is the Orlicz norm $\|\cdot\|_{\tilde\psi}$, cf. (9), where $\tilde\psi$ is the associated convex conjugate. Interchanging $\|\cdot\|_{(\tilde\varphi)}$ by $\|\cdot\|_{\tilde\psi}$ we get
\[ \rho(X) = \sup \left\{ \mathbb{E}\, X Z \colon Z \ge 0,\ \mathbb{E} Z = 1,\ \|Z\|_{\tilde\psi} \le 1 \right\}, \]
which is the dual representation of the so-called Haezendonck–Goovaerts risk measure (see Bellini and Rosazza Gianin (2012, Proposition 4)).
It therefore turns out that the Haezendonck–Goovaerts risk measures are the natural dual counterparts of the $\varphi$-divergence risk measures, as the corresponding feasible sets are determined by norms which are dual to each other. For more information on Haezendonck–Goovaerts risk measures see Bellini and Rosazza Gianin (2008a), Bellini and Rosazza Gianin (2012) and Goovaerts et al. (2012).

Employing the dual representation we derive a simple condition when the infimum in (3) is attained.

Proposition 5.5 (Existence of minimizers). Let $X \in L^{\psi}$ and $\bar\alpha$ be given by
\[ \bar\alpha := \max \left\{ \alpha \in [0, 1) \colon \varphi(0)\, \alpha + \varphi\!\left( \frac{1}{1 - \alpha} \right) (1 - \alpha) \le \beta \right\}. \tag{21} \]
If
\[ P\big( X = \operatorname{ess\,sup}(X) \big) < 1 - \bar\alpha \tag{22} \]
holds true, then the infimum in the defining equation of the risk measure (3) is attained.

Proof. The assertion is shown in two parts. The first part demonstrates $\rho_{\varphi,\beta}(X) < \operatorname{ess\,sup}(X)$ while the second establishes that $\rho_{\varphi,\beta}(X) = \operatorname{ess\,sup}(X)$ holds if the infimum is not attained. The assertion then follows by contradiction.

To prove the first part let $M_{\varphi,\beta}$ as in (17), $\bar\alpha$ as in (21) and $X \in L^{\psi}$ as in (22) be given. We choose $Z \in M_{\varphi,\beta}$, $\alpha \in \big( \bar\alpha,\ 1 - P(X = \operatorname{ess\,sup}(X)) \big)$ and $U$ uniformly distributed on $[0, 1]$. We further set
\[ \mu^Z_{\alpha} := \mathbb{E} \left( F_Z^{-1}(U) \,\middle|\, 0 \le U < \alpha \right) \quad \text{and} \quad \mu^Z_{1-\alpha} := \mathbb{E} \left( F_Z^{-1}(U) \,\middle|\, \alpha \le U \le 1 \right). \]
As $F_Z^{-1}(U)$ and $Z$ are identically distributed it follows that
\[ 1 = \mathbb{E} \left( F_Z^{-1}(U) \right) = \mu^Z_{\alpha}\, P(0 \le U < \alpha) + \mu^Z_{1-\alpha}\, P(\alpha \le U \le 1) = \mu^Z_{\alpha}\, \alpha + \mu^Z_{1-\alpha}\, (1 - \alpha) \]
and
\[ \beta \ge \mathbb{E}\, \varphi\!\left( F_Z^{-1}(U) \right) = \mathbb{E} \left( \varphi\!\left( F_Z^{-1}(U) \right) \,\middle|\, 0 \le U < \alpha \right) \alpha + \mathbb{E} \left( \varphi\!\left( F_Z^{-1}(U) \right) \,\middle|\, \alpha \le U \le 1 \right) (1 - \alpha) \]
\[ \ge \varphi\!\left( \mu^Z_{\alpha} \right) \alpha + \varphi\!\left( \mu^Z_{1-\alpha} \right) (1 - \alpha) = \varphi\!\left( \mu^Z_{\alpha} \right) \alpha + \varphi\!\left( \frac{1 - \alpha\, \mu^Z_{\alpha}}{1 - \alpha} \right) (1 - \alpha), \]
where we employed Jensen's inequality to obtain the second inequality. Additionally, by the definition of $\bar\alpha$ in (21), we have that
\[ \varphi(0)\, \alpha + \varphi\!\left( \frac{1}{1 - \alpha} \right) (1 - \alpha) > \varphi(0)\, \bar\alpha + \varphi\!\left( \frac{1}{1 - \bar\alpha} \right) (1 - \bar\alpha) = \beta. \]
From this and the continuity of $\varphi$ we conclude that there exists a positive constant $c$, not depending on $Z$, such that $\mu^Z_{\alpha} \ge c$ holds for every $Z \in M_{\varphi,\beta}$. Hence, by employing the covariance inequality in Wang and Dhaene (1998, Theorem 4), it follows that
\[ \mathbb{E}\, X Z \le \int_0^1 F_X^{-1}(u)\, F_Z^{-1}(u)\, du = \int_0^{\alpha} F_X^{-1}(u)\, F_Z^{-1}(u)\, du + \int_{\alpha}^1 F_X^{-1}(u)\, F_Z^{-1}(u)\, du \]
\[ \le F_X^{-1}(\alpha) \left( \int_0^{\alpha} F_Z^{-1}(u)\, du \right) + F_X^{-1}(1) \left( \int_{\alpha}^1 F_Z^{-1}(u)\, du \right) \le F_X^{-1}(\alpha)\, \alpha c + F_X^{-1}(1)\, (1 - \alpha c) \]
and consequently
\[ \rho_{\varphi,\beta}(X) = \sup_{Z \in M_{\varphi,\beta}} \mathbb{E}\, X Z \le F_X^{-1}(\alpha)\, \alpha c + F_X^{-1}(1)\, (1 - \alpha c) < F_X^{-1}(1) = \operatorname{ess\,sup}(X), \]
which demonstrates the first part.

For the second part note that the infimum in (3) is not attained if and only if $t$ inside of
\[ \inf_{\mu \in \mathbb{R},\; t > 0} t \left( \beta + \mu + \mathbb{E}\, \psi\!\left( \frac{X}{t} - \mu \right) \right) \]
tends towards $0$. Hence we have $t^* = 0$ for the Lagrange multiplier $t^*$ in (19). It thus follows that
\[ \rho_{\varphi,\beta}(X) = \sup_{Z \in M_{\varphi,\beta}} \mathbb{E}\, X Z = \sup_{Z \ge 0}\; \mathbb{E}\, X Z - \mu^* (\mathbb{E} Z - 1) = \operatorname{ess\,sup}(X). \]
This completes the proof. □

5.3 Spectral representation

The $\varphi$-divergence risk measure $\rho_{\varphi,\beta}$ is coherent and law-invariant and thus has a Kusuoka representation (Kusuoka (2001)). We give the representation in terms of spectral risk measures, which is equivalent to the Kusuoka representation. We derive this representation from the dual (16) based on the general approach elaborated in Pichler and Shapiro (2015).

Proposition 5.6 (Spectral representation). The spectral representation of a $\varphi$-divergence risk measure $\rho_{\varphi,\beta}$ for $X \in L^{\psi}$ is
\[ \rho_{\varphi,\beta}(X) = \sup_{\sigma} \int_0^1 \sigma(u)\, F_X^{-1}(u)\, du, \tag{23} \]
where the supremum is taken over all non-decreasing $\sigma\colon [0, 1] \to [0, \infty]$ with
\[ \int_0^1 \sigma(u)\, du = 1 \quad \text{and} \quad \int_0^1 \varphi\big( \sigma(u) \big)\, du \le \beta. \]

Remark 5.7. Every functional of the shape
\[ \rho_{\sigma}(X) = \int_0^1 \sigma(u)\, F_X^{-1}(u)\, du, \]
where $\sigma\colon [0, 1] \to [0, \infty]$ is non-decreasing with $\int_0^1 \sigma(u)\, du = 1$, is a coherent risk measure itself. It is called distortion risk measure in Pflug (2006) or spectral risk measure in Acerbi (2002).

The spectral representation (23) is beneficial to derive bounds, as $\rho_{\sigma}(X) \le \rho_{\varphi,\beta}(X)$ holds for all $X \in L^{\psi}$ and every $\sigma$ feasible in (23). We provide an example next.

Example 5.8 (AV@R bound). For some fixed $\alpha \in (0, 1)$ we set $\sigma_{\alpha}(\cdot) = \frac{1}{1 - \alpha} \mathbb{1}_{[\alpha, 1]}(\cdot)$. The associated distortion risk measure is
\[ \rho_{\sigma_\alpha}(X) = \int_0^1 \sigma_{\alpha}(u)\, F_X^{-1}(u)\, du = \frac{1}{1 - \alpha} \int_{\alpha}^1 F_X^{-1}(u)\, du, \]
which is called Average Value-at-Risk and denoted as $\mathrm{AV@R}_{\alpha}(X)$. If
\[ \int_0^1 \varphi(\sigma_{\alpha}(u))\, du = \varphi(0)\, \alpha + \varphi\!\left( \frac{1}{1 - \alpha} \right) (1 - \alpha) \le \beta \tag{24} \]
holds, then $\sigma_{\alpha}$ is contained in the set of functions over which the supremum in (23) is taken. We hence obtain
\[ \mathrm{AV@R}_{\alpha}(X) = \rho_{\sigma_\alpha}(X) \le \rho_{\varphi,\beta}(X) \quad \text{for all } X \in L^{\psi} \]
for every $\alpha$ such that (24) is satisfied. Therefore, by inserting the definition of $\bar\alpha$ in (21), we have that
\[ \mathrm{AV@R}_{\alpha}(X) \le \rho_{\varphi,\beta}(X), \qquad \alpha \le \bar\alpha. \]
The latter inequality is of importance, as the Average Value-at-Risk is the most important risk measure in finance and in insurance. The inequality generalizes a corresponding inequality for the Entropic Value-at-Risk, cf. Ahmadi-Javid (2012a, Proposition 3.2).
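The AV@R bound can be illustrated numerically. The sketch below is an illustration under assumptions that are not part of the paper: a discrete random variable with hypothetical sample data, and the Kullback–Leibler choice $\varphi(x) = x \log x$, for which $\varphi(0) = 0$, condition (24) reads $-\log(1 - \alpha) \le \beta$, and hence $\bar\alpha = 1 - e^{-\beta}$. The risk measure is evaluated through the reduced one-dimensional form $\inf_{t>0} t\,(\beta + \log \mathbb{E}\, e^{X/t})$ of (3), which is specific to this $\varphi$.

```python
import math

def avar(xs, ps, alpha):
    """AV@R_alpha = 1/(1-alpha) * integral of the quantile function over [alpha, 1]."""
    cum, total = 0.0, 0.0
    for x, p in sorted(zip(xs, ps)):
        lo, hi = cum, cum + p
        # length of [lo, hi] that lies inside [alpha, 1]
        total += x * max(0.0, hi - max(lo, alpha))
        cum = hi
    return total / (1.0 - alpha)

def kl_risk_measure(xs, ps, beta, lo=1e-3, hi=1e3, iters=200):
    """inf over t of t*(beta + log E exp(X/t)); ternary search in log t on [lo, hi]."""
    m = max(xs)

    def objective(t):
        log_mgf = m / t + math.log(sum(p * math.exp((x - m) / t) for x, p in zip(xs, ps)))
        return t * (beta + log_mgf)

    a, b = math.log(lo), math.log(hi)
    for _ in range(iters):
        u, v = a + (b - a) / 3.0, b - (b - a) / 3.0
        if objective(math.exp(u)) < objective(math.exp(v)):
            b = v
        else:
            a = u
    return objective(math.exp(0.5 * (a + b)))

xs = [-1.0, 0.0, 1.0, 3.0]       # hypothetical outcomes
ps = [0.25, 0.25, 0.25, 0.25]    # hypothetical probabilities
beta = 0.5
alpha_bar = 1.0 - math.exp(-beta)   # largest alpha satisfying (24) for x log x

rho = kl_risk_measure(xs, ps, beta)
for alpha in (0.0, 0.2, alpha_bar):
    print(alpha, avar(xs, ps, alpha), rho)
```

For every $\alpha \le \bar\alpha$ the printed AV@R value stays below the value of the risk measure, in line with the example; for $\alpha > \bar\alpha$ no such ordering is claimed.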

Characterization of the dual and applications

The Banach space L ψ is, by Proposition 4.9, not reﬂexive, in general. By James’s theorem, thereare continuous linear functionals, which do not attain their supremum on the closed unit ball.This section characterizes functionals of the dual, which attain their supremum on the closedunit ball. We characterize the optimal dual random variables in (16) by an explicit relation tooptimality of t and µ in the deﬁning equation (3). We further establish an explicit representationof the dual norm of k · k ϕ,β . We further specify conditions so that the optimal values in (3) canbe derived based on a system of equations. ϕ -divergence risk measures are eﬃciently incorporated into portfolio optimization problems.We demonstrate this property in an explicit example. To elaborate optimality inside of (3) and (16), we state some facts concerning the ‘derivatives’of the convex function ϕ and its conjugate ψ . Even though they are not necessarily diﬀerentiable,they have subderivatives ϕ ′ and ψ ′ (see Boţ et al. (2009, Theorem 2.3.12), Rockafellar (1970,Theorem 23.4)). These are functions, satisfying the equivalent relations ψ ′ ( x ) ( z − x ) ≤ ψ ( z ) − ψ ( x ) and ϕ ′ ( y ) ( z − y ) ≤ ϕ ( z ) − ϕ ( y ) , (25)and x ψ ′ ( x ) = ϕ ( ψ ′ ( x )) + ψ ( x ) and y ϕ ′ ( y ) = ϕ ( y ) + ψ ( ϕ ′ ( y )) (26)for all x , z ∈ R , y ≥ . The subderivatives ϕ ′ and ψ ′ are, in general, not unique. Nevertheless, theyare uniquely determined, except for at most countably many points. Any function satisfying (25)is non-decreasing and therefore measurable. Hence the system of equations = E ψ ′ (cid:18) Xt − µ (cid:19) , (27) β = E ϕ (cid:18) ψ ′ (cid:18) Xt − µ (cid:19) (cid:19) (28)is well speciﬁed.In what follows we demonstrate that solutions of the equations (27)–(28) characterize optimalsolutions t ∗ and µ ∗ in the deﬁning equation (3). They specify the random variable Z ∗ in the dualspace maximizing the functional E X Z among all Z ∈ M ϕ,β . Theorem 6.1.

Let $X \in L^\psi$, let $M_{\varphi,\beta}$ be as in (17), and let $\psi'$ satisfy (25). Suppose $\mu^* \in \mathbb{R}$ and $t^* > 0$ solve the characterizing equations (27)–(28). Then they are the optimal values in (3). Furthermore, the random variable
\[
Z^* := \psi'\!\left(\frac{X}{t^*} - \mu^*\right)
\]
is optimal in (16), i.e.,
\[
\sup_{Z \in M_{\varphi,\beta}} \mathbb{E}\,XZ = \mathbb{E}\,XZ^* = t^*\left(\beta + \mu^* + \mathbb{E}\,\psi\!\left(\frac{X}{t^*} - \mu^*\right)\right)
\]
and $Z^* \in M_{\varphi,\beta}$.

Proof. Let solutions $t^* > 0$, $\mu^* \in \mathbb{R}$ of (27) and (28) be given. The assertion $Z^* \in M_{\varphi,\beta}$ is immediate from the equations (27), (28) and the fact that $\varphi(x) = \infty$ holds for $x < 0$. Furthermore, by employing (26), we have that
\[
\mathbb{E}\left[\left(\frac{X}{t^*} - \mu^*\right) Z^* - \varphi(Z^*)\right]
= \mathbb{E}\left[\left(\frac{X}{t^*} - \mu^*\right) \psi'\!\left(\frac{X}{t^*} - \mu^*\right) - \varphi\!\left(\psi'\!\left(\frac{X}{t^*} - \mu^*\right)\right)\right]
= \mathbb{E}\,\psi\!\left(\frac{X}{t^*} - \mu^*\right).
\]
Hence by (27), (28) and Theorem 5.1 it follows that
\[
\begin{aligned}
\rho_{\varphi,\beta}(X) = \sup_{Z \in M_{\varphi,\beta}} \mathbb{E}\,XZ \ge \mathbb{E}\,XZ^*
&= t^*\left(\frac{\mathbb{E}\,XZ^*}{t^*} - \mu^*\,(\mathbb{E}\,Z^* - 1) - (\mathbb{E}\,\varphi(Z^*) - \beta)\right) \\
&= t^*\left(\beta + \mu^* + \mathbb{E}\left[\left(\frac{X}{t^*} - \mu^*\right) Z^* - \varphi(Z^*)\right]\right) \\
&= t^*\left(\beta + \mu^* + \mathbb{E}\,\psi\!\left(\frac{X}{t^*} - \mu^*\right)\right) \ge \rho_{\varphi,\beta}(X).
\end{aligned}
\]
We therefore obtain $\mathbb{E}\,XZ^* = \rho_{\varphi,\beta}(X)$ as well as
\[
\rho_{\varphi,\beta}(X) = t^*\left(\beta + \mu^* + \mathbb{E}\,\psi\!\left(\frac{X}{t^*} - \mu^*\right)\right).
\]
Thus $\mu^*$, $t^*$ and $Z^*$ are optimal in (3) and (16), respectively. This is the assertion. □

Remark 6.2. Note that optimal values $t^*$ and $\mu^*$ in (3) may exist although the characterizing system (27)–(28) cannot be solved. The existence of solutions depends on the specific choice of the subderivative $\psi'$. Nevertheless, further assumptions on the random variable $X$ and the function $\psi$ can ensure that the system of equations has solutions. We present the corresponding result in Section 6.3 below.

This subsection addresses the dual norm
\[
\|Z\|^*_{\varphi,\beta} := \sup_{\|X\|_{\varphi,\beta} \le 1} \mathbb{E}\,XZ \tag{29}
\]
of the $\varphi$-divergence norms given in (6).
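
As a numerical aside on Theorem 6.1, the characterizing system (27)–(28) can be checked on a toy example before the dual norm is developed further. The sketch below is only an illustration under stated assumptions: it takes the Kullback–Leibler type function $\varphi(y) = y\log y - y + 1$ with conjugate $\psi(x) = e^x - 1$ (so that $\psi'(x) = e^x$), a discrete random variable $X$ with four equally likely outcomes, and $\beta = 0.5$. None of these choices, nor the helper names `dual_density`, `divergence` and `solve_system`, come from the text. For this $\psi$, equation (27) gives $\mu = \log \mathbb{E}\,e^{X/t}$ in closed form, and (28) is then solved for $t$ by bisection, which is a convenience chosen here and not a method proposed by the authors.

```python
import math

def phi(y):
    """Assumed divergence function phi(y) = y log y - y + 1, with phi(0) = 1."""
    return 1.0 if y == 0.0 else y * math.log(y) - y + 1.0

def psi(x):
    """Convex conjugate of phi: psi(x) = exp(x) - 1, hence psi'(x) = exp(x)."""
    return math.exp(x) - 1.0

def dual_density(xs, ps, t):
    """Solve (27) for mu at fixed t; return (mu, Z) with Z = psi'(X/t - mu)."""
    m = max(xs)
    # mu = log E exp(X/t), shifted by max(xs) to avoid overflow for small t
    mu = m / t + math.log(sum(p * math.exp((x - m) / t) for x, p in zip(xs, ps)))
    return mu, [math.exp(x / t - mu) for x in xs]

def divergence(xs, ps, t):
    """E phi(Z), the right-hand side of (28), with mu chosen to satisfy (27)."""
    _, zs = dual_density(xs, ps, t)
    return sum(p * phi(z) for z, p in zip(zs, ps))

def solve_system(xs, ps, beta, lo=1e-3, hi=1e3, iters=200):
    """Bisection in t for (28); for this phi the divergence decreases in t."""
    if not divergence(xs, ps, lo) > beta > divergence(xs, ps, hi):
        raise ValueError("beta is not bracketed; (27)-(28) may have no solution")
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # bisect on a logarithmic scale
        if divergence(xs, ps, mid) > beta:
            lo = mid
        else:
            hi = mid
    t = math.sqrt(lo * hi)
    mu, zs = dual_density(xs, ps, t)
    return t, mu, zs

xs = [0.0, 1.0, 2.0, 3.0]      # outcomes of a toy loss X (assumption)
ps = [0.25, 0.25, 0.25, 0.25]  # their probabilities (assumption)
beta = 0.5
t, mu, zs = solve_system(xs, ps, beta)

# Dual value E[X Z*] from (16) and primal objective of (3) at (t*, mu*)
dual_value = sum(p * x * z for x, z, p in zip(xs, zs, ps))
primal_value = t * (beta + mu + sum(p * psi(x / t - mu) for x, p in zip(xs, ps)))
print(t, mu, dual_value, primal_value)
```

In this example the two printed values agree up to rounding, as the chain of inequalities in the proof of Theorem 6.1 predicts. The bracket check in `solve_system` reflects Remark 6.2: when $\beta$ is too large for the given distribution, the system has no solution and the sketch raises an error instead of returning a value.
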
In what follows, we characterize (29) as an optimization problem in one variable, provided that $\varphi$ satisfies the $\Delta_2$-condition (15).

Note that $\varphi \in \Delta_2$ implies
\[
\mathbb{E}\,\varphi(t\,|Z|) < \infty \text{ for some } t > 0 \iff \mathbb{E}\,\varphi(t\,|Z|) < \infty \text{ for all } t > 0
\]
as well as $(M^\psi)^* \cong L^\varphi$ and $(L^\varphi)^* \cong L^\psi$ (see Proposition 4.9). Thus the expression in (29) is finite if and only if $Z \in L^\varphi$.

The following lemma states a specific transformation of a random variable $Z \in L^\varphi$, which we use later to characterize the dual norm.

Lemma 6.3. Let $\varphi \in \Delta_2$ and $Z \in L^\varphi$. There exists a continuous function $c_Z \colon [\mathbb{E}|Z|, \infty) \to [0, 1]$ such that
\[
\mathbb{E}\max\left\{c_Z(\lambda), \frac{|Z|}{\lambda}\right\} = 1 \tag{30}
\]
for all $\lambda \in [\mathbb{E}|Z|, \infty)$. If, in addition, $\mathbb{E}\,\varphi\!\left(\frac{|Z|}{\mathbb{E}|Z|}\right) > \beta$, then there is a number $\lambda^* \in (\mathbb{E}|Z|, \infty)$ such that
\[
\mathbb{E}\,\varphi\!\left(\max\left\{c_Z(\lambda^*), \frac{|Z|}{\lambda^*}\right\}\right) = \beta
\]
is satisfied.

Proof. To establish the assertion we recall the intermediate value theorem, which states that the equation $f(x) = y$ has a solution $x^*$ if $f$ is continuous and there are $x_1$, $x_2$ such that $f(x_1) \le y \le f(x_2)$.

Let $Z \in L^\varphi$. If $Z$ is constant, the function $c_Z(\lambda) := 1$ satisfies (30). We therefore assume that $Z$ is non-constant and consider some fixed $\lambda \in (\mathbb{E}|Z|, \infty)$. Setting $f(c) := \mathbb{E}\max\left\{c, \frac{|Z|}{\lambda}\right\}$ we have that
\[
|f(c_1) - f(c_2)| = \left|\mathbb{E}\max\left\{c_1, \frac{|Z|}{\lambda}\right\} - \mathbb{E}\max\left\{c_2, \frac{|Z|}{\lambda}\right\}\right| \le |c_1 - c_2|
\]
for all $c_1, c_2 \in \left[\operatorname{ess\,inf}\left(\frac{|Z|}{\lambda}\right), 1\right]$. Thus $f$ is Lipschitz continuous and hence continuous.
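Further we have that
\[
f\!\left(\operatorname{ess\,inf}\left(\frac{|Z|}{\lambda}\right)\right) = \mathbb{E}\left(\frac{|Z|}{\lambda}\right) = \mathbb{E}\max\left\{\operatorname{ess\,inf}\left(\frac{|Z|}{\lambda}\right), \frac{|Z|}{\lambda}\right\} < 1 \le \mathbb{E}\max\left\{1, \frac{|Z|}{\lambda}\right\} = f(1)
\]
and thus, by employing the intermediate value theorem, $f(c^*) = 1$ for some $c^* \in \left(\operatorname{ess\,inf}\left(\frac{|Z|}{\lambda}\right), 1\right]$. Hence (30) has a solution $c^*(\lambda)$ for every $\lambda \in (\mathbb{E}|Z|, \infty)$, which is unique as $f$ increases strictly on $\left(\operatorname{ess\,inf}\left(\frac{|Z|}{\lambda}\right), 1\right]$. Therefore the function $c_Z \colon [\mathbb{E}|Z|, \infty) \to [0, 1]$ given by
\[
c_Z(\lambda) := \begin{cases} \operatorname{ess\,inf}\left(\dfrac{|Z|}{\mathbb{E}|Z|}\right) & \text{for } \lambda = \mathbb{E}|Z|, \\[2mm] c^*(\lambda) & \text{for } \lambda \in (\mathbb{E}|Z|, \infty) \end{cases}
\]
is well defined and satisfies (30) for every $\lambda \in [\mathbb{E}|Z|, \infty)$.

To demonstrate the continuity of $c_Z$, let $\lambda_0 \in (\mathbb{E}|Z|, \infty)$ and $\varepsilon > 0$. Without loss of generality we may assume that $\varepsilon$ is sufficiently small such that $p = P\left(\frac{|Z|}{\lambda_0} \le c_Z(\lambda_0) - \varepsilon\right) > 0$. Choosing $\delta \le \lambda_0\,\varepsilon\,p$ it follows that
\[
\begin{aligned}
\mathbb{E}\max\left\{c_Z(\lambda_0) - \varepsilon, \frac{|Z|}{\lambda}\right\}
&\le \mathbb{E}\max\left\{c_Z(\lambda_0) - \varepsilon, \frac{|Z|}{\lambda_0 - \delta}\right\}
\le \frac{\lambda_0}{\lambda_0 - \delta}\,\mathbb{E}\max\left\{c_Z(\lambda_0) - \varepsilon, \frac{|Z|}{\lambda_0}\right\} \\
&\le \frac{\lambda_0}{\lambda_0 - \delta}\left(\mathbb{E}\max\left\{c_Z(\lambda_0), \frac{|Z|}{\lambda_0}\right\} - \varepsilon p\right)
= \frac{\lambda_0}{\lambda_0 - \delta}\,(1 - \varepsilon p) \le 1
\end{aligned}
\]
and
\[
\begin{aligned}
\mathbb{E}\max\left\{c_Z(\lambda_0) + \varepsilon, \frac{|Z|}{\lambda}\right\}
&\ge \mathbb{E}\max\left\{c_Z(\lambda_0) + \varepsilon, \frac{|Z|}{\lambda_0 + \delta}\right\}
\ge \frac{\lambda_0}{\lambda_0 + \delta}\,\mathbb{E}\max\left\{c_Z(\lambda_0) + \varepsilon, \frac{|Z|}{\lambda_0}\right\} \\
&\ge \frac{\lambda_0}{\lambda_0 + \delta}\left(\mathbb{E}\max\left\{c_Z(\lambda_0), \frac{|Z|}{\lambda_0}\right\} + \varepsilon p\right) \ge 1
\end{aligned}
\]
for all $\lambda \in (\lambda_0 - \delta, \lambda_0 + \delta)$. We thus get that $c_Z(\lambda) \in [c_Z(\lambda_0) - \varepsilon, c_Z(\lambda_0) + \varepsilon]$ for all $\lambda \in (\lambda_0 - \delta, \lambda_0 + \delta)$, by the intermediate value theorem. This establishes the continuity of $c_Z$ on $(\mathbb{E}|Z|, \infty)$. The (right-side) continuity at $\lambda = \mathbb{E}|Z|$ follows from the fact that
\[
\mathbb{E}\max\left\{\operatorname{ess\,inf}\left(\frac{|Z|}{\mathbb{E}|Z|}\right) + \varepsilon, \frac{|Z|}{\mathbb{E}|Z|}\right\} > 1
\]
holds for every $\varepsilon > 0$.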
This demonstrates the first part of the assertion.

For the second part we assume $\mathbb{E}\,\varphi\!\left(\frac{|Z|}{\mathbb{E}|Z|}\right) > \beta$ and set $g(\lambda) := \mathbb{E}\,\varphi\!\left(\max\left\{c_Z(\lambda), \frac{|Z|}{\lambda}\right\}\right)$. By
\[
\mathbb{E}\max\left\{c_Z(\lambda), \frac{|Z|}{\lambda}\right\} = 1 \quad\text{for all } \lambda \in (\mathbb{E}|Z|, \infty),
\]
we observe that $\max\left\{c_Z(\lambda), \frac{|Z|}{\lambda}\right\} \to 1$ almost surely, for $\lambda \to \infty$. It is hence sufficient to show that $g$ is continuous, as then the assertion follows from
\[
0 = \varphi(1) = \lim_{\lambda \to \infty} g(\lambda) < \beta < \mathbb{E}\,\varphi\!\left(\frac{|Z|}{\mathbb{E}|Z|}\right) = g(\mathbb{E}|Z|)
\]
and the intermediate value theorem. Let $(\lambda_n)_{n \in \mathbb{N}} \subset (\mathbb{E}|Z|, \infty)$ such that $\lambda_n \to \lambda \in [\mathbb{E}|Z|, \infty)$. Choosing a number $M \ge 1$ such that $\varphi$ is non-decreasing and non-negative for all $x \ge M$, we have the estimate
\[
\left|\varphi\!\left(\max\left\{c_Z(\lambda), \frac{|Z|}{\lambda}\right\}\right)\right|
\le \sup_{x \in [0, M]} |\varphi(x)| + \varphi\!\left(\max\left\{M, \frac{|Z|}{\lambda}\right\}\right)
\le \sup_{x \in [0, M]} |\varphi(x)| + \varphi\!\left(\max\left\{M, \frac{|Z|}{\mathbb{E}|Z|}\right\}\right) \tag{31}
\]
for all $\lambda \in [\mathbb{E}|Z|, \infty)$. As (31) is integrable we can interchange limit and expectation by Lebesgue's dominated convergence theorem, and thus get
\[
g(\lambda) = \mathbb{E}\,\varphi\!\left(\max\left\{c_Z(\lambda), \frac{|Z|}{\lambda}\right\}\right)
= \mathbb{E}\left(\lim_{n \to \infty} \varphi\!\left(\max\left\{c_Z(\lambda_n), \frac{|Z|}{\lambda_n}\right\}\right)\right)
= \lim_{n \to \infty} \mathbb{E}\,\varphi\!\left(\max\left\{c_Z(\lambda_n), \frac{|Z|}{\lambda_n}\right\}\right)
= \lim_{n \to \infty} g(\lambda_n)
\]
by Proposition 2.3 (i) and continuity of $c_Z$. This demonstrates continuity of $g$ and consequently the assertion. □

Theorem 6.4.

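For $\varphi \in \Delta_2$ and $Z \in L^\varphi$ it holds that
\[
\|Z\|^*_{\varphi,\beta} = \inf\left\{\lambda \ge \mathbb{E}|Z| : \mathbb{E}\,\varphi\!\left(\max\left\{c_Z(\lambda), \frac{|Z|}{\lambda}\right\}\right) \le \beta\right\},
\]
where $c_Z$ is the function in Lemma 6.3.

Proof. Let $Z \in L^\varphi$ and $M_{\varphi,\beta}$ be as in (17). If $\mathbb{E}\,\varphi\!\left(\frac{|Z|}{\mathbb{E}|Z|}\right) \le \beta$ holds, we have that $\frac{|Z|}{\mathbb{E}|Z|} \in M_{\varphi,\beta}$ and therefore
\[
\mathbb{E}\,XZ \le \mathbb{E}|Z| \; \mathbb{E}\left[|X|\,\frac{|Z|}{\mathbb{E}|Z|}\right] \le \mathbb{E}|Z| \sup_{Y \in M_{\varphi,\beta}} \mathbb{E}\,|X|\,Y = \mathbb{E}|Z| \; \|X\|_{\varphi,\beta}
\]
by Theorem 5.1. Hence $\|Z\|^*_{\varphi,\beta} \le \mathbb{E}|Z|$. Conversely, by (5), we get that $\|\operatorname{sign}(Z)\|_{\varphi,\beta} \le 1$ and thus $\|Z\|^*_{\varphi,\beta} \ge \mathbb{E}|Z|$, as
\[
\mathbb{E}\,Z\operatorname{sign}(Z) = \mathbb{E}|Z| \ge \mathbb{E}|Z| \; \|\operatorname{sign}(Z)\|_{\varphi,\beta}.
\]
We therefore obtain $\|Z\|^*_{\varphi,\beta} = \mathbb{E}|Z|$.

Now assume $\mathbb{E}\,\varphi\!\left(\frac{|Z|}{\mathbb{E}|Z|}\right) > \beta$. Employing Lemma 6.3 we get a number $\lambda^* \in (\mathbb{E}|Z|, \infty)$ such that
\[
\mathbb{E}\max\left\{c_Z(\lambda^*), \frac{|Z|}{\lambda^*}\right\} = 1 \quad\text{and}\quad \mathbb{E}\,\varphi\!\left(\max\left\{c_Z(\lambda^*), \frac{|Z|}{\lambda^*}\right\}\right) = \beta \tag{32}
\]
holds. Setting $Z^* := \max\left\{c_Z(\lambda^*), \frac{|Z|}{\lambda^*}\right\}$, and observing $Z^* \in M_{\varphi,\beta}$ as well as $\frac{|Z|}{\lambda^*} \le Z^*$, it follows from Theorem 5.1 that
\[
\mathbb{E}\,X\,\frac{Z}{\lambda^*} \le \mathbb{E}\,|X|\,\frac{|Z|}{\lambda^*} \le \mathbb{E}\,|X|\,Z^* \le \|X\|_{\varphi,\beta}
\]
for every $X \in L^\psi$. We therefore conclude $\|Z\|^*_{\varphi,\beta} \le \lambda^*$.

To establish the converse inequality, we consider $X^* := \max\left\{0, \varphi'\!\left(\frac{|Z|}{\lambda^*}\right) - \varphi'(c_Z(\lambda^*))\right\}$, where $\varphi'$ corresponds to the function in (25). Invoking (25) (with $z = 2y$) and (26), we obtain that
\[
\mathbb{E}\,\psi\!\left(\varphi'\!\left(\frac{|Z|}{\lambda^*}\right)\right)
= \mathbb{E}\left[\frac{|Z|}{\lambda^*}\,\varphi'\!\left(\frac{|Z|}{\lambda^*}\right) - \varphi\!\left(\frac{|Z|}{\lambda^*}\right)\right]
\le \mathbb{E}\left[\varphi\!\left(\frac{2|Z|}{\lambda^*}\right) - 2\,\varphi\!\left(\frac{|Z|}{\lambda^*}\right)\right] < \infty
\]
as $Z \in L^\varphi$ and $\varphi \in \Delta_2$. Thus $\varphi'\!\left(\frac{|Z|}{\lambda^*}\right) \in L^\psi$ and consequently $X^* \in L^\psi$.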
Further, as $\varphi'$ is non-decreasing, we observe that
\[
X^* + \varphi'(c_Z(\lambda^*)) = \max\left\{\varphi'(c_Z(\lambda^*)), \varphi'\!\left(\frac{|Z|}{\lambda^*}\right)\right\} = \varphi'\!\left(\max\left\{c_Z(\lambda^*), \frac{|Z|}{\lambda^*}\right\}\right) = \varphi'(Z^*)
\]
and hence
\[
\mathbb{E}\,\big(X^* + \varphi'(c_Z(\lambda^*))\big)\,Z^* = \mathbb{E}\,\varphi'(Z^*)\,Z^* = \mathbb{E}\left[\psi(\varphi'(Z^*)) + \varphi(Z^*)\right] = \mathbb{E}\left[\psi\big(X^* + \varphi'(c_Z(\lambda^*))\big) + \varphi(Z^*)\right]
\]
by (26). Employing this as well as (18) and (32), we obtain
\[
\begin{aligned}
\rho_{\varphi,\beta}(X^*) \ge \mathbb{E}\,X^* Z^*
&= -\varphi'(c_Z(\lambda^*)) + \beta + \mathbb{E}\left[\big(X^* + \varphi'(c_Z(\lambda^*))\big)\,Z^* - \varphi(Z^*)\right] \\
&= -\varphi'(c_Z(\lambda^*)) + \beta + \mathbb{E}\,\psi\big(X^* + \varphi'(c_Z(\lambda^*))\big) \ge \rho_{\varphi,\beta}(X^*)
\end{aligned}
\]
and therefore $\rho_{\varphi,\beta}(X^*) = \mathbb{E}\,X^* Z^*$. Observing that $Z^*$ equals $\frac{|Z|}{\lambda^*}$ on the set where $X^*$ differs from $0$, we finally get that
\[
\mathbb{E}\,\operatorname{sign}(Z)\,X^*\,\frac{Z}{\lambda^*} = \mathbb{E}\,X^*\,\frac{|Z|}{\lambda^*} = \mathbb{E}\,X^* Z^* = \rho_{\varphi,\beta}(X^*) = \|X^*\|_{\varphi,\beta} = \|\operatorname{sign}(Z)\,X^*\|_{\varphi,\beta}
\]
as $X^*$ is non-negative. This establishes $\|Z\|^*_{\varphi,\beta} \ge \lambda^*$ and thus the theorem. □

For completeness we provide conditions to guarantee that the system (27)–(28) is solvable. The solutions $t^*$ and $\mu^*$ identify the optimal solution in the initial problem (3). This is of importance in numerical evaluations of $\rho_{\varphi,\beta}(X)$.

Theorem 6.5.

Let $X \in M^\psi$, $X \ge 0$ and $\varphi \in \Delta_2$. Further suppose there are optimal values $t^* > 0$ and $\mu^* \in \mathbb{R}$ in (3) (i.e., $P(X = \operatorname{ess\,sup}(X)) < 1 - \bar\alpha$ by Proposition 5.5). If $\psi$ is differentiable, then $t^*$ and $\mu^*$ solve the equations (27) and (28) for the ordinary derivative $\psi'$. If $X$ is continuously distributed, then $t^*$ and $\mu^*$ solve the equations (27) and (28) for any subderivative $\psi'$ satisfying (25).

Proof. Let non-negative $X \in M^\psi$ and minimizers $t^* > 0$ and $\mu^* \in \mathbb{R}$ in (3) be given. By the non-negativity of $X$ we have that $\rho_{\varphi,\beta}(X) = \|X\|_{\varphi,\beta}$. Therefore there exists a random variable $Z \in (M^\psi)^* = L^\varphi$ such that
\[
\|Z\|^*_{\varphi,\beta} = 1 \quad\text{and}\quad \mathbb{E}\,XZ = \|X\|_{\varphi,\beta} = \rho_{\varphi,\beta}(X)
\]
by the Hahn–Banach theorem (Luenberger (1969, p. 112, Corollary 2)). As we have shown in the proof of Theorem 6.4, there is $Z^* \in L^\varphi$ with $Z^* \in M_{\varphi,\beta}$ and $|Z| \le Z^*$. Therefore, as $X \ge 0$, we have that $\mathbb{E}\,XZ \le \mathbb{E}\,XZ^*$. Conversely, it holds that $\mathbb{E}\,XZ^* \le \|X\|_{\varphi,\beta} = \mathbb{E}\,XZ$, as $Z^*$ is feasible in $M_{\varphi,\beta}$, from which we conclude $\mathbb{E}\,XZ = \mathbb{E}\,XZ^*$. Applying the Fenchel–Young inequality (2) we obtain
\[
\begin{aligned}
\mathbb{E}\,XZ^* &\le \mathbb{E}\,XZ^* + t^*\,(\beta - \mathbb{E}\,\varphi(Z^*)) + t^*\mu^*\,(1 - \mathbb{E}\,Z^*) \\
&= t^*\left(\beta + \mu^* + \mathbb{E}\left[\left(\frac{X}{t^*} - \mu^*\right) Z^* - \varphi(Z^*)\right]\right) \\
&\le t^*\left(\beta + \mu^* + \mathbb{E}\,\psi\!\left(\frac{X}{t^*} - \mu^*\right)\right) = \|X\|_{\varphi,\beta}.
\end{aligned}
\]
By $\mathbb{E}\,XZ^* = \|X\|_{\varphi,\beta}$ it follows that neither of the above inequalities is strict and hence $\mathbb{E}\,\varphi(Z^*) = \beta$ as well as
\[
\mathbb{E}\left[\left(\frac{X}{t^*} - \mu^*\right) Z^* - \varphi(Z^*)\right] = \mathbb{E}\,\psi\!\left(\frac{X}{t^*} - \mu^*\right). \tag{33}
\]
If $\psi$ is differentiable, the only function establishing equality in the Fenchel–Young inequality (2) is the derivative $\psi'$ (see (26)). In any other case the inequality is strict. Hence by (33), we have that $Z^* = \psi'\!\left(\frac{X}{t^*} - \mu^*\right)$ almost surely and therefore
\[
1 = \mathbb{E}\,Z^* = \mathbb{E}\,\psi'\!\left(\frac{X}{t^*} - \mu^*\right) \quad\text{and}\quad \beta = \mathbb{E}\,\varphi(Z^*) = \mathbb{E}\,\varphi\!\left(\psi'\!\left(\frac{X}{t^*} - \mu^*\right)\right).
\]
Thus $t^*$ and $\mu^*$ solve the equations (27), (28).

Now assume $X$ is continuously distributed. Then the random variables $\psi'\!\left(\frac{X}{t^*} - \mu^*\right)$ coincide almost surely, for every subderivative $\psi'$ of $\psi$. This follows from the fact that the subderivatives $\psi'$ of $\psi$ are uniquely determined, apart from at most countably many points. Furthermore, by the same argument as above, we have that $Z^* = \psi'\!\left(\frac{X}{t^*} - \mu^*\right)$ almost surely and thus the assertion. □

In what follows we highlight the benefits of $\varphi$-divergence risk measures for a portfolio optimization problem (cf. also Rockafellar et al. (2014)). To this end set
\[
W := \left\{ w = (w_1, \dots, w_n) \in \mathbb{R}^n : w_i \ge 0 \text{ and } \sum_{i=1}^n w_i = 1 \right\}
\]
and consider random variables $X_1, \dots, X_n \in L^\psi$. Here $X_i$ is the loss of the $i$-th asset and $W$ constitutes all possible portfolio allocations. By denoting $X_w := w_1 X_1 + \dots + w_n X_n$ the associated optimization problem is
\[
\min_{w \in W} \rho_{\varphi,\beta}(X_w) = \min_{w \in W} \; \inf_{\mu \in \mathbb{R},\, t > 0} t\left(\beta + \mu + \mathbb{E}\,\psi\!\left(\frac{X_w}{t} - \mu\right)\right),
\]
which determines the portfolio allocation with minimal risk based on the risk measure $\rho_{\varphi,\beta}$. One may restate this expression as
\[
\min_{w \in W} \rho_{\varphi,\beta}(X_w) = \min_{w \in W} \; \min_{\mu \in \mathbb{R},\, t > 0} t\left(\beta + \mu + \mathbb{E}\,\psi\!\left(\frac{X_w}{t} - \mu\right)\right) = \min_{w \in W,\, \mu \in \mathbb{R},\, t > 0} t\left(\beta + \mu + \mathbb{E}\,\psi\!\left(\frac{X_w}{t} - \mu\right)\right). \tag{34}
\]
The striking benefit in (34) is that it is sufficient to solve a single minimization problem with only two additional variables instead of two nested minimization problems when employing (20). This reduces the complexity of the problem significantly. Similar results are available for Haezendonck–Goovaerts risk measures in Bellini and Rosazza Gianin (2008b) as well as for the Average Value-at-Risk in Rockafellar and Uryasev (2002).

Coherent risk measures are of fundamental importance in mathematical finance.
They constitute convex functionals on appropriate Banach spaces for which the entire and rich theory of convex analysis and convex duality applies.

This paper addresses a specific risk functional based on $\varphi$-divergence. The $\varphi$-divergence is a non-symmetric distance; it is used to quantify aberrations from a given probability measure. $\varphi$-divergence generalizes Kullback–Leibler divergence, which is nowadays extensively used in data science.

We characterize the corresponding Banach space in detail and elaborate the dual norm. The space is an Orlicz space and, in general, not reflexive.

The specific form of the $\varphi$-divergence risk measure allows a rich variety of equivalent expressions. They can be employed interchangeably to exploit the specific properties in given applications. We also exemplify the properties for a typical problem in mathematical finance.

References

C. Acerbi. Spectral measures of risk: A coherent representation of subjective risk aversion. Journal of Banking & Finance, 26:1505–1518, 2002. doi:10.1016/S0378-4266(02)00281-9.

A. Ahmadi-Javid. Entropic Value-at-Risk: A new coherent risk measure. Journal of Optimization Theory and Applications, 155(3):1105–1123, 2012a. doi:10.1007/s10957-011-9968-2.

A. Ahmadi-Javid. Addendum to: Entropic Value-at-Risk: A new coherent risk measure. Journal of Optimization Theory and Applications, 155(3):1124–1128, 2012b. doi:10.1007/s10957-012-0014-9.

A. Ahmadi-Javid and A. Pichler. An analytical study of norms and Banach spaces induced by the entropic value-at-risk. Mathematics and Financial Economics, 11(4):527–550, 2017. doi:10.1007/s11579-017-0197-9.

P. Artzner, F. Delbaen, and D. Heath. Thinking coherently. Risk, 10:68–71, 1997.

P. Artzner, F. Delbaen, J.-M. Eber, and D. Heath. Coherent Measures of Risk. Mathematical Finance, 9:203–228, 1999. doi:10.1111/1467-9965.00068.

F. Bellini and E. Rosazza Gianin. On Haezendonck risk measures. Journal of Banking & Finance, 32(6):986–994, 2008a. doi:10.1016/j.jbankfin.2007.07.007.

F. Bellini and E. Rosazza Gianin. Optimal portfolios with Haezendonck risk measures. Statistics & Decisions, 26, 2008b. doi:10.1524/stnd.2008.0915.

F. Bellini and E. Rosazza Gianin. Haezendonck–Goovaerts risk measures and Orlicz quantiles. Insurance: Mathematics and Economics, 51(1):107–114, 2012. doi:10.1016/j.insmatheco.2012.03.005.

F. Bellini, B. Klar, A. Müller, and E. Rosazza Gianin. Generalized quantiles as risk measures. Insurance: Mathematics and Economics, 54:41–48, 2014. doi:10.1016/j.insmatheco.2013.10.015.

R. I. Boţ, S.-M. Grad, and G. Wanka. Duality in Vector Optimization. Springer, 2009. doi:10.1007/978-3-642-02886-1.

T. Breuer and I. Csiszár. Measuring distribution model risk. Mathematical Finance, 2013a. doi:10.1111/mafi.12050.

T. Breuer and I. Csiszár. Systematic stress tests with entropic plausibility constraints. Journal of Banking & Finance, 37(5):1552–1559, 2013b. doi:10.1016/j.jbankfin.2012.04.013.

P. Cheridito and T. Li. Dual characterization of properties of risk measures on Orlicz hearts. Mathematics and Financial Economics, 2(1):29–55, 2008. doi:10.1007/s11579-008-0013-7.

P. Cheridito and T. Li. Risk measures on Orlicz hearts. Mathematical Finance, 19(2):189–214, 2009. doi:10.1111/j.1467-9965.2009.00364.x.

F. Delbaen. Remark on the paper "Entropic Value-at-Risk: A new coherent risk measure" by Amir Ahmadi-Javid. In P. Barrieu, editor, Risk and Stochastics. World Scientific, 2015. ISBN 978-1-78634-194-5. doi:10.1142/q0057.

F. Delbaen and K. Owari. Convex functions on dual Orlicz spaces. Positivity, 23(5):1051–1064, 2019. doi:10.1007/s11117-019-00651-x.

M. Goovaerts, D. Linders, K. V. Weert, and F. Tank. On the interplay between distortion, mean value and the Haezendonck-Goovaerts risk measures. Insurance: Mathematics and Economics, 51:10–18, 2012. doi:10.1016/j.insmatheco.2012.02.012.

T. Kalmes and A. Pichler. On Banach spaces of vector-valued random variables and their duals motivated by risk measures. Banach Journal of Mathematical Analysis, 12(4):773–807, 2018. doi:10.1215/17358787-2017-0026.

M. A. Krasnosel'skii and Y. B. Rutickii. Convex functions and Orlicz spaces. Noordhoff Groningen, 1961.

S. Kusuoka. On law invariant coherent risk measures. In Advances in mathematical economics, volume 3, chapter 4, pages 83–95. Springer, 2001. doi:10.1007/978-4-431-67891-5.

D. G. Luenberger. Optimization by vector space methods. Decision and control. Wiley, New York, NY, 1969. URL https://cds.cern.ch/record/104246.

W. Ogryczak and A. Ruszczyński. From stochastic dominance to mean-risk models: Semideviations as risk measures. European Journal of Operational Research, 116:33–50, 1999. doi:10.1016/S0377-2217(98)00167-2.

W. Ogryczak and A. Ruszczyński. Dual stochastic dominance and related mean-risk models. SIAM Journal on Optimization, 13(1):60–78, 2002. doi:10.1137/S1052623400375075.

G. Ch. Pflug. On distortion functionals. Statistics and Risk Modeling (formerly: Statistics and Decisions), 24:45–60, 2006. doi:10.1524/stnd.2006.24.1.45.

A. Pichler. The natural Banach space for version independent risk measures. Insurance: Mathematics and Economics, 53(2):405–415, 2013. doi:10.1016/j.insmatheco.2013.07.005.

A. Pichler. A quantitative comparison of risk measures. Annals of Operations Research, 254(1):251–275, 2017. doi:10.1007/s10479-017-2397-3.

A. Pichler and R. Schlotter. Entropy based risk measures. European Journal of Operational Research, 2018. doi:10.1016/j.ejor.2019.01.016. URL https://arxiv.org/abs/1801.07220.

A. Pichler and A. Shapiro. Minimal representations of insurance prices. Insurance: Mathematics and Economics, 62:184–193, 2015. doi:10.1016/j.insmatheco.2015.03.011.

L. Pick, A. Kufner, O. John, and S. Fučík. Function Spaces. De Gruyter Series in Nonlinear Analysis and Applications 14. Walter de Gruyter & Co., Berlin, second and extended edition, 2013. URL http://books.google.com/books?id=KXt6BV9G5k4C.

R. T. Rockafellar. Convex Analysis. Princeton University Press, 1970. ISBN 978-1-4008-7317-3.

R. T. Rockafellar. Integral functionals, normal integrands and measurable selections. In Nonlinear operators and the calculus of variations, pages 157–207. Springer, 1976. doi:10.1007/BFb0079944.

R. T. Rockafellar and J. O. Royset. Random variables, monotone relations, and convex analysis. Mathematical Programming, 148(1-2):297–331, 2014. doi:10.1007/s10107-014-0801-1.

R. T. Rockafellar and J. O. Royset. Measures of residual risk with connections to regression, risk tracking, surrogate models, and ambiguity. SIAM Journal on Optimization, 25(2):1179–1208, 2015. doi:10.1137/151003271.

R. T. Rockafellar and J. O. Royset. Superquantile/CVaR risk measures: second-order theory. Annals of Operations Research, 262(1):3–28, 2016. doi:10.1007/s10479-016-2129-0.

R. T. Rockafellar and S. Uryasev. Optimization of Conditional Value-at-Risk. Journal of Risk, 2(3):21–41, 2000. doi:10.21314/JOR.2000.038.

R. T. Rockafellar and S. Uryasev. Conditional value-at-risk for general loss distributions. Journal of Banking and Finance, 26:1443–1471, 2002. doi:10.1016/S0378-4266(02)00271-6.

R. T. Rockafellar and S. Uryasev. The fundamental risk quadrangle in risk management, optimization and statistical estimation. Surveys in Operations Research and Management Science, 18(1-2):33–53, 2013. doi:10.1016/j.sorms.2013.03.001.

R. T. Rockafellar, J. O. Royset, and S. I. Miranda. Superquantile regression with applications to buffered reliability, uncertainty quantification, and conditional value-at-risk. European Journal of Operational Research, 234(1):140–154, 2014. doi:10.1016/j.ejor.2013.10.046.

S. Wang and J. Dhaene. Comonotonicity, correlation order and premium principles.