Comonotonic measures of multivariate risks
IVAR EKELAND, ALFRED GALICHON AND MARC HENRY

Abstract.
We propose a multivariate extension of a well-known characterization by S. Kusuoka of regular and coherent risk measures as maximal correlation functionals. This involves an extension of the notion of comonotonicity to random vectors through generalized quantile functions. Moreover, we propose to replace the current law invariance, subadditivity and comonotonicity axioms by an equivalent property we call strong coherence and that we argue has a more natural economic interpretation. Finally, we reformulate the computation of regular and coherent risk measures as an optimal transportation problem, for which we provide an algorithm and implementation.
Keywords: regular risk measures, coherent risk measures, comonotonicity, maximal correlation, optimal transportation, strongly coherent risk measures.
MSC 2000 subject classification : 91B06, 91B30, 90C08
Introduction
The notion of coherent risk measure was proposed by Artzner, Delbaen, Eber and Heath in [1] as a set of axioms to be verified by a real-valued measure of the riskiness of an exposure. In addition to monotonicity, positive homogeneity and translation invariance, the proposed coherency axioms include subadditivity, which is loosely associated with hedging. Given this interpretation, it is natural to require the risk measure to be additive on the subsets of risky exposures that are comonotonic, as this situation corresponds to the worst-case scenario for the correlation of the risks. In [15], Kusuoka showed the remarkable result that law invariant coherent risk measures that are also comonotonic additive are defined by the integral of the quantile function with respect to a positive measure, a family that includes Expected Shortfall (also known as Conditional Value at Risk, or Expected Tail Loss).

The main drawback of this formulation is that it does not properly handle the case when the numéraires in which the risky payoffs are labeled are not perfect substitutes. This situation is commonly met in finance. In a two-country economy with floating exchange rates, the fact that claims on payoffs in different currencies are not perfectly substitutable is known as the Siegel paradox; in the study of the term structure of interest rates, the fact that various maturities are (not) perfect substitutes is called the (failure of the) pure expectation hypothesis. The technical difficulty impeding a generalization to the case of a multivariate risk measure is that the traditional definition of comonotonicity relies on the order in $\mathbb{R}$. When dealing with portfolios of risks that are not perfectly substitutable, as Jouini, Meddeb and Touzi did in [13] for coherent risk measures, and Rüschendorf in [17] for law invariant convex risk measures, the right notion of multivariate comonotonicity is not immediately apparent.

The present work circumvents these drawbacks to generalize Kusuoka's result to multivariate risk portfolios, and proposes a simplifying reformulation of the axioms with firm decision theoretic foundations. First, we propose an alternative axiom called strong coherence, which is equivalent to the axioms in [15] and which, unlike the latter, extends to the multivariate setting. We then make use of a variational characterization of Kusuoka's axioms and representation in order to generalize his results to the multivariate case. We show that multivariate risk measures that satisfy strong coherence have the same representation as in [15], which we discuss further below.

The work is organized as follows. The first section motivates a new notion called strong coherence, which is shown to be intimately related to existing risk measure axioms, yet appears to be more natural. The second section shows how the concept of comonotonic regular risk measures can be extended to the case of multivariate risks, by introducing a proper generalization of the notion of comonotonicity and giving a representation theorem. The third section discusses in depth the relation with Optimal Transportation Theory, and shows important examples of actual computations.
Notations and conventions.
Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space, which is standard in the terminology of [14], that is, $\mathbb{P}$ is nonatomic and $L^2(\Omega, \mathcal{F}, \mathbb{P})$ is separable. Let $X : \Omega \to \mathbb{R}^d$ be a random vector; we denote the distribution law of $X$ by $\mathcal{L}_X$, hence $\mathcal{L}_X = X\#\mathbb{P}$, where $X\#\mathbb{P} := \mathbb{P}X^{-1}$ denotes the push-forward of the probability measure $\mathbb{P}$ by $X$. The equidistribution class of $X$ is the set of random vectors whose distribution with respect to $\mathbb{P}$ equals $\mathcal{L}_X$ (reference to $\mathbb{P}$ will be implicit unless stated otherwise). As explained in the appendix, essentially one element in the equidistribution class of $X$ has the property of being the gradient of a convex function; this random element is called the (generalized) quantile function associated with the distribution $\mathcal{L}_X$ and denoted by $Q_X$ (in dimension 1, this is the quantile function of the distribution $\mathcal{L}_X$ in the usual sense).

We denote by $\mathcal{M}(\mathcal{L}, \mathcal{L}')$ the set of probability measures on $\mathbb{R}^d \times \mathbb{R}^d$ with marginals $\mathcal{L}$ and $\mathcal{L}'$. We call $L^2_d(\mathbb{P})$ (abbreviated as $L^2_d$) the set of equivalence classes of $\mathcal{F}$-measurable functions $\Omega \to \mathbb{R}^d$ with a finite second moment, modulo $\mathbb{P}$-negligible events. We call $\mathcal{P}_2(\mathbb{R}^d)$ the set of probability measures on $\mathbb{R}^d$ with finite second moment. Finally, for two elements $X, Y$ of $L^2_d$, we write $X \sim Y$ to indicate equality in distribution, that is, $\mathcal{L}_X = \mathcal{L}_Y$. We also write $X \sim \mathcal{L}_X$.

Define $\mathrm{c.l.s.c.}(\mathbb{R}^d)$ as the class of convex lower semi-continuous functions on $\mathbb{R}^d$, and the Legendre-Fenchel conjugate of $V \in \mathrm{c.l.s.c.}(\mathbb{R}^d)$ as $V^*(x) = \sup_{y \in \mathbb{R}^d} [x \cdot y - V(y)]$. In all that follows, the dot "$\cdot$" will denote the standard scalar product in $\mathbb{R}^d$ or $L^2_d$. $M_d(\mathbb{R})$ denotes the set of $d \times d$ matrices, and $O_d(\mathbb{R})$ the orthogonal group in dimension $d$. For $M \in M_d(\mathbb{R})$, $M^T$ denotes the matrix transpose of $M$. For a function $V : \mathbb{R}^d \to \mathbb{R}$ differentiable at $x$, we denote by $\nabla V(x)$ the gradient of $V$ at $x$; this is the vector $\left( \frac{\partial V(x)}{\partial x_1}, \dots, \frac{\partial V(x)}{\partial x_d} \right) \in \mathbb{R}^d$. When $V$ is twice differentiable at $x$, we denote by $D^2 V$ the Hessian matrix of $V$, that is, the matrix $\left( \frac{\partial^2 V(x)}{\partial x_i \partial x_j} \right)_{1 \le i,j \le d}$. By Aleksandrov's theorem, a convex function is (Lebesgue-) almost everywhere twice differentiable on the interior of its domain (see [22], pp. 58–59), so $\nabla V$ and $D^2 V$ exist almost everywhere. For a functional $\Phi$ defined on a Banach space, we denote by $D\Phi$ its Fréchet derivative.

1. Strong coherence: a natural axiomatic characterization
In this section we advocate a very simple axiomatic setting, called strong coherence, which will be shown to be equivalent to the more classical axiomatic framework described in the next section. We argue that this axiom has more intuitive appeal than the classical (equivalent) axioms.

1.1. Motivation: Structure Neutrality.

The regulators of the banking industry are confronted with the problem of imposing rules on banks to determine the amount of regulatory capital they should budget to cover their risky exposure. A notable example of such a rule is the Value-at-Risk, imposed by the Basel II committee, but a number of competing rules have been proposed. We call $X \in L^2_d$ the vector of random losses of a given bank. Note that, contrary to a convention often adopted in the literature, we chose to account positively for net losses: $X$ is a vector of effective losses. Also note that we have supposed that the risk is multivariate, which means that there are multiple numéraires, which, depending on the nature of the problem, can be several assets, several term maturities, or several non-monetary risks of different nature. We suppose that these multiple numéraires cannot be easily exchanged into one another: the problem is intrinsically multivariate. This could be the case if the firm (or the regulator) is unable or unwilling to define a monetary equivalent for the various dimensions of its risks. For instance, an oil company is likely to be unable to estimate a dollar amount to price its environmental risk; similarly, a pharmaceutical company may be unwilling to give a monetary estimate for the health hazard its products carry.

[Footnote] In this paper we have chosen to restrict ourselves to the case where risks are in $L^2(\mathbb{R}^d)$ for notational convenience, but all results in the paper carry over without difficulty to the case where the risks are in $L^p(\mathbb{R}^d)$ for $p \in (1, +\infty)$.

To a random vector of losses $X$ one associates a number $\varrho(X)$ which measures the intensity of the risk incurred. The unit in which $\varrho$ is expressed is to be thought of as some extra currency unit, or alternatively a non-monetary score; it is not assumed to be one of the monetary units associated with the various dimensions of the vector of risks.
This score is used by investors to compare the risks of two companies, or by regulators to set limits to risky exposures for regulated firms. An important desirable feature of the rule proposed by the regulator is to avoid regulatory arbitrage. Here, a regulatory arbitrage would be possible if the firms could split their risk into several different subsidiaries $S_i$, $i = 1, \dots, N$, with independent legal existence, so that the shareholder's economic risk remained the same, $X = X_1 + \dots + X_N$, but such that the amount of the shareholder's capital required to be budgeted to cover their risk were strictly lower after the split, namely such that $\varrho(X) > \varrho(X_1) + \dots + \varrho(X_N)$. To avoid this, we shall impose the requirement of subadditivity, that is,
\[ \varrho(X_1 + \dots + X_N) \le \varrho(X_1) + \dots + \varrho(X_N) \]
for all possible dependent risk exposures $(X_1, \dots, X_N) \in (L^2_d)^N$.

We now argue that the regulator is only interested in the amount and the intensity of the risk, not in its operational nature: the capital budgeted should be the same for a contingent loss of 1% of the total capital at risk no matter how the loss occurred (whether on the foreign exchange market, the stock market, the credit market, etc.). This translates mathematically into the requirement that the regulatory capital to budget should only depend on the distribution of the risk $X$, that is, the rule should satisfy the law invariance property:

Definition 1. A functional $\varrho : L^2_d \to \mathbb{R}$ is called law-invariant if $\varrho(X) = \varrho(Y)$ when $X \sim Y$, where $\sim$ denotes equality in distribution.

By combining subadditivity and law invariance, we get the natural requirement for the capital budgeting rule that $\varrho(\tilde{X}_1 + \dots + \tilde{X}_N) \le \varrho(X_1) + \dots + \varrho(X_N)$ for all $(X_i), (\tilde{X}_i)$ in $(L^2_d)^N$ such that $X_i \sim \tilde{X}_i$ for all $i = 1, \dots, N$. However, in order to prevent giving a premium to conglomerates, and to avoid imposing an overconservative rule on the regulated firms, one is led to require the inequality to be sharp and to pose the structure neutrality axiom
\[ \varrho(X_1) + \dots + \varrho(X_N) = \sup_{\tilde{X}_i \sim X_i} \varrho(\tilde{X}_1 + \dots + \tilde{X}_N). \]
The Value-at-Risk notably fails this requirement, so that the Value-at-Risk as a capital budgeting rule is not neutral to the structure of the firm. This result should be read in the perspective of the corporate finance literature on the optimal structure of the firm, originating in the celebrated Modigliani-Miller theorem, according to which the value of the firm does not depend on the structure of its capital. This point is explained in detail in [12], where an explicit construction is provided. We introduce the axiom of strong coherence to be satisfied by a measure of the riskiness of a portfolio of risk exposures (potential losses) $X \in L^2_d$.

Definition 2 (Strong coherence). A functional $\varrho : L^2_d \to \mathbb{R}$ is called a strongly coherent risk measure if (i) it is convex continuous, and (ii) it is structure neutral: for all $X, Y \in L^2_d$,
\[ \varrho(X) + \varrho(Y) = \sup \left\{ \varrho(\tilde{X} + \tilde{Y}) : \tilde{X} \sim X;\ \tilde{Y} \sim Y \right\}. \]

The convexity axiom can be justified by a risk aversion principle: in general, one should prefer to diversify risk.
The structure neutrality axiom, being defined as a supremum over all correlation structures, can be interpreted as a provision against worst-case scenarios, and may be seen as unduly conservative. However, this axiom is no more conservative than the set of axioms defining a regular coherent risk measure, as we shall see.

As we shall also see, strongly coherent risk measures satisfy all the classical axioms of coherent risk measures (recalled in Definition 4 below) except monotonicity, for which the multivariate extension is not obvious. In particular, these measures satisfy positive homogeneity and translation invariance. They also satisfy law invariance, which can be seen by taking $Y = 0$ in the definition above.

We now show that strongly coherent risk measures are represented by maximal correlation functionals with respect to a given random vector or scenario.
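The structure neutrality axiom can be checked by brute force on a toy example. The sketch below is only an illustration under assumptions that are not in the paper: it uses a finite space with four equiprobable states (which is atomic, unlike the space assumed here), where two loss vectors have the same law exactly when one is a permutation of the other; the helper names and the lower-quantile convention for Value-at-Risk are hypothetical choices. Under these assumptions, an expected shortfall satisfies the axiom with equality, while Value-at-Risk does not.

```python
from fractions import Fraction
from itertools import permutations

def expected_shortfall(losses, k):
    """Average of the k largest losses on an equiprobable finite space."""
    return Fraction(sum(sorted(losses, reverse=True)[:k]), k)

def value_at_risk(losses, p):
    """Lower p-quantile (assumed convention): smallest x with P(loss <= x) >= p."""
    n = len(losses)
    ordered = sorted(losses)
    i = 0
    while Fraction(i + 1, n) < p:
        i += 1
    return ordered[i]

def worst_case(rho, x, y):
    """sup of rho(x + y~) over all rearrangements y~ of y.

    Permuting y only is enough because rho is itself permutation invariant.
    """
    return max(rho([a + b for a, b in zip(x, perm)]) for perm in permutations(y))

# Two loss profiles on four equiprobable states (illustrative numbers).
X = [1, 3, 0, 2]
Y = [5, 1, 4, 0]

es = lambda z: expected_shortfall(z, k=2)        # alpha = 1/2
var = lambda z: value_at_risk(z, Fraction(3, 4)) # level 75%

es_sum, es_sup = es(X) + es(Y), worst_case(es, X, Y)
var_sum, var_sup = var(X) + var(Y), worst_case(var, X, Y)

print("ES :", es_sum, "vs sup", es_sup)    # equal: structure neutral here
print("VaR:", var_sum, "vs sup", var_sup)  # sup is larger: not structure neutral
```

In this example the supremum for the expected shortfall is attained by the comonotonic rearrangement, whereas for Value-at-Risk a suitable rearrangement of the two exposures pushes the quantile of the sum above the sum of the quantiles.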
1.2. Characterization of strongly coherent risk measures.

We are now going to show that the strong coherence property essentially characterizes a class of risk measures known as maximal correlation risk measures, whose definition we first recall.

1.2.1. Maximal correlation measures.

We first define maximal correlation risk measures (in the terminology of Rüschendorf, who introduced them in the multivariate case, see e.g. [17]). These measures will generalize the variational formulation for coherent regular risk measures given in (2.1) below.
Definition 3 (Maximal correlation measures). A functional $\varrho_\mu : L^2_d \to \mathbb{R}$ is called a maximal correlation risk measure with respect to a baseline distribution $\mu$ if for all $X \in L^2_d$,
\[ \varrho_\mu(X) := \sup \left\{ E[X \cdot \tilde{U}] : \tilde{U} \sim \mu \right\}. \]

Our notion of maximal correlation risk measure is essentially the same as Rüschendorf's, with a few minor variants: Rüschendorf defines his measures on $L^\infty_d$ instead of $L^2_d$, and imposes the extra requirements $U_i \ge 0$ and $E_\mu[U_i] = 1$ for $i = 1, \dots, d$, which we do not impose for now.

Remark 1 (Geometric interpretation). The maximum correlation measure with respect to the measure $\mu$ is the support function of the equidistribution class of $\mu$.

Example 1 (Multivariate Expected Shortfall). An interesting example of a univariate risk measure within the class of maximal correlation risk measures is the expected shortfall, also known as conditional value at risk. This risk measure can be generalized to the multivariate setting by defining the $\alpha$-expected shortfall of a risk exposure $X$ as the maximal correlation measure when the baseline risk $U$ is a Bernoulli random vector (i.e. its distribution $\mathcal{L}_U$ is determined by $U = (1/\alpha, \dots, 1/\alpha)^T$ with probability $\alpha$ and $U = 0$ with probability $1 - \alpha$). In such a case, one can easily check that if $\mathcal{L}_X$ is absolutely continuous, then defining $W(x) = \frac{1}{\alpha} \max\left( \sum_{i=1}^d x_i - c,\ 0 \right)$, with $c$ given by the requirement $\Pr\left( \sum_{i=1}^d X_i \ge c \right) = \alpha$, it follows that $W$ is convex and $\nabla W$ exists $\mathcal{L}_X$-almost everywhere and pushes $\mathcal{L}_X$ to $\mathcal{L}_U$ as in Proposition 7; therefore the maximal correlation measure is given in this case by
\[ \frac{1}{\alpha} E\left[ \left( \sum_{i=1}^d X_i \right) \mathbf{1}\left\{ \sum_{i=1}^d X_i \ge c \right\} \right]. \]
In other words, the maximum correlation measure in this example is the (univariate) $\alpha$-expected shortfall for $Y = \sum_{i=1}^d X_i$.

Example 2. With a more complex baseline risk, other important examples where explicit or numerical computation is possible include the cases when 1) the baseline risk and the risk to be measured are both Gaussian, or 2) the baseline risk is uniform on $[0,1]^d$ and the risk to be measured has a discrete distribution. Both these cases are treated in detail in Section 4.

Let us first recall the following lemma, which emphasizes the symmetry between the roles played by the equivalence classes of $X$ and $U$ in the definition above.

Lemma 1. For any choice of $U \sim \mu$, with $\mu \in \mathcal{P}_2(\mathbb{R}^d)$, one has
\[ \varrho_\mu(X) = \sup \left\{ E[\tilde{X} \cdot U] : \tilde{X} \sim X \right\}, \]
and $U$ is called the baseline risk associated with $\varrho_\mu$. It follows that $\varrho_\mu$ is law invariant.

Proof. See (2.12) in [17]. □
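Definition 3, Example 1 and Lemma 1 can be illustrated numerically. The following is a minimal sketch under assumptions that are not part of the paper: a finite space with four equiprobable states (so that "same distribution" means "same values up to a permutation of the states"), dimension $d = 2$, $\alpha = 1/2$, and hypothetical helper names. The supremum is computed by brute force over rearrangements rather than by any optimal transportation algorithm.

```python
from fractions import Fraction
from itertools import permutations

def dot(u, v):
    """Standard scalar product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))

def max_correlation(xs, us):
    """sup of E[X . U~] over rearrangements U~ of the baseline sample `us`,
    on a finite equiprobable space (brute force over all permutations)."""
    n = len(xs)
    return max(Fraction(sum(dot(x, u) for x, u in zip(xs, perm)), n)
               for perm in permutations(us))

# Four equiprobable states, d = 2 loss components per state (illustrative).
X = [(1, 0), (2, 5), (0, 1), (4, 4)]

# Bernoulli baseline with alpha = 1/2: U = (1/alpha, 1/alpha) = (2, 2) on half
# of the states and U = (0, 0) on the other half.
U = [(2, 2), (2, 2), (0, 0), (0, 0)]

rho = max_correlation(X, U)

# Univariate expected shortfall of the aggregate loss Y = X_1 + X_2:
# the average of the worst alpha * n = 2 outcomes.
totals = sorted((sum(x) for x in X), reverse=True)
es_aggregate = Fraction(sum(totals[:2]), 2)

# Finite analogue of Lemma 1: rearranging X against a fixed U gives the same value.
rho_sym = max(Fraction(sum(dot(x, u) for x, u in zip(perm, U)), len(U))
              for perm in permutations(X))

print(rho, es_aggregate, rho_sym)
```

In this toy case the maximal correlation with the Bernoulli baseline coincides with the expected shortfall of the aggregate loss, and swapping the roles of $X$ and $U$ in the supremum leaves the value unchanged.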
Characterization.
We now turn to our first main result, which is a characterization of strongly coherent risk measures. We first prove a useful intermediate characterization in Proposition 1 below. We shall use Lemma A.4 from [14], which we quote here for the reader's convenience. Denote by $\mathcal{A}$ the set of bimeasurable bijections $\sigma$ from $(\Omega, \mathcal{F}, \mathbb{P})$ into itself which preserve the probability, so that $\sigma\#\mathbb{P} = \mathbb{P}$. Recall that $(\Omega, \mathcal{F}, \mathbb{P})$ was assumed to be a probability space which does not have atoms, and such that $L^2(\Omega, \mathcal{F}, \mathbb{P})$ is separable.

Lemma 2. Let $C$ be a norm closed subset of $L^2(\Omega, \mathcal{F}, \mathbb{P})$. Then the following are equivalent:
(1) $C$ is law invariant, that is, $X \in C$ and $X \sim Y$ implies that $Y \in C$;
(2) $C$ is transformation invariant, that is, for any $X \in C$ and any $\sigma \in \mathcal{A}$, we have $X \circ \sigma \in C$.

As an immediate consequence, we have the following result:
Proposition 1.
A convex continuous functional $\varrho : L^2_d \to \mathbb{R}$ is a strongly coherent risk measure if and only if we have:
\[ \varrho(X) + \varrho(Y) = \sup \left\{ \varrho(X \circ \sigma + Y \circ \tau) : \sigma, \tau \in \mathcal{A} \right\}. \tag{1.1} \]
Proof.
Clearly $X \circ \sigma \sim X$ and $Y \circ \tau \sim Y$. Hence:
\[ \sup \left\{ \varrho(X \circ \sigma + Y \circ \tau) : \sigma, \tau \in \mathcal{A} \right\} \le \sup \left\{ \varrho(\tilde{X} + \tilde{Y}) : \tilde{X} \sim X,\ \tilde{Y} \sim Y \right\}. \tag{1.2} \]
To prove the reverse inequality, take any $\varepsilon > 0$ and choose $X' \sim X$ and $Y' \sim Y$ such that:
\[ \varrho(X' + Y') \ge \sup \left\{ \varrho(\tilde{X} + \tilde{Y}) : \tilde{X} \sim X,\ \tilde{Y} \sim Y \right\} - \varepsilon. \]
Consider the set $\{ X \circ \sigma : \sigma \in \mathcal{A} \}$ and denote by $C$ its closure in $L^2$. It is obviously transformation invariant. By the preceding lemma, it is also law invariant. Since $X \in C$ and $X' \sim X$, we must have $X' \in C$, meaning that there exists a sequence $\sigma_n \in \mathcal{A}$ with $\| X \circ \sigma_n - X' \| \to 0$. Similarly, there must exist a sequence $\tau_n \in \mathcal{A}$ with $\| Y \circ \tau_n - Y' \| \to 0$. Since $\varrho$ is continuous, it follows that, for $n$ large enough, we have:
\[ \sup \left\{ \varrho(X \circ \sigma + Y \circ \tau) : \sigma, \tau \in \mathcal{A} \right\} \ge \varrho(X \circ \sigma_n + Y \circ \tau_n) \ge \varrho(X' + Y') - \varepsilon \ge \sup \left\{ \varrho(\tilde{X} + \tilde{Y}) : \tilde{X} \sim X,\ \tilde{Y} \sim Y \right\} - 2\varepsilon, \]
and since this holds for any $\varepsilon > 0$, the reverse inequality to (1.2) holds. □
We can now state our main result:
Theorem 1.
Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space which does not have atoms, and such that $L^2(\Omega, \mathcal{F}, \mathbb{P})$ is separable. Let $\varrho$ be a functional defined on $L^2_d$. Then the following propositions are equivalent:
(i) $\varrho$ is a strongly coherent risk measure;
(ii) $\varrho$ is a maximal correlation risk measure.

Before we turn to the proof, note that this representation implies immediately that strongly coherent risk measures are in particular positively homogeneous and translation invariant, as announced above.
Proof.
We first show (i) ⇒ (ii). As the proof is quite long, we will punctuate it with several lemmas.

By the preceding proposition and law invariance, structure neutrality can be rewritten as:
\[ \varrho(X) + \varrho(Y) = \sup_{\sigma \in \mathcal{A}} \varrho(X + Y \circ \sigma). \tag{1.3} \]
Call $\varrho^*$ the Legendre transform of $\varrho$ in $L^2_d$.

Lemma 3. $\varrho^*$ is law-invariant.

Proof. For $\sigma \in \mathcal{A}$, one has $\varrho^*(X^* \circ \sigma) = \sup_{X \in L^2_d} \left\{ \langle X^* \circ \sigma, X \rangle - \varrho(X) \right\}$, so
\[ \varrho^*(X^* \circ \sigma) = \sup_{X \in L^2_d} \left\{ \langle X^*, X \circ \sigma^{-1} \rangle - \varrho(X) \right\} = \sup_{X \in L^2_d} \left\{ \langle X^*, X \circ \sigma^{-1} \rangle - \varrho(X \circ \sigma^{-1}) \right\} = \varrho^*(X^*). \]
□

Lemma 4.
If the functions $f_i$, $i \in I$, are l.s.c. convex functions, then
\[ \left( \sup_i f_i \right)^* = \left( \inf_i f_i^* \right)^{**}. \]
Proof.
For a given l.s.c. convex function $f$, $f \le (\sup_i f_i)^*$ is equivalent to $f^* \ge \sup_i f_i$, hence to $f^* \ge f_i$ for all $i$, hence to $f = f^{**} \le f_i^*$ for all $i$, hence to $f \le \inf_i f_i^*$, hence, as $f$ is l.s.c. convex, to $f \le (\inf_i f_i^*)^{**}$, QED. □

Applying Lemma 4 to the structure neutrality equation, one has
\[ \varrho^*(X^*) + \varrho^*(Y^*) = \left( \inf_{\sigma \in \mathcal{A}} \sup_{X, Y} \left\{ \langle X, X^* \rangle + \langle Y, Y^* \rangle - \varrho(X + Y \circ \sigma) \right\} \right)^{**} \]
\[ = \left( \inf_{\sigma \in \mathcal{A}} \sup_{Y} \left\{ \langle Y, Y^* \rangle + \varrho^*(X^*) - \langle Y \circ \sigma, X^* \rangle \right\} \right)^{**} \]
\[ = \left( \varrho^*(X^*) + \inf_{\sigma \in \mathcal{A}} \sup_{Y} \langle Y, Y^* - X^* \circ \sigma^{-1} \rangle \right)^{**}. \]
The term in $\sup_Y (\dots)$ on the right-hand side is $0$ if $Y^* = X^* \circ \sigma^{-1}$ and $+\infty$ otherwise. Hence the previous formula becomes
\[ \varrho^*(X^*) + \varrho^*(Y^*) = \varphi^{**}(X^*, Y^*) \tag{1.4} \]
where we have defined
\[ \varphi(X^*, Y^*) = \begin{cases} \varrho^*(X^*) & \text{if } X^* \sim Y^* \\ +\infty & \text{otherwise.} \end{cases} \tag{1.5} \]
Now suppose $\varphi(X^*, Y^*) < \infty$, hence that $\varrho^*(X^*) = \varrho^*(Y^*) < \infty$ and $X^* \sim Y^*$. As $\varphi \ge \varphi^{**}$, it follows that $\varrho^*(X^*) \ge \varrho^*(X^*) + \varrho^*(Y^*)$, hence $\varrho^*(Y^*) = \varrho^*(X^*) \le 0$, and $\varphi(X^*, Y^*) \le 0$. Suppose moreover that $\varphi(X^*, Y^*) < \infty$ and $\varphi(X^*, Y^*) - \varphi^{**}(X^*, Y^*) < \varepsilon$. Replacing in (1.4), one finds that
\[ 0 \le -\varrho^*(X^*) = -\varrho^*(Y^*) \le \varepsilon. \]

Lemma 5. $\varphi^{**}$ is valued into $\{0, +\infty\}$.

Proof. As $\varphi^* = \varphi^{***}$, one has
\[ \varphi^*(X, Y) = \sup_{(X^*, Y^*)} \left\{ \langle X, X^* \rangle + \langle Y, Y^* \rangle - \varphi^{**}(X^*, Y^*) \right\} = \sup_{(X^*, Y^*)} \left\{ \langle X, X^* \rangle + \langle Y, Y^* \rangle - \varphi(X^*, Y^*) \right\}. \]
Taking a maximizing sequence $(X^*_n, Y^*_n)$ in the latter expression, one has necessarily $\varphi(X^*_n, Y^*_n) - \varphi^{**}(X^*_n, Y^*_n) \to 0$. From the previous remark, $\varrho^*(X^*_n) = \varrho^*(Y^*_n) \to 0$, hence $\varphi(X^*_n, Y^*_n) \to 0$. Therefore
\[ \varphi^*(X, Y) = \sup_{(X^*, Y^*) :\ \varphi(X^*, Y^*) = 0} \left\{ \langle X, X^* \rangle + \langle Y, Y^* \rangle \right\}, \]
which is clearly positively homogeneous of degree 1. Its Legendre transform $\varphi^{**}$ can therefore only take values $0$ and $+\infty$, QED. □

Therefore, there is a closed convex set $K$ such that $\varphi^{**}$ is the indicator function of $K$, that is,
\[ \varphi^{**}(X^*, Y^*) = \begin{cases} 0 & \text{if } (X^*, Y^*) \in K \\ +\infty & \text{otherwise} \end{cases} \tag{1.6} \]
and condition (1.4) implies that
\[ \varrho^*(X^*) + \varrho^*(Y^*) = \begin{cases} 0 & \text{if } (X^*, Y^*) \in K \\ +\infty & \text{otherwise.} \end{cases} \tag{1.7} \]
Note that if $\varrho^*(X^*) < \infty$, then $\varphi(X^*, Y^*) = \varrho^*(X^*)$ for all $Y^* \sim X^*$, and then $\varphi^{**}(X^*, Y^*) \le \varphi(X^*, Y^*) < \infty$. This implies that $\varphi^{**}(X^*, Y^*) = 0$, hence that $\varrho^*(X^*) = 0$. Therefore $\varrho^*$ is also an indicator function: there exists a closed convex set $C$ such that
\[ \varrho^*(X^*) = \begin{cases} 0 & \text{if } X^* \in C \\ +\infty & \text{otherwise.} \end{cases} \tag{1.8} \]
By comparison of (1.7) and (1.8), one finds that
\[ K = C \times C. \]
By duality, (1.8) becomes
\[ \varrho(X) = \sup_{X^* \in C} \langle X^*, X \rangle, \qquad C = \left\{ X^* \mid \varrho^*(X^*) = 0 \right\}. \tag{1.9} \]
Condition (1.5) then implies that $\varphi$ is an indicator function: there exists a set $K_0$ (in general, neither a closed nor a convex set) such that
\[ \varphi(X^*, Y^*) = \begin{cases} 0 & \text{if } (X^*, Y^*) \in K_0 \\ +\infty & \text{otherwise.} \end{cases} \]
By comparison with formulas (1.5) and (1.6), one finds that
\[ (X^*, Y^*) \in K_0 \iff X^* \in C,\ Y^* \in C \text{ and } X^* \sim Y^*, \qquad K = \overline{\mathrm{co}}\, K_0. \]

Lemma 6.
Denote by $E(C)$ the set of strongly exposed points of $C$, and by $\overline{K_0}$ the closure of $K_0$ for the norm topology in $L^2 \times L^2$, where $K_0$ is the set of pairs $(X^*, Y^*)$ with $X^* \in C$, $Y^* \in C$ and $X^* \sim Y^*$, and $K = C \times C$ is its closed convex hull. Then
\[ E(C) \times E(C) \subset \overline{K_0}. \]

Proof. Recall (cf. [9]) that $X^*$ is strongly exposed in $C$ if there is a continuous linear form $X$ such that any maximizing sequence for $X$ in $C$ converges strongly to $X^*$:
\[ X^*_n \in C,\quad \langle X, X^*_n \rangle \to \sup_C \langle X, X^* \rangle \implies \| X^*_n - X^* \| \to 0. \]
For $\varepsilon > 0$, denote by $T_C(X, \varepsilon)$ the set of $Y^* \in C$ such that $\sup_{Z^* \in C} \langle X, Z^* \rangle - \langle X, Y^* \rangle \le \varepsilon$. Then $X^* \in C$ is strongly exposed by $X$ if and only if $\sup_{Z^* \in C} \langle X, Z^* \rangle = \langle X, X^* \rangle$ and $\delta[T_C(X, \varepsilon)]$ tends to $0$ when $\varepsilon \to 0$, where $\delta$ denotes the diameter,
\[ \delta[T_C(X, \varepsilon)] := \sup \left\{ \| X^*_1 - X^*_2 \| \mid X^*_1 \in T_C(X, \varepsilon),\ X^*_2 \in T_C(X, \varepsilon) \right\}. \]
Going back to the problem, it is clear that if $X^*$ and $Y^*$ are strongly exposed in $C$, then $(X^*, Y^*)$ is strongly exposed in $C \times C$:
\[ E(C) \times E(C) \subset E(C \times C) = E(K). \]
We claim that every strongly exposed point of $K$ necessarily belongs to $\overline{K_0}$ (the closure is still the norm closure). Indeed, suppose there exists $(X^*_1, X^*_2) \in E(K)$ such that $(X^*_1, X^*_2) \notin \overline{K_0}$. Then there exists $\varepsilon > 0$ such that $K_0 \cap B((X^*_1, X^*_2), \varepsilon) = \emptyset$, where $B((X^*_1, X^*_2), \varepsilon)$ is the ball of center $(X^*_1, X^*_2)$ and radius $\varepsilon > 0$. As $(X^*_1, X^*_2)$ is strongly exposed, there exists a linear form $(X_1, X_2)$ strongly exposing it, and one can choose $\eta > 0$ such that $\delta[T_K((X_1, X_2), \eta)] < \varepsilon$. Since $T_K((X_1, X_2), \eta)$ contains $(X^*_1, X^*_2)$, it is contained in the ball $B((X^*_1, X^*_2), \varepsilon)$, and one concludes that $K_0 \cap T_K((X_1, X_2), \eta) = \emptyset$, thus
\[ K_0 \subset \left\{ (Y^*_1, Y^*_2) \in K \mid \langle X_1, Y^*_1 \rangle + \langle X_2, Y^*_2 \rangle \le \langle X_1, X^*_1 \rangle + \langle X_2, X^*_2 \rangle - \eta \right\}. \]
But the right-hand side is a closed convex set, so by taking the closed convex hull of the left-hand side, one gets
\[ \overline{\mathrm{co}}(K_0) \subset \left\{ (Y^*_1, Y^*_2) \in K \mid \langle X_1, Y^*_1 \rangle + \langle X_2, Y^*_2 \rangle \le \langle X_1, X^*_1 \rangle + \langle X_2, X^*_2 \rangle - \eta \right\}, \]
and taking $(Y^*_1, Y^*_2) = (X^*_1, X^*_2) \in K = \overline{\mathrm{co}}(K_0)$ leads to a contradiction. Therefore $E(K) \subset \overline{K_0}$, and one has $E(C) \times E(C) \subset E(K) \subset \overline{K_0}$, QED. □

By a celebrated theorem of Bishop and Phelps (see again [9]), there is a dense subset $H$ of $L^2$ (in fact, a dense $G_\delta$) such that, for every $X \in H$, the maximum of $\langle X^*, X \rangle$ for $X^* \in C$ is attained at a strongly exposed point. Going back to (1.9), take some $X \in H$, and let $X^* \in C$ be such that
\[ \varrho(X) = \langle X^*, X \rangle \]
with $X^* \in E(C)$. Now take another $Y \in H$, and another point $Y^* \in E(C)$ such that $\varrho(Y) = \langle Y^*, Y \rangle$. One has $(X^*, Y^*) \in E(C) \times E(C)$, and it results from the previous lemma that $(X^*, Y^*) \in \overline{K_0}$. This implies the existence of a sequence $(X^*_n, Y^*_n) \in K_0$ such that $(X^*_n, Y^*_n)$ converges to $(X^*, Y^*)$ in norm. By the definition of $K_0$, one should have $X^*_n \sim Y^*_n$, that is, $Y^*_n = X^*_n \circ \sigma_n$ for $\sigma_n \in \mathcal{A}$. Hence,
\[ \varrho(Y) = \langle Y^*, Y \rangle = \lim_n \langle Y^*_n, Y \rangle = \lim_n \langle X^*_n \circ \sigma_n, Y \rangle = \lim_n \langle X^*_n, Y \circ \sigma_n^{-1} \rangle. \]
But by the Cauchy-Schwarz inequality,
\[ \left| \langle X^*_n, Y \circ \sigma_n^{-1} \rangle - \langle X^*, Y \circ \sigma_n^{-1} \rangle \right| \le \left\| Y \circ \sigma_n^{-1} \right\| \, \| X^*_n - X^* \|, \]
which tends to $0$ as $\| Y \circ \sigma_n^{-1} \| = \| Y \|$. Therefore,
\[ \varrho(Y) = \lim_n \langle X^*_n, Y \circ \sigma_n^{-1} \rangle = \lim_n \langle X^*, Y \circ \sigma_n^{-1} \rangle \le \sup_{\tilde{Y} \sim Y} \langle X^*, \tilde{Y} \rangle. \]
But one has also
\[ \varrho(Y) = \sup_{Y^* \in C} \langle Y^*, Y \rangle \ge \sup_{\sigma \in \mathcal{A}} \langle X^* \circ \sigma, Y \rangle = \sup_{\sigma \in \mathcal{A}} \langle X^*, Y \circ \sigma \rangle \ge \sup_{\tilde{Y} \sim Y} \langle X^*, \tilde{Y} \rangle, \]
therefore
\[ \varrho(Y) = \sup_{\tilde{Y} \sim Y} \langle X^*, \tilde{Y} \rangle \quad \forall\, Y \in H. \]
The functions $\varrho(Y)$ and
\[ \sup_{\tilde{Y} \sim Y} \langle X^*, \tilde{Y} \rangle = \sup_{\tilde{X}^* \sim X^*} \langle \tilde{X}^*, Y \rangle \]
are both convex, finite and l.s.c. on $L^2$, and hence continuous. Since they coincide on a dense subset, they coincide everywhere. This proves the direct implication (i) ⇒ (ii) of the theorem.

We now turn to the converse. Let $\varrho_\mu$ be a maximal correlation risk measure with respect to the baseline measure $\mu$. Then $\varrho_\mu$ is clearly convex. Take $X$ and $Y$ in $L^2_d$. By Proposition 7 in the Appendix, there exist two convex functions $\phi_1$ and $\phi_2$ such that for $U \sim \mu$, one has $\nabla\phi_1(U) \sim X$ and $\nabla\phi_2(U) \sim Y$, and $\varrho_\mu(X) = E[U \cdot \nabla\phi_1(U)]$, $\varrho_\mu(Y) = E[U \cdot \nabla\phi_2(U)]$. Thus $\varrho_\mu(X) + \varrho_\mu(Y) = E[U \cdot (\nabla\phi_1(U) + \nabla\phi_2(U))]$, but for all $\tilde{U} \sim U$, $E[U \cdot (\nabla\phi_1(U) + \nabla\phi_2(U))] \ge E[\tilde{U} \cdot (\nabla\phi_1(U) + \nabla\phi_2(U))]$, so that $\varrho_\mu(X) + \varrho_\mu(Y) = \varrho_\mu(\nabla\phi_1(U) + \nabla\phi_2(U))$. Since $\varrho_\mu$ is subadditive and law invariant, no other choice of $\tilde{X} \sim X$, $\tilde{Y} \sim Y$ can exceed this value, hence
\[ \varrho_\mu(X) + \varrho_\mu(Y) = \sup \left\{ \varrho_\mu(\tilde{X} + \tilde{Y}) : \tilde{X} \sim X,\ \tilde{Y} \sim Y \right\}. \]
Thus $\varrho_\mu$ is strongly coherent, which completes the proof of Theorem 1. □

2. A multivariate generalization of Kusuoka's theorem
In this section we recall the existing axiomatization leading to the representation result of Kusuoka in [15], where risk measures for univariate risks that are subadditive, law invariant and comonotonic additive are represented by maximal correlation functionals. We then propose a way to generalize these axioms to the case where risk measures deal with multivariate risks, by showing how to generalize the only problematic axiom, namely comonotonic additivity. We then give a representation result which extends Kusuoka's to the multivariate case.

2.1. Coherent and regular risk measures.
To describe the existing axiomatic framework, we first recall the following definitions, valid in the univariate case, from [1], and existing results.

Definition 4 (Coherent; Convex risk measures). A functional $\varrho : L^2 \to \mathbb{R}$ is called a coherent risk measure if it satisfies the following four properties (MON), (TI), (CO) and (PH):
• Monotonicity (MON): $X \le Y \Rightarrow \varrho(X) \le \varrho(Y)$.
• Translation invariance (TI): $\varrho(X + m) = \varrho(X) + m \varrho(1)$.
• Convexity (CO): $\varrho(\lambda X + (1 - \lambda) Y) \le \lambda \varrho(X) + (1 - \lambda) \varrho(Y)$ for all $\lambda \in [0, 1]$.
• Positive homogeneity (PH): $\varrho(\lambda X) = \lambda \varrho(X)$ for all $\lambda \ge 0$.
A functional which only satisfies (MON), (TI) and (CO) is called a convex risk measure.

Even though these definitions are mostly standard, note that since we have considered risk measures associated with random vectors of potential losses, the definition of monotonicity takes a nondecreasing form, unlike the definition in most of the literature on coherent risk measures. Compared to the traditional presentation in the literature, the expression of translation invariance is adapted to take into account the fact that we did not impose the scaling convention $\varrho(1) = 1$. Also note (as we have a multivariate generalization in mind) that, apart from monotonicity (which we shall discuss separately below), all these axioms admit a straightforward generalization to the case of risks $X \in L^2_d$. The expressions for (CO) and (PH) will remain unchanged; for (TI) the natural extension to dimension $d$ will be given in (2.2) below.

A representation of coherent risk measures was given in the original work of [1], whereas a representation of convex risk measures was proposed in [10]. These were extended to the multivariate setting by Jouini, Meddeb and Touzi in [13], who characterize coherent acceptance sets, i.e. sets in $\mathbb{R}^n$ that cancel the risk associated with an $\mathbb{R}^d$-valued random vector, and consider aggregation issues; by Burgert and Rüschendorf in [4], who characterize convex real-valued measures for multivariate risks; and by Rüschendorf in [17], who characterizes those of the latter that are law invariant, and proposes maximal correlation risk measures as an example. The idea of introducing a variational characterization of comonotonic additivity, as well as the generalization of Kusuoka's axiomatic approach that it allows, constitute the essential novelties of this section.

Regularity.
In the case of univariate risks, comonotonic additivity is used in additionto law invariance to define regular risk measures (see [10], sect. 4.7):
Definition 5 (Comonotonicity; Regularity). Two random variables $X$ and $Y$ are comonotonic (or synonymously, comonotone) if there exist a random variable $U$ and two increasing functions $\phi$ and $\psi$ such that $X = \phi(U)$ and $Y = \psi(U)$ hold almost surely.
A functional $\varrho : L^2 \to \mathbb{R}$ is called a regular risk measure if it satisfies:
• Law invariance (LI), and
• Comonotonic additivity (CA): $\varrho(X + Y) = \varrho(X) + \varrho(Y)$ when $X, Y$ are comonotonic.
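For finite samples, the comonotonicity part of Definition 5 can be checked directly. The following is a minimal sketch, not taken from the paper: it assumes that the two variables are observed jointly as a list of paired outcomes, reads "increasing" as nondecreasing, and uses the standard equivalent criterion that no two outcomes move $X$ and $Y$ in strictly opposite directions. The function name `is_comonotonic` is hypothetical.

```python
from itertools import combinations


def is_comonotonic(xs, ys):
    """Return True if the paired outcomes (xs[i], ys[i]) are comonotonic.

    Criterion (for a finite sample): no two outcomes i, j move X and Y in
    strictly opposite directions, i.e. (x_i - x_j) * (y_i - y_j) >= 0.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    pairs = list(zip(xs, ys))
    return all(
        (x1 - x2) * (y1 - y2) >= 0
        for (x1, y1), (x2, y2) in combinations(pairs, 2)
    )


# X and Y rise together: comonotonic.
same_direction = is_comonotonic([1.0, 2.0, 3.0], [10.0, 20.0, 40.0])
# Y falls when X rises from the first to the second outcome: not comonotonic.
opposite_direction = is_comonotonic([1.0, 2.0, 3.0], [3.0, 1.0, 2.0])
```

The pairwise test costs $O(n^2)$ comparisons; sorting the pairs by $x$ and checking that $y$ is nondecreasing would be cheaper, at the price of some care with ties.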
Note that comonotonic additivity implies translation invariance, as any random variable is comonotonic with a constant. Informally speaking, law invariance suggests that the risk measure is a functional of the quantile function $F_X^{-1}(t) = \inf\{x : F_X(x) \ge t\}$ associated with the distribution. Positive homogeneity and comonotonic additivity together suggest that this representation is linear: $\varrho(X) := \int_0^1 \phi(t) F_X^{-1}(t)\,dt$. Finally, subadditivity suggests that the weights $\phi(t)$ are increasing with respect to $t$. Precisely, Kusuoka has shown the following in [15], Theorem 7:

Proposition 2 (Kusuoka). A coherent risk measure $\varrho$ is regular if and only if for some increasing and nonnegative function $\phi$ on $[0,1]$, we have
\[ \varrho(X) := \int_0^1 \phi(t) F_X^{-1}(t)\,dt, \]
where $F_X$ denotes the cumulative distribution function of the random variable $X$, and its generalized inverse $F_X^{-1}(t) = \inf\{x : F_X(x) \ge t\}$ is the associated quantile function.
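The representation in Proposition 2 is easy to evaluate on a sample. The sketch below is an illustration under stated assumptions rather than the authors' code: it replaces $F_X^{-1}$ by the empirical quantile function of an equally weighted sample and evaluates $\phi$ at the midpoint of each interval $((i-1)/n, i/n]$, which is exact only when $\phi$ is constant on those intervals. The names `spectral_risk` and `expected_shortfall_weight` are hypothetical; the weight $\phi(t) = \mathbf{1}\{t > \alpha\}/(1-\alpha)$ corresponds to Expected Shortfall at level $\alpha$.

```python
def spectral_risk(sample, phi):
    """Approximate the integral of phi(t) * F^{-1}(t) over [0, 1] from a sample.

    The empirical quantile function is constant and equal to the i-th order
    statistic on ((i - 1) / n, i / n], so the integral becomes a weighted sum
    of order statistics. phi is evaluated at the midpoint of each interval.
    """
    xs = sorted(sample)
    n = len(xs)
    return sum(phi((i + 0.5) / n) * x / n for i, x in enumerate(xs))


def expected_shortfall_weight(alpha):
    """Weight phi(t) = 1{t > alpha} / (1 - alpha), increasing and nonnegative."""
    return lambda t: (1.0 / (1.0 - alpha)) if t > alpha else 0.0


losses = [4.0, 1.0, 3.0, 2.0]
# With alpha = 0.5, this averages the two largest losses.
es_50 = spectral_risk(losses, expected_shortfall_weight(0.5))
# The constant weight phi = 1 recovers the sample mean.
mean_loss = spectral_risk(losses, lambda t: 1.0)
```

Because the weight is increasing in $t$, larger losses receive at least as much weight as smaller ones, which is the feature that subadditivity requires in the discussion above.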
Variational characterization.
By the Hardy-Littlewood-Pólya inequality shown in lemma 11 of [15], we can write a variational expression for coherent regular risk measures:
\[ \int_0^1 \phi(t) F_X^{-1}(t)\,dt = \max\left\{ E[X\tilde U] : \tilde U \sim \mu \right\}, \tag{2.1} \]
where $\mu$ is the probability distribution of $\phi$ (viewed as a random variable on $[0,1]$ endowed with Lebesgue measure), and the maximum is taken over the equidistribution class of $\mu$. The reader is referred to [7] and [6] for a thorough treatment of this variational problem and the dual representation of Schur convex functions in the univariate case. As we shall see, the variational characterization (2.1) will be key when generalizing to the multivariate setting.

2.2. A multivariate notion of comonotonicity.
We now turn to an extension of the concept of comonotonicity. Note first that a valid definition of comonotonicity in dimension one is the following: two random variables $X$ and $Y$ are comonotonic if and only if one can write, almost surely, $Y = T_Y(U)$ and $X = T_X(U)$ for some third random variable $U$ and nondecreasing functions $T_X$, $T_Y$. In other words, $X$ and $Y$ are comonotonic whenever there is a random variable $U$ such that $E[UX] = \max\{E[X\tilde U] : \tilde U \sim U\}$ and $E[UY] = \max\{E[Y\tilde U] : \tilde U \sim U\}$. This variational characterization will be the basis for our generalized notion of comonotonicity.

To simplify our exposition in the remainder of the paper, we shall make the following assumption:

Assumption.
In the remainder of the paper, we shall assume that the baseline distribution of risk $\mu$ is absolutely continuous with respect to Lebesgue measure.

Definition 6 ($\mu$-comonotonicity). Let $\mu$ be a probability measure on $\mathbb{R}^d$ that is absolutely continuous. Two random vectors $X$ and $Y$ in $L^2_d$ are called $\mu$-comonotonic if for some random vector $U \sim \mu$, we have
\[ U \in \operatorname{argmax}_{\tilde U}\left\{E[X\cdot\tilde U] : \tilde U \sim \mu\right\} \quad\text{and}\quad U \in \operatorname{argmax}_{\tilde U}\left\{E[Y\cdot\tilde U] : \tilde U \sim \mu\right\}. \]
In particular, every random vector $X$ is $\mu$-comonotonic with constant vectors $Y = y$. Note that the geometric interpretation of this definition is that $X$ and $Y$ are $\mu$-comonotonic if and only if they have the same $L^2$ projection on the equidistribution class of $\mu$. We next give a few useful lemmas. We start with a result securing the existence of a $\mu$-comonotonic pair with given marginals.

Lemma 7.
Let $\mu$ be a probability measure on $\mathbb{R}^d$ that is absolutely continuous. Then given two probability distributions $P$ and $Q$ in $\mathcal{P}(\mathbb{R}^d)$, there exists a pair $(X, Y)$ in $(L^2_d)^2$ such that $X \sim P$, $Y \sim Q$, and $X$ and $Y$ are $\mu$-comonotonic.

Proof. By Brenier's theorem (Proposition 7 in the Appendix), there exist $U \sim \mu$ and two convex functions $\phi_1$ and $\phi_2$ such that $X = \nabla\phi_1(U) \sim P$ and $Y = \nabla\phi_2(U) \sim Q$. Then $X$ and $Y$ are $\mu$-comonotonic. □

We then provide a useful characterization of $\mu$-comonotonicity.

Lemma 8.
Let $\mu$ be a probability measure on $\mathbb{R}^d$ that is absolutely continuous. Then two random vectors $X$ and $Y$ in $L^2_d$ are $\mu$-comonotonic if and only if
\[ \varrho_\mu(X + Y) = \varrho_\mu(X) + \varrho_\mu(Y), \]
where $\varrho_\mu(X) := \sup\{E[X\cdot\tilde U] : \tilde U \sim \mu\}$ is the maximal correlation risk measure, defined in Definition 3 above.

Proof. There exists $U \sim \mu$ such that $\varrho_\mu(X + Y) = E[(X + Y)\cdot U]$. We have $E[(X + Y)\cdot U] = E[X\cdot U] + E[Y\cdot U]$, and both inequalities $E[X\cdot U] \le \varrho_\mu(X)$ and $E[Y\cdot U] \le \varrho_\mu(Y)$ hold; thus $E[X\cdot U] + E[Y\cdot U] \le \varrho_\mu(X) + \varrho_\mu(Y)$, with equality if and only if both inequalities above are actually equalities, which is the equivalence needed. □

This lemma implies in particular that maximal correlation functionals with baseline measure $\mu$ are $\mu$-comonotone additive. Thus, combining with Theorem 1, this establishes that strongly coherent risk measures are $\mu$-comonotone additive for some $\mu$.

We next show that in dimension 1, the notion of $\mu$-comonotonicity is equivalent to the classical notion of comonotonicity, regardless of the choice of $\mu$ (provided it is absolutely continuous).
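Both the additivity criterion of Lemma 8 and its link with classical comonotonicity in dimension one can be made concrete on a toy example. The sketch below is only a discrete analogue: it assumes three equally likely states and a baseline distribution with three atoms, which falls outside the absolute continuity assumption made above, and it computes the maximal correlation by brute force over rearrangements. The function name `max_correlation` is hypothetical.

```python
from itertools import permutations


def max_correlation(xs, us):
    """Brute-force max over rearrangements of the average of x_i * u_sigma(i)."""
    n = len(xs)
    return max(
        sum(x * u for x, u in zip(xs, perm)) / n
        for perm in permutations(us)
    )


us = [0.1, 0.5, 0.9]                 # atoms of the (discrete) baseline distribution
xs = [1.0, 2.0, 3.0]                 # values of X in the three states
ys_comonotone = [10.0, 20.0, 40.0]   # Y increasing in X across states
ys_reversed = [40.0, 20.0, 10.0]     # same law as above, opposite ordering

sum_comonotone = [x + y for x, y in zip(xs, ys_comonotone)]
sum_reversed = [x + y for x, y in zip(xs, ys_reversed)]

# Zero (up to rounding) when X and Y are comonotonic.
additive_gap = (max_correlation(xs, us) + max_correlation(ys_comonotone, us)
                - max_correlation(sum_comonotone, us))
# Strictly positive when they are not.
strict_gap = (max_correlation(xs, us) + max_correlation(ys_reversed, us)
              - max_correlation(sum_reversed, us))
```

Both versions of $Y$ have the same law, so their individual risk measures coincide; only the dependence with $X$ changes the risk of the sum.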
Lemma 9.
In dimension $d = 1$, let $\mu$ be a probability measure on $\mathbb{R}$ that is absolutely continuous. Then $X$ and $Y$ are $\mu$-comonotonic if and only if they are comonotonic in the classical sense, that is, if and only if there exist a random variable $Z$ and two nondecreasing functions $f$ and $g$ such that $X = f(Z)$ and $Y = g(Z)$ hold almost surely.

Proof. Suppose that $X$ and $Y$ are $\mu$-comonotonic. Then there is a $U \sim \mu$ such that $U \in \operatorname{argmax}_{\tilde U}\{E[X\tilde U] : \tilde U \sim \mu\}$ and $U \in \operatorname{argmax}_{\tilde U}\{E[Y\tilde U] : \tilde U \sim \mu\}$. This implies in particular the existence of two increasing functions $f$ and $g$ such that $X = f(U)$ and $Y = g(U)$ hold almost surely. Hence $X$ and $Y$ are comonotonic in the classical sense. Conversely, suppose that $X$ and $Y$ are comonotonic in the classical sense. There exist a random variable $Z$ and two increasing functions $f$ and $g$ such that $X = f(Z)$ and $Y = g(Z)$ hold almost surely. Let $F_Z$ be the cumulative distribution function of $Z$, and $F_\mu$ the one associated with $\mu$. Defining $U = F_\mu^{-1}(F_Z(Z))$, one has $U \sim \mu$, and denoting $\varphi = f \circ F_Z^{-1} \circ F_\mu$ and $\phi = g \circ F_Z^{-1} \circ F_\mu$, one has $X = \varphi(U)$ and $Y = \phi(U)$. Thus $X$ and $Y$ are $\mu$-comonotonic. □

In dimension one, one recovers the classical notion of comonotonicity regardless of the choice of $\mu$, as shown in the previous lemma. However, in dimension greater than one, the comonotonicity relation crucially depends on the baseline distribution $\mu$. The following lemma makes this precise.

Lemma 10.
Let $\mu$ and $\nu$ be probability measures on $\mathbb{R}^d$ that are absolutely continuous. Then:
- In dimension $d = 1$, $\mu$-comonotonicity always implies $\nu$-comonotonicity.
- In dimension $d \ge 2$, $\mu$-comonotonicity implies $\nu$-comonotonicity if and only if $\nu = T\#\mu$ (the push-forward of $\mu$ by $T$) for some location-scale transform $T(u) = \lambda u + u_0$ where $\lambda > 0$ and $u_0 \in \mathbb{R}^d$. In other words, comonotonicity is an invariant of the location-scale family transformation classes.

Proof. In dimension one, all the notions of $\mu$-comonotonicity coincide with the classical notion of comonotonicity, as remarked above. Let $d \ge 2$, and suppose that $\mu$-comonotonicity implies $\nu$-comonotonicity. Consider $U \sim \mu$, and let $\phi$ be the convex function (defined up to an additive constant) such that $\nabla\phi\#\nu = \mu$. Then there exists a random vector $V \sim \nu$ such that $U = \nabla\phi(V)$ almost surely. Consider some arbitrary symmetric positive endomorphism $\Sigma$ acting on $\mathbb{R}^d$. Then the map $u \mapsto \Sigma(u)$ is the gradient of a convex function (namely the associated quadratic form $u \mapsto \frac{1}{2}\langle u, \Sigma(u)\rangle$), therefore the random vectors $U$ and $\Sigma(U)$ are $\mu$-comonotonic. By hypothesis, it follows that $U$ and $\Sigma(U)$ are also $\nu$-comonotonic, hence there exists a convex function $\zeta$ such that $\Sigma(U) = \nabla\zeta(V)$ holds almost surely. Therefore, the equality $\Sigma \circ \nabla\phi(v) = \nabla\zeta(v)$ holds for almost every $v$. By differentiating twice (which can be done almost everywhere, by Aleksandrov's theorem), we get that $\Sigma \circ D^2\phi(v) = D^2\zeta(v)$, hence $\Sigma \circ D^2\phi$ is almost everywhere a symmetric endomorphism. This being true regardless of the choice of $\Sigma$, it follows that the matrix of $D^2\phi$ in any orthonormal basis of $\mathbb{R}^d$ is almost everywhere a diagonal matrix, hence there exists a real valued map $\lambda(u)$ such that $D^2\phi(u) = \lambda(u) I_d$, with $\lambda(u) > 0$. But this implies $\partial_{u_i}\partial_{u_j}\phi(u) = 0$ for $i \ne j$ and $\partial^2_{u_i}\phi(u) = \lambda(u)$ for all $i$. Therefore, for $j \ne i$, $\partial_{u_j}\lambda(u) = \partial_{u_j}\partial^2_{u_i}\phi(u) = 0$. Hence $\lambda(u) = \lambda$, a strictly positive constant. It follows that $\nabla\phi(u) = \lambda u + u_0$, QED. The converse holds trivially. □

Remark 2.
A close inspection of the proof of this lemma reveals that the essential reason for the discrepancy between dimension one and higher is the simple fact that the general linear group $GL_d(\mathbb{R})$ is Abelian if and only if $d = 1$.

We can now define a concept which generalizes comonotonic additivity to the multidimensional setting.
Definition 7 ($\mu$-comonotonic additivity; $\mu$-regularity). A functional $\varrho : L^2_d \to \mathbb{R}$ is called a $\mu$-regular risk measure if it satisfies:
• Law invariance (LI), and
• $\mu$-comonotonic additivity ($\mu$-CA): $\varrho(X + Y) = \varrho(X) + \varrho(Y)$ when $X, Y$ are $\mu$-comonotonic.

As every random vector is $\mu$-comonotonic with constant vectors, a $\mu$-comonotonic additive functional $\varrho$ is in particular translation invariant in the following multivariate sense:
\[ \varrho(X + my) = \varrho(X) + m\,\varrho(y) \quad \text{for all } m \in \mathbb{R} \text{ and } y \in \mathbb{R}^d. \tag{2.2} \]

2.3. A multivariate extension of Kusuoka's theorem.
We now show that maximal correlation is equivalent to the combination of subadditivity, law invariance, $\mu$-comonotonic additivity and positive homogeneity. Further, the probability measure $\mu$ involved in the definition of comonotonic additivity shall be precisely related to the one which is taken as a baseline scenario of the maximal correlation measure.

We have seen above (lemma 8) that maximal correlation risk measures defined with respect to a distribution $\mu$ are $\mu$-comonotonic additive. When the measure is also law invariant and coherent, we shall see that the converse holds true, and this constitutes our second main result, which is a multivariate extension of Kusuoka's theorem. Note that while Kusuoka's theorem was stated using the axioms of subadditivity and positive homogeneity in addition to others, we only need the weaker axiom of convexity in addition to the same others.

Theorem 2.
Let $\varrho$ be a l.s.c. risk measure on $L^2_d$ with the properties of convexity (CO) and $\mu$-regularity, that is, law invariance (LI) and $\mu$-comonotonic additivity ($\mu$-CA). Then $\varrho$ is strongly coherent. Equivalently, $\varrho$ is a maximal correlation risk measure, namely there exists $\nu \in \mathcal{P}(\mathbb{R}^d)$ such that $\varrho = \varrho_\nu$, where $\varrho_\nu$ is a maximal correlation measure with respect to baseline scenario $\nu$, and $\mu$ and $\nu$ are related by a location-scale transformation, that is, $\nu = T\#\mu$ (the push-forward of $\mu$ by $T$) where $T(u) = \lambda u + u_0$ with $\lambda > 0$ and $u_0 \in \mathbb{R}^d$.

Proof. Combining the convexity and law invariance axioms implies $\varrho(\tilde X + \tilde Y) \le \varrho(X) + \varrho(Y)$ for all $X, Y, \tilde X, \tilde Y$ in $L^2_d$ with $\tilde X \sim X$ and $\tilde Y \sim Y$, thus
\[ \varrho(X) + \varrho(Y) \ge \sup\left\{\varrho(\tilde X + \tilde Y) : \tilde X \sim X;\ \tilde Y \sim Y\right\}. \]
But by Lemma 7, there exists a $\mu$-comonotonic pair $(\tilde X, \tilde Y)$ with $\tilde X \sim X$ and $\tilde Y \sim Y$. By $\mu$-comonotonic additivity and law invariance, one has $\varrho(\tilde X + \tilde Y) = \varrho(\tilde X) + \varrho(\tilde Y) = \varrho(X) + \varrho(Y)$, therefore the previous inequality is actually an equality, and
\[ \varrho(X) + \varrho(Y) = \sup\left\{\varrho(\tilde X + \tilde Y) : \tilde X \sim X;\ \tilde Y \sim Y\right\}, \]
therefore $\varrho$ is strongly coherent. By Theorem 1, it follows that there exists $\nu \in \mathcal{P}(\mathbb{R}^d)$ such that $\varrho = \varrho_\nu$. But by the comonotonic additivity of $\varrho$ and lemma 8, any two vectors $X$ and $Y$ which are $\mu$-comonotonic are also $\nu$-comonotonic. By lemma 10, this implies that there is a location-scale map $T$ such that $\nu = T\#\mu$, so that the result follows. □

Because it allows a natural generalization of well-known univariate results, this theorem makes a strong case that our notion of comonotonic additivity is the right one when considering multivariate risks.

Extending monotonicity.
We extend the concept of monotonicity with reference to a partial order $\preccurlyeq$ defined on $\mathbb{R}^d$ in the following way:

Definition 8 ($\preccurlyeq$-monotonicity). A functional $\varrho : L^2_d \to \mathbb{R}$ is said to be $\preccurlyeq$-monotone if it satisfies:
($\preccurlyeq$-MON): $X \preccurlyeq Y$ almost surely $\Rightarrow \varrho(X) \le \varrho(Y)$.

We have the following result:
Proposition 3.
Let $\varrho_\mu$ be the maximal correlation risk measure with respect to baseline distribution $\mu$. Let $(\mathrm{Supp}\,\mu)^0$ be the polar cone of the support of $\mu$. For a cone $C \subset \mathbb{R}^d$, denote by $\preccurlyeq_C$ the partial order in $\mathbb{R}^d$ induced by $C$, namely $x \preccurlyeq_C y$ if and only if $y - x \in C$. Then $\varrho_\mu$ is monotone with respect to $\preccurlyeq_C$ if and only if $C \subset -(\mathrm{Supp}\,\mu)^0$.

Proof. If $X$ and $U$ are $\mu$-comonotonic, then $D\varrho_X(Z) = E[U\cdot Z]$, but the property that $E[U\cdot Z] \ge 0$ for every $Z$ almost surely included in $C$ is equivalent to $C \subset -(\mathrm{Supp}\,\mu)^0$. □

Note that in dimension $d = 1$, with $C = \mathbb{R}_+$, one recovers the usual notion of monotonicity. In higher dimension, we get in particular that if $\mu$ is supported in $\mathbb{R}^d_+$, then $\varrho_\mu$ is monotone with respect to the strong order of $\mathbb{R}^d$. Finally, note also that the concept of monotonicity proposed here is a somewhat weak one, as it deals only with almost sure domination between $X$ and $Y$. A stronger concept of monotonicity would involve stochastic ordering of $X$ and $Y$; we do not pursue this approach here.

3. Numerical computation
In this section, we show explicit examples of computation of the maximal correlation risk measure. We start with the Gaussian case, where closed-form formulas are available. To handle more general cases, we shall show that the problem may be thought of as an auction mechanism, an intuition we shall develop and use to derive an efficient computational algorithm.
Gaussian risks.
We now consider the case where the baseline risk $U$ is Gaussian with distribution $\mu = N(0, \Sigma_U)$, with $\Sigma_U$ a positive definite matrix of size $d$, and we study the restriction of $\varrho_\mu$ to the class of Gaussian risks. Note (cf. [16] I, Ex. 3.2.12) that the linear map $u \mapsto A_X u$, where
\[ A_X = \Sigma_U^{-1/2}\left(\Sigma_U^{1/2}\Sigma_X\Sigma_U^{1/2}\right)^{1/2}\Sigma_U^{-1/2}, \]
sends the probability measure $N(0, \Sigma_U)$ to the probability measure $N(0, \Sigma_X)$; further, $A_X$ is positive semidefinite, so this map is the gradient of the convex function $u \mapsto \frac{1}{2}u' A_X u$. Hence we have the following straightforward matrix formulation of comonotonicity.

Lemma 11.
Consider two Gaussian vectors $X \sim N(0, \Sigma_X)$ and $Y \sim N(0, \Sigma_Y)$ with $\Sigma_X$ and $\Sigma_Y$ invertible. Then $X$ and $Y$ are $\mu$-comonotonic if and only if
\[ E[XY^T] = \Sigma_U^{-1/2}\left(\Sigma_U^{1/2}\Sigma_X\Sigma_U^{1/2}\right)^{1/2}\left(\Sigma_U^{1/2}\Sigma_Y\Sigma_U^{1/2}\right)^{1/2}\Sigma_U^{-1/2}. \tag{3.1} \]
In particular, in the case $\mu = N(0, I_d)$, $X$ and $Y$ are $\mu$-comonotonic if and only if $E[XY^T] = \Sigma_X^{1/2}\Sigma_Y^{1/2}$.

Proof. If $X$ and $Y$ are $\mu$-comonotonic, then there exists $U \sim N(0, \Sigma_U)$ such that $X = A_X U$ and $Y = A_Y U$ (with $A_Y$ defined as $A_X$, replacing $\Sigma_X$ by $\Sigma_Y$), and the result follows. Conversely, if equality (3.1) holds, then denoting $U = A_X^{-1}X$ and $V = A_Y^{-1}Y$, we get that 1) $U \sim N(0, \Sigma_U)$ and $V \sim N(0, \Sigma_U)$, and 2) $E[UV^T] = A_X^{-1}E[XY^T]A_Y^{-1} = \Sigma_U$, therefore by the Cauchy-Schwarz inequality, $U = V$ almost surely. Thus $X$ and $Y$ are $\mu$-comonotonic. □

We now derive the value of maximal correlation risk measures at Gaussian risks. Still by [16] I, Ex. 3.2.12, we have immediately:
Proposition 4.
When the baseline risk $U$ is Gaussian with distribution $\mu = N(0, \Sigma_U)$, we have for a Gaussian vector $X \sim N(0, \Sigma_X)$:
\[ \varrho_\mu(X) = \mathrm{tr}\left[\left(\Sigma_U^{1/2}\Sigma_X\Sigma_U^{1/2}\right)^{1/2}\right]. \]
In particular, in the case $\mu = N(0, I_d)$, $\varrho_\mu$ is the trace norm: $\varrho_\mu(X) = \mathrm{tr}\left[\Sigma_X^{1/2}\right]$.

Proof. One has $\varrho_\mu(X) = \max\{E[\tilde X \cdot U] ;\ \tilde X \sim X\} = \mathrm{tr}\,E\left[A_X U U^T\right]$, thus because of the previous results,
\[ \varrho_\mu(X) = E\left[U^T\Sigma_U^{-1/2}\left(\Sigma_U^{1/2}\Sigma_X\Sigma_U^{1/2}\right)^{1/2}\Sigma_U^{-1/2}U\right] = \mathrm{tr}\left(\left(\Sigma_U^{1/2}\Sigma_X\Sigma_U^{1/2}\right)^{1/2}\right). \] □

In dimension 2, we have the formula $\mathrm{tr}\left(\sqrt{S}\right) = \sqrt{\mathrm{tr}(S) + 2\sqrt{\det S}}$, so we get a closed form expression:

Example 3. When $d = 2$ and $\mu = N(0, I_2)$, we have for
\[ \Sigma_X = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix} \]
the following expression:
\[ \varrho_\mu(X) = \sqrt{\sigma_1^2 + \sigma_2^2 + 2\sigma_1\sigma_2\sqrt{1 - \rho^2}}. \]

Kantorovich duality and Walras auction.
We now see how optimal transportation duality permits the computation of maximal correlation risk measures. More precisely, we shall see that the algorithm we propose to compute maximal correlation risk measures numerically can be thought of intuitively as a Walrasian auction, as we shall explain. We refer to [16] and [22] for overviews of the theory and applications of optimal transportation, including recent results. Consider a baseline distribution $\mu$, and recall the expression for the maximal correlation risk measure $\varrho_\mu(X)$ of an $\mathbb{R}^d$-valued random vector $X$:
\[ \varrho_\mu(X) = \sup\left\{E[X\cdot\tilde U] : \tilde U \sim \mu\right\}. \]
This problem is the problem of computing the maximal transportation cost of mass distribution $\mu$ to mass distribution $\mathcal{L}_X$ with cost of transportation $c(u, x) = u \cdot x$.

The problem has a dual expression according to Monge-Kantorovich duality (or duality of optimal transportation). We have (theorem 2.9 page 60 of [22]):
\[ \varrho_\mu(X) = \min_{V \in \mathrm{c.l.s.c.}(\mathbb{R}^d)}\left(\int V\,d\mu + \int V^*\,d\mathcal{L}_X\right), \tag{3.2} \]
where $\mathrm{c.l.s.c.}(\mathbb{R}^d)$ is the set of convex lower semicontinuous functions on $\mathbb{R}^d$ and $V^*$ is the Legendre-Fenchel conjugate of $V$. The function $V$ that achieves the minimum in (3.2) exists by theorem 1(iii), and when $\mathcal{L}_X$ is absolutely continuous, one has $\nabla V^*(X) \sim \mu$ and $\varrho_\mu(X) = E[X\cdot\nabla V^*(X)]$. In the sequel we shall make the law invariance of $\varrho_\mu$ and the symmetry between the roles played by the distributions of $X$ and $U$ explicit in the notation by writing
\[ \varrho_\mu(\mathcal{L}_X) := \varrho(\mu, \mathcal{L}_X) := \varrho_\mu(X). \]
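The primal problem and the dual bound can be illustrated on two small discrete distributions. The sketch below is not the method of the paper: it assumes both distributions put equal mass on the same number of atoms, so that an optimal coupling can be sought among one-to-one matchings and found by brute force, and it evaluates the dual side only at potentials of the form $V(u) = \max_k\{u\cdot x_k - w_k\}$, for which $V^*(x_k) \le w_k$. The names `dot`, `max_correlation` and `dual_value` are hypothetical.

```python
from itertools import permutations


def dot(a, b):
    """Scalar product of two vectors given as tuples."""
    return sum(ai * bi for ai, bi in zip(a, b))


def max_correlation(us, xs):
    """Primal value: best one-to-one matching of atoms u_i to atoms x_j."""
    n = len(us)
    return max(
        sum(dot(u, x) for u, x in zip(us, perm)) / n
        for perm in permutations(xs)
    )


def dual_value(us, xs, w):
    """Upper bound on the dual objective at V(u) = max_k (u . x_k - w_k).

    Uses V*(x_k) <= w_k, so the returned number bounds the primal value
    from above for any choice of the weights w.
    """
    n = len(us)
    v_at_u = [max(dot(u, x) - wk for x, wk in zip(xs, w)) for u in us]
    return sum(v_at_u) / n + sum(w) / n


us = [(0.1, 0.2), (0.8, 0.3), (0.5, 0.9)]   # atoms of the baseline distribution
xs = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]   # atoms of the law of X

primal = max_correlation(us, xs)
swapped = max_correlation(xs, us)            # roles of the two laws exchanged
upper_bound = dual_value(us, xs, [0.0, 0.0, 0.0])
```

Exchanging the two arguments leaves the value unchanged, which is the symmetry that the notation $\varrho(\mu, \mathcal{L}_X)$ makes explicit. Brute force over permutations is only practical for a handful of atoms.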
Law-invariant, convex risk measures.
Following [17], theorem 2.3, the maximal correlation risk measures are the building blocks of more general convex risk measures. One has the following result, which was proven by Rüschendorf in the cited paper.
Proposition 5.
Let $\varrho$ be a convex risk measure. Then $\varrho$ is law invariant if and only if there exists a penalty function $\alpha$ such that
\[ \varrho(X) = \sup_{\mu \in \mathcal{P}(\mathbb{R}^d)}\left\{\varrho_\mu(X) - \alpha(\mu)\right\}. \]
Furthermore, $\alpha(\mu)$ can be chosen as $\alpha(\mu) = \sup\{\varrho_\mu(X) : X \in L^2_d,\ \varrho(X) \le 0\}$.

Dual representations of the risk measure.
The following lemma provides an expression of the conjugate of the maximal correlation risk measure.
Lemma 12.
For $W : \mathbb{R}^d \to \mathbb{R}$ convex and lower semicontinuous, one has
\[ \sup_{P \in \mathcal{P}(\mathbb{R}^d)}\left\{\varrho_\mu(P) + \int W\,dP\right\} = \int (-W)^*\,d\mu. \]

Proof. One has $\int (-W)^*\,d\mu = \int \sup_y\{u\cdot y + W(y)\}\,d\mu(u)$, thus $\int (-W)^*\,d\mu = \sup_{\tau(\cdot)}\int \left[u\cdot\tau(u) + W(\tau(u))\right]d\mu(u)$, where the supremum is over all measurable maps $\tau : \mathbb{R}^d \to \mathbb{R}^d$. Grouping by equidistribution class, one has
\[ \int (-W)^*\,d\mu = \sup_P\left[\sup_{\tau\#\mu = P}\int u\cdot\tau(u)\,d\mu(u) + \int W\,dP\right] = \sup_P\left\{\varrho_\mu(P) + \int W\,dP\right\}. \] □

General equilibrium interpretation.
We now consider $\varrho(\mu, \mathcal{L}_X)$ for two probability distributions $\mu$ and $\mathcal{L}_X$ on $\mathbb{R}^d$, and we interpret $\mu$ as a distribution of consumers (e.g. insurees) and $\mathcal{L}_X$ as a distribution of goods (e.g. insurance contracts) in an economy. A consumer with characteristics $u$ derives utility from the consumption of a good with attributes $x$ equal to the interaction $u \cdot x$ of consumer characteristics and good attributes. Consumer $u$ maximizes the utility $u \cdot x$ of consuming good $x$ minus the price $V^*(x)$ of the good. Hence his indirect utility is $\sup_{x \in \mathbb{R}^d}[u\cdot x - V^*(x)] = V^{**}(u) = V(u)$. According to equation (3.2), the total surplus in the economy $E[X\cdot U]$ is maximized for the pair $(V, V^*)$ of convex lower semicontinuous functions on $\mathbb{R}^d$ that minimizes
\[ \Phi(V) := \int V\,d\mu + \int V^*\,d\mathcal{L}_X. \]
The functional $\Phi$ is convex and its Fréchet derivative, when it exists, is interpreted as the excess supply in the economy, with value at $h$ equal to $D\Phi_V(h) = \int h\,d(\mu - \nu_V)$, where $\nu_V := \nabla V^*\#\mathcal{L}_X$ is the push-forward of $\mathcal{L}_X$ by $\nabla V^*$. Indeed, the convexity of the map $V \mapsto \Phi(V)$ follows from the identity, established above in lemma 12,
\[ \Phi(V) = \sup_{\nu \in \mathcal{P}(\mathbb{R}^d)}\left\{\varrho(\mathcal{L}_X, \nu) + \int V\,d(\mu - \nu)\right\}, \]
thus this map is the supremum of functionals that are affine in $V$. The supremum is attained for $\nu = \nu_V$, hence it follows that $D\Phi_V(h) = \int h\,d(\mu - \nu_V)$.

Hence, excess supply is zero when the indirect utility $V$ and the prices $V^*$ are such that $\nu_V = \mu$. With our economic interpretation above, this can be seen as a Walrasian welfare theorem, where the total surplus is maximized by the set of prices that equates excess supply to zero.

This general equilibrium interpretation of maximal correlation risk measures extends to the method of computation of the latter through a gradient algorithm to minimize the convex functional $\Phi$. This algorithm can be interpreted as a Walrasian tâtonnement that adjusts prices to reduce excess supply $D\Phi_V$. This algorithm is described in more detail and implemented fully in the case of discretely distributed risks below.

3.3. Discrete risks.
We now consider the restriction of $\varrho_\mu$ to the class of risks whose distribution is discrete. We have in mind in particular the empirical distribution of a sample of recorded data of the realization of the risk. The procedure we shall now describe consists in the computation of the generalized quantile of the discrete distribution, which opens the way for econometric analysis of maximal correlation risk measures.

3.3.1. Representation.
Let $X \sim P_n$, where $P_n = \sum_{k=1}^n \pi_k\delta_{Y_k}$ is a discrete distribution supported by $\{Y_1, ..., Y_n\}$, $n$ distinct points in $\mathbb{R}^d$. For instance, if $P_n$ is the empirical measure of the sample $\{Y_1, ..., Y_n\}$, then $\pi_k = 1/n$. We are looking for $\varphi : [0,1]^d \to \mathbb{R}^d$ such that:
(i) for (almost) all $u \in [0,1]^d$, $\varphi(u) \in \{Y_1, ..., Y_n\}$;
(ii) for all $k \in \{1, ..., n\}$, $\mu\left(\varphi^{-1}\{Y_k\}\right) = \pi_k$, i.e. $\varphi$ pushes forward $\mu$ to $P_n$;
(iii) $\varphi = \nabla V$, where $V : \mathbb{R}^d \to \mathbb{R}$ is a convex function.
It follows from the Monge-Kantorovich duality that there exist weights $(w_1, ..., w_n) \in \mathbb{R}^n$ such that $V(u) = w^*(u) := \max_k\{\langle u, Y_k\rangle - w_k\}$ is the solution. Introduce the functional $\Phi_\mu : \mathbb{R}^n \to \mathbb{R}$, $\Phi_\mu(w) = \int w^*(u)\,d\mu(u)$. The numerical implementation of the method is based on the following result:

Proposition 6.
There exist unique (up to an additive constant) weights $w_1, ..., w_n$ such that for $w^*(u) = \max_k\{\langle u, Y_k\rangle - w_k\}$, the gradient map $\varphi = \nabla w^*$ satisfies (i), (ii) and (iii) above. The function $w \mapsto \Phi_\mu(w) + \sum_{k=1}^n \pi_k w_k$ is convex, and reaches its minimum at $w = (w_1, ..., w_n)$ defined above.

Proof. By the Knott-Smith optimality criterion (theorem 2.12(i) page 66 of [22]), there exists a convex function $w$ on the set $\{Y_1, \ldots, Y_n\}$ such that the optimal pair in (3.2) is $(w, V)$, where $V$ is the Legendre-Fenchel conjugate of $w$, i.e. the function $V(u) = \sup_{x \in \{Y_1, ..., Y_n\}}(u\cdot x - w(x)) = \max_k(u\cdot Y_k - w_k)$, where $w_k = w(Y_k)$ for each $k = 1, \ldots, n$. Note that the subdifferential $\partial V$ is a singleton except at the boundaries of the sets $U_k = \{u : \arg\max_i\{\langle u, Y_i\rangle - w_i\} = k\}$, so $\nabla V$ is defined $\mathcal{L}_U$-almost everywhere, where $\mathcal{L}_U = \mu$ is the law of $U$. Since for all $k$ and all $u \in U_k$, $Y_k \in \partial V(u)$, $\nabla V$ satisfies (i). Finally, by Brenier's Theorem (theorem 2.12(ii) page 66 of [22]), $\nabla V$ pushes $\mathcal{L}_U$ forward to $P_n$, hence it also satisfies (ii); and (iii) holds by construction, since $V$ is convex. The function $\Phi_\mu : w \mapsto \int w^*(u)\,d\mu(u)$ is convex, which follows from the equality
\[ \int w^*(u)\,d\mu(u) = \max_{\sigma(\cdot)}\int \left[\left\langle u, Y_{\sigma(u)}\right\rangle - w_{\sigma(u)}\right]d\mu(u), \]
where the maximum is taken over all measurable functions $\sigma : \mathbb{R}^d \to \{1, ..., n\}$. □

The Tâtonnement Algorithm.
The problem is therefore to minimize the convex function $w \mapsto \Phi_{\mu,\pi}(w) = \Phi_\mu(w) + \sum_{k=1}^n \pi_k w_k$, which can be done using a gradient approach. To the best of our knowledge, the idea of using the Monge-Kantorovich duality to compute the weights using a gradient algorithm should be credited to F. Aurenhammer and his coauthors; see [2] and also [19]. However, by the economic interpretation seen above, the algorithm's dynamics is the time-discretization of a “tâtonnement process,” as first imagined by Léon Walras (1874) and formalized by Paul Samuelson (1947) (see [20]). Hence, to emphasize the economic interpretation, we shall refer to the algorithm as the “Tâtonnement Algorithm”.
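A gradient approach of this kind can be sketched in a few lines in the simplest possible setting. The code below is an illustration under strong assumptions and is not the authors' implementation (which relies on Matlab and the Multi-Parametric Toolbox to handle general $d$): it takes $d = 1$ and $\mu$ uniform on $[0,1]$, so that the cells $U_k = \{u : k \text{ maximizes } u Y_k - w_k\}$ are intervals whose masses $p_k$ can be computed exactly, and it uses a fixed step size. Prices move in the direction of excess demand $p - \pi$, which is a descent step on the convex objective since its gradient is $\pi - p$. The function names `cell_masses` and `tatonnement` are hypothetical.

```python
def cell_masses(ys, w):
    """Mass under Uniform[0, 1] of each cell U_k = {u : k maximizes u*y_k - w_k}."""
    n = len(ys)
    cuts = {0.0, 1.0}
    for i in range(n):
        for j in range(i + 1, n):
            u = (w[j] - w[i]) / (ys[j] - ys[i])  # lines i and j cross here
            if 0.0 < u < 1.0:
                cuts.add(u)
    cuts = sorted(cuts)
    p = [0.0] * n
    for a, b in zip(cuts, cuts[1:]):
        mid = 0.5 * (a + b)  # the maximizer is constant between two cuts
        k = max(range(n), key=lambda i: mid * ys[i] - w[i])
        p[k] += b - a
    return p


def tatonnement(ys, pi, step=0.5, tol=1e-10, max_iter=1000):
    """Gradient descent on w -> Phi_mu(w) + sum_k pi_k w_k; gradient is pi - p."""
    w = [0.0] * len(ys)
    for _ in range(max_iter):
        p = cell_masses(ys, w)
        excess_demand = [pk - pik for pk, pik in zip(p, pi)]  # minus the gradient
        if max(abs(e) for e in excess_demand) < tol:
            break
        # Raise the price of over-demanded atoms, lower it for the others.
        w = [wk + step * e for wk, e in zip(w, excess_demand)]
    return w, cell_masses(ys, w)


ys = [0.0, 1.0, 2.0]          # atoms Y_k (distinct points)
pi = [1 / 3, 1 / 3, 1 / 3]    # target masses pi_k
w, p = tatonnement(ys, pi)
```

In this example the cells converge to the three intervals of length $1/3$, so the resulting map sends each third of $[0,1]$ to one atom, in increasing order, which is the usual quantile function of the discrete distribution. The fixed step of 0.5 is a choice made for this particular example; other data may call for a smaller step or a line search.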
The Algorithm.
Initialize the prices $w^0 = 0$. At each step $m$, compute $\Phi_{\mu,\pi}(w^m)$ and its gradient $\nabla\Phi_{\mu,\pi}(w^m)$, which is the excess supply, that is, the negative of the excess demand. For a well chosen elasticity parameter $\epsilon_m > 0$, update the prices proportionally to excess demand:
\[ w^{m+1} = w^m - \epsilon_m\nabla\Phi_{\mu,\pi}(w^m). \]
Go to next step, or terminate the algorithm when the excess demand becomes smaller than a prescribed level. □

This algorithm requires the evaluation of the function and its gradient. For this we shall need to compute in turn, for each $k$: 1) $U_k = \{u : \arg\max_i\{\langle u, Y_i\rangle - w_i\} = k\}$; 2) $p_k = \mu(U_k)$; and 3) $u_k$, the barycenter of $(U_k, \mu)$ (that is, $u_k = \mu(U_k)^{-1}\int_{U_k} z\,d\mu(z)$). Then we get the value of $\Phi_{\mu,\pi}(w)$: $\Phi_{\mu,\pi}(w) = \sum_k\left[(\langle u_k, Y_k\rangle - w_k)p_k + w_k\pi_k\right]$, and the value of its gradient $\nabla\Phi_{\mu,\pi}(w) = \pi - p$, i.e. $\partial\Phi_{\mu,\pi}(w)/\partial w_k = \pi_k - p_k$. We have implemented these calculations in Matlab using a modified version of the publicly available Multi-Parametric Toolbox (MPT). All the programs are available upon request.

Conclusion
In comparison with the existing literature on the topic of multidimensional risk exposures, this work proposes a multivariate extension of the notion of comonotonicity, which involves simultaneous optimal rearrangements of two vectors of risk. With this extension, we are able to generalize Kusuoka's result and characterize subadditive, comonotonic additive and law invariant risk measures by maximal correlation functionals, which we show can be conveniently computed using optimal transportation methods. We also show that the properties of law invariance, subadditivity and comonotonic additivity can be summarized by an equivalent property, that we call strong coherence, and that we argue has a more natural economic interpretation. Further, we believe that this paper illustrates the enormous potential of the theory of optimal transportation in multivariate analysis and higher dimensional probabilities. We do not doubt that this theory will be included in the standard probabilistic toolbox in the near future.

MPT is available online at http://control.ee.ethz.ch/~mpt/.

† Canada Research Chair in Mathematical Economics, University of British Columbia. E-mail: [email protected]
§ Corresponding author. École polytechnique, Department of Economics, 91128 Palaiseau, France. E-mail: [email protected]
‡ Département de sciences économiques, Université de Montréal, CIRANO, CIREQ. E-mail: [email protected]
References
[1] Artzner, P., Delbaen, F., Eber, J.-M., and Heath, D., “Coherent measures of risk,” Mathematical Finance 9, pp. 203–228, 1999.
[2] Aurenhammer, F., Hoffmann, F., and Aronov, B., “Minkowski-type theorems and least-squares clustering,” Algorithmica 20, pp. 61–76, 1998.
[3] Borwein, J., and Lewis, A., Convex Analysis and Nonlinear Optimization, 2nd Edition, New York: Springer, 2006.
[4] Burgert, C., and Rüschendorf, L., “Consistent risk measures for portfolio vectors,” Insurance: Mathematics and Economics 38, pp. 289–297, 2006.
[5] Barrieu, P., and El Karoui, N., “Inf-convolution of risk measures and optimal risk transfer,” Finance and Stochastics […]
[6] […] Mathematical Finance 19 (2), pp. 189–214, April 2009.
[7] Dana, R.-A., “A Representation Result for Concave Schur Concave Functions,” Mathematical Finance 15 (4), pp. 613–634, 2005.
[8] Delbaen, F., “Coherent risk measures on general probability spaces,” Advances in Finance and Stochastics: Essays in Honour of Dieter Sondermann, pp. 1–37, Berlin: Springer, 2002.
[9] Fabian, M., Habala, P., Hajek, P., Montesinos Santalucia, V., Pelant, J., and Zizler, V., Functional Analysis and Infinite-Dimensional Geometry, Springer: CMS Books in Mathematics, 2001.
[10] Föllmer, H., and Schied, A., Stochastic Finance, de Gruyter, 2004.
[11] Frittelli, M., and Rosazza Gianin, E., “Law invariant convex risk measures,” Advances in Mathematical Economics 7, pp. 33–46, 2005.
[12] Galichon, A., “The VaR at Risk,” forthcoming, International Journal of Theoretical and Applied Finance.
[13] Jouini, E., Meddeb, M., and Touzi, N., “Vector valued coherent risk measures,” Finance and Stochastics 4, pp. 531–552, 2004.
[14] Jouini, E., Schachermayer, W., and Touzi, N., “Law invariant risk measures have the Fatou property,” Advances in Mathematical Economics 9, pp. 49–71, 2006.
[15] Kusuoka, S., “On law invariant coherent risk measures,” Advances in Mathematical Economics 3, pp. 83–95, 2001.
[16] Rachev, S., and Rüschendorf, L., Mass Transportation Problems. Volume I: Theory, and Volume II: Applications, New York: Springer, 1998.
[17] Rüschendorf, L., “Law invariant convex risk measures for portfolio vectors,” Statistics and Decisions 24, pp. 97–108, 2006.
[18] Rüschendorf, L., “Monge–Kantorovich transportation problem and optimal couplings,” Jahresbericht der DMV 3, pp. 113–137, 2007.
[19] Rüschendorf, L., and Uckelmann, L., “Numerical and analytical results for the transportation problem of Monge-Kantorovich,” Metrika. International Journal for Theoretical and Applied Statistics 51, pp. 245–258, 2000.
[20] Samuelson, P., Foundations of Economic Analysis, Cambridge, MA: Harvard University Press, 1947.
[21] Schmeidler, D., “Subjective probability and expected utility without additivity,” Econometrica 57, pp. 571–587, 1989.
[22] Villani, C., Topics in Optimal Transportation, Providence: American Mathematical Society, 2003.
[23] Yaari, M., “The dual theory of choice under risk,” Econometrica 55, pp. 95–115, 1987.
Appendix A. Illustrations
The tâtonnement algorithm was implemented with the use of the Multi-Parametric Toolbox, and we derived the generalized quantile $\nabla V$ that achieves the optimal transportation of the uniform distribution on the unit cube in $\mathbb{R}^d$ to the empirical distribution of a sample of uniformly distributed random vectors in the unit cube in $\mathbb{R}^d$. The following illustrations show the Monge-Kantorovich potential $V$, also interpreted as the buyer's indirect utility in our general equilibrium interpretation, in the case of samples of size 7 and 27 respectively. The potential $V$ is piecewise affine, and the algorithm also requires determining the regions over which it is affine, together with their volume and center of mass. The corresponding partition is given opposite each potential plot. For illustration purposes, the dimension of the space $d$ is taken equal to 2, but the generalized quantiles and corresponding partitions can be derived in higher dimensions.

Figure 1. Mapping the uniform to a discrete distribution in dimension $d = 2$. Upper row: seven atom points, lower row: twenty-seven atom points. Left column: the potential $V(u) = w^*(u)$. Right column: the corresponding partition of the space $U$.

Figure 2. The value of the risk measure in the Gaussian case, plotted against $\rho$. Left: $\sigma_1 = \sigma_2 = 1$. Right: $\sigma_1 = 1$, $\sigma_2 = 2$.

Appendix B. Results on Optimal Transportation
In this appendix we recall basic results in Optimal Transportation theory. Roughly put, this theory characterizes the properties of the couplings of two random variables which achieve maximal correlation. We state the following basic result, due to Brenier (cf. [22], Th. 2.12, in which a proof is given).
Proposition 7.