EXTREME DEPENDENCE FOR MULTIVARIATE DATA
DAMIEN BOSC† AND ALFRED GALICHON‡

Abstract.
This article proposes a generalized notion of extreme multivariate dependence between two random vectors which relies on the extremality of the cross-covariance matrix between these two vectors. Using a partial ordering on the cross-covariance matrices, we also generalize the notion of positive upper dependence. We then propose a means to quantify the strength of the dependence between two given multivariate series and to increase this strength while preserving the marginal distributions. This allows for the design of stress-tests of the dependence between two sets of financial variables, which can be useful in portfolio management or derivatives pricing.
Keywords: Multivariate dependence; Extreme dependence; Covariance set.

JEL Classification: C58, C02.

1. Introduction
Extreme dependence, and the closely related notion of comonotonicity, are important concepts in various fields. It is central in the economics of insurance (following the seminal work of Borch, 1962, Arrow, 1963, and Arrow, 1970), in economic theory (see Yaari, 1987, Landsberger and Meilijson, 1994, and Schmeidler, 1989), in statistics (see Dall'Aglio, 1956, Rüschendorf, 1990, S. Rachev, 1991, Zolotarev, 1983), as well as in financial risk management (see the recent book by Malevergne and Sornette, 2006 and references therein).

The notion of extreme (positive) dependence, or comonotonicity, for univariate random variables goes back to the work of Hoeffding, 1940 and Fréchet, 1951. Two real random variables (
X, Y) are comonotonic if their cumulative distribution function satisfies F_{X,Y}(x, y) = min(F_X(x), F_Y(y)), or equivalently, if their copula C is the upper Fréchet copula C(u_1, u_2) = min(u_1, u_2). Equivalently, X and Y can be written as nondecreasing functions of a third random variable Z. As a consequence, comonotone variables maximize covariance over the set of pairs with fixed marginals:

E(XY) = sup_{X̃ ∼ X, Ỹ ∼ Y} E(X̃Ỹ),   (1)

where X̃ ∼ X denotes equality in distribution between X̃ and X. Similarly, X and Y are said to have extreme negative dependence when X and −Y have extreme positive dependence. Their covariance is then minimized instead of maximized, and their copula is the lower Fréchet copula C(u, v) = max(u + v − 1, 0).

In this article, X and Y are random vectors. Our contribution is twofold. First, we introduce (in Definition 2) a generalization of the notion of extreme dependence to the multivariate case, and we investigate how extreme positive dependence generalizes in this setting. We also introduce a notion of positive extreme dependence (in Definition 3). Then we introduce a measure of the strength of dependence based on an entropic measure (in Section 5). We then show how useful the concept of extreme dependence can be in risk management and in asset pricing.

Date: First version is March 19, 2010. The present version is of April 6, 2013.
† Corresponding author. EDF R&D, 1 avenue du Général de Gaulle, 92140, Clamart, France. Bosc acknowledges the support of the AXA Research Fund, AXA Investment Managers and the Investment Solutions Quantitative Team.
‡ Quantitative Finance, Volume 14, Issue 7, March 2014, https://doi.org/10.1080/14697688.2014.886777.

Generalizing extreme dependence.
When dealing with the multivariate case, where X and Y are random vectors in R^d, there is no canonical way to generalize this notion of (positive or negative) extreme dependence and Fréchet copula. One approach, based on the theory of Optimal Transport (see e.g. the books S. T. Rachev and Rüschendorf, 1998 and Villani, 2003), would be to consider the following optimization problem

max_{X̃ ∼ X, Ỹ ∼ Y} E(X̃ · Ỹ),   (2)

where · is the scalar product in R^d. This program is a multivariate extension of the covariance maximization problem (1) and defines as extreme the distribution of the pair (X̃, Ỹ) solving the above problem. However, it does not take into account the cross-dependence between X_i and Y_j for i ≠ j.

A more satisfactory generalization is based on the idea that both positive and negative extreme dependence are obtained by the maximization of a non-zero bilinear form in (X, Y) over the set of couplings of X and Y (i.e. joint distributions with fixed marginals). In other words, we consider solutions of (2) where the scalar product is replaced by a non-zero bilinear form. This will be our notion of multivariate extreme dependence: random vectors X and Y are said to exhibit extreme dependence if their cross-covariance matrix maximizes the expected value of a non-zero bilinear form over all the couplings of X and Y. These extreme couplings are proposed as a generalization of Fréchet (positive and negative) extreme dependence in the multivariate case. We provide a natural geometric characterization of this notion by considering the covariance set, that is, the set of all cross-covariance matrices E(XY′) over all the couplings of X and Y. We show that X and Y have extreme dependence if and only if their cross-covariance matrix lies on the boundary of the covariance set.

We then turn to generalizing the notion of extreme positive dependence.
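For discrete marginals, problem (2) is a finite linear program over the set of couplings. The following sketch is our own illustration (not the authors' code; the data and variable names are hypothetical) of how the maximum-correlation coupling can be computed with off-the-shelf tools:

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical atoms of two discrete marginals P and Q in R^2,
# each with equal weights 1/n.
rng = np.random.default_rng(0)
n = 5
X = rng.normal(size=(n, 2))
Y = rng.normal(size=(n, 2))

gain = X @ Y.T                      # gain[i, j] = x_i . y_j
# Decision variables pi[i, j] >= 0 with fixed row and column sums
# (the coupling constraints defining Pi(P, Q)).
A_eq = np.zeros((2 * n, n * n))
for i in range(n):
    A_eq[i, i * n:(i + 1) * n] = 1.0   # sum_j pi[i, j] = 1/n
for j in range(n):
    A_eq[n + j, j::n] = 1.0            # sum_i pi[i, j] = 1/n
b_eq = np.full(2 * n, 1.0 / n)

res = linprog(-gain.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
pi = res.x.reshape(n, n)
print("max E(X.Y) over couplings:", -res.fun)
```

In dimension 1 the maximizer is simply the comonotone pairing of sorted atoms; here, with d = 2, the optimal coupling is the quadratic optimal transport plan.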
One natural way to generalize extreme positive dependence is to look for the couplings (X̃, Ỹ) having a cross-covariance matrix Cov(X̃, Ỹ) = E(X̃Ỹ′) = (E(X̃_i Ỹ_j))_{i,j} which is maximal for a certain partial (conical) ordering on matrices. As we shall see, it turns out that extreme positive dependence implies extreme dependence, and we characterize the geometric locus of extreme positive dependent vectors on the covariance set.

Stress-testing dependence.
We give a method to associate any coupling, for example any empirical coupling, with an extreme coupling, by means of an entropic relaxation technique. An algorithm is described and results concerning its implementation are given. In particular, this algorithm provides a means to compute the covariance set effectively. We then apply these results to build stress-tests of multivariate dependence for portfolio management and for the pricing of derivatives on multiple underlyings. We also propose the construction of indices of maximal dependence, that is, linear combinations of assets that have remarkable properties of extreme dependence.

The article is organized as follows: the first section presents the notion of covariance set and the definition of couplings with extreme dependence, as well as a characterization of such couplings. The second section defines and characterizes couplings with positive extreme dependence, in relation to the notion of extreme dependence. The third section provides an algorithm to compute extreme couplings and the covariance set. An index of dependence, the affinity matrix, is defined; a method to associate any coupling with an extreme coupling is described. We conclude with financial applications, namely stress-testing portfolio allocations and options pricing, as well as the computation of indices with extreme dependence. All proofs are collected in appendix B.
Notations, definitions.
Let P and Q be two probability distributions on R^I and R^J, with finite second order moments. Without restricting the generality, we assume that P and Q have null first moments, so that the second order moments E(X_i Y_j) are indeed covariances. Π(P, Q) is the set of all probability distributions over R^I × R^J having marginals P and Q. We refer to an element of Π(P, Q) as a coupling, understating the probabilities P and Q. If M and N belong to M_{I,J}(R), the set of real matrices of size I × J, their scalar product is denoted by M · N = Tr(M′N). If (X, Y) ∼ π ∈ Π(P, Q), we denote indifferently σ_{X,Y} or σ_π the matrix with general term E(X_i Y_j), which is the covariance between X_i and Y_j; it is the cross-covariance matrix between X and Y. Remark that σ_{X,Y} is the upper-right block of the variance-covariance matrix of the vector Z = (X, Y)′, and that σ_{X,Y} is in general neither a square matrix nor a symmetric matrix. Moreover, we will say that a coupling π 'projects' onto σ_π, interpreting the map π ↦ σ_π as a projection operator.

Eventually, let us recall that the subdifferential ∂f(x_0) of a convex function f on R^n at a point x_0 is defined as the set of vectors v such that f(x) − f(x_0) ≥ v · (x − x_0) for all x ∈ R^n. Here the dot is the usual scalar product. It reduces to {∇f(x_0)} if f is differentiable at x_0, which is true for almost every x_0 according to Rademacher's theorem.

2. Related literature and contribution
As mentioned in the introduction, the extension to the multivariate setting of the correlation maximization problem (1) has been tackled by several authors in order to define notions of multivariate comonotonicity. Puccetti and Scarsini, 2010 list several possible definitions of multivariate comonotonicity, among which two are directly related to the variational problem (2). Namely, c-comonotonicity refers to the couplings solving problem (2): these are the optimal quadratic couplings of Optimal Transport theory, also called maximum correlation couplings. This variational approach to multivariate comonotonicity is also the basis of Ekeland et al., 2012 and Galichon and Henry, 2012. They propose to extend the univariate notion of comonotonicity and define µ-comonotonicity by stating that two vectors X and Y are µ-comonotone if there exists a random vector U ∼ µ such that

E(X · U) = max{E(X · Ũ), Ũ ∼ µ} and E(Y · U) = max{E(Y · Ũ), Ũ ∼ µ}.

This notion of comonotonicity has the advantage of being transitive, unlike c-comonotonicity. Carlier et al., 2012 showed that this notion of comonotonicity appears as 'more natural' than the other ones because it is directly related to Pareto efficiency.

This article aims at finding multivariate couplings which exhibit a form of strong dependence, just as the previously defined comonotonic couplings. In what follows, the couplings defined as 'extreme' are comonotonic couplings (in the sense of c-comonotonicity) up to a linear transform of one marginal (the c-comonotonic coupling corresponds to the identity transform). In other words, an extreme coupling (
X, Y) satisfies the variational problem

E(X′MY) = sup_{π ∈ Π(P,Q)} E_π(X′MY).   (3)

This definition of extreme dependence is broad enough to encompass 'positive dependence' such as c-comonotonicity, as well as 'negative dependence' (counter-comonotonicity in the univariate case). Furthermore, it allows for a geometrical interpretation of extreme dependence: the cross-covariance matrix of an extreme coupling is located on the boundary of the compact and convex set of all possible cross-covariance matrices, called the covariance set. This set has been introduced in Galichon and Salanié, 2010 in the case with discrete marginals, and generalized to the case with continuous marginals in Dupuy and Galichon, 2013. Taking advantage of this simple interpretation, we then characterize the couplings π whose cross-covariance matrix σ_π is maximal for some partial orders ≻.

Although the idea of generating extreme dependence by solving problem (3) arises naturally from the theory of optimal transport (and more generally from the theory of distributions with given marginals, see e.g. Tiit, 1992), the computation of the covariance set remained a difficult point until now. The rest of the article proposes a method to compute extreme couplings and, for any given coupling π̂, proposes a means to build a continuous sequence of couplings π_T, with π_0 being extreme and σ_{π_1} = σ_{π̂}. This is done by penalizing problem (3) with an entropy term, which allows for fast computations when the marginals are discrete probability distributions, thanks to the Iterative Proportional Fitting Procedure (IPFP). This algorithm goes back to Deming and Stephan, 1940, and has been used by Kosowsky and Yuille, 1994 (although they do not refer explicitly to IPFP) to solve the assignment problem, and in econometrics in Galichon and Salanié, 2010.

3. Multivariate extreme dependence
In this section we detail the notion of multivariate extreme dependence we propose. Consider the covariance set, the set of cross-covariance matrices of couplings π ∈ Π(P, Q):

Definition 1.
The covariance set F(P, Q) is defined as:

F(P, Q) = {Σ ∈ M_{I,J}(R) : ∃π ∈ Π(P, Q), Σ_{ij} = E_π(X_i Y_j), for all i, j}.

As Π(
P, Q) is a convex and compact set (a proof of this property can be found in Villani, 2003, pp. 49-50), the covariance set is also a convex compact subset of M_{I,J}(R).

Figure 1 gives an example of a 2-dimensional section of a covariance set, meaning that only the diagonal elements of the cross-covariance matrix are represented. P and Q are discrete distributions on R^2 with equally weighted atoms, and we look at the two component-wise covariances E(X_1 Y_1), E(X_2 Y_2). The solid curve is the boundary of the covariance set: every coupling between P and Q has a cross-covariance matrix located within the convex hull of this curve. The independence coupling projects on the point (0, 0), and the extreme abscissas on the x-axis represent respectively the minimal and maximal covariances between X_1 and Y_1. These covariances would be attained in terms of copulas by the lower and upper Fréchet copulas. This motivates our definition of extreme dependence couplings as couplings whose cross-covariance matrices are on the boundary of the covariance set.

Figure 1.
Example of a 2-dimensional section of a covariance set
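For equally weighted discrete marginals, couplings are (rescaled) doubly stochastic matrices, whose extreme points are permutation matrices by Birkhoff's theorem. A section like the one in Figure 1 can therefore be traced by enumerating permutation couplings; the sketch below is our own illustration with hypothetical data, not the paper's code:

```python
import numpy as np
from itertools import permutations

# Hypothetical atoms of P and Q in R^2, centred so first moments are
# null (as assumed in the text), each atom with weight 1/n.
rng = np.random.default_rng(1)
n = 6
X = rng.normal(size=(n, 2)); X -= X.mean(axis=0)
Y = rng.normal(size=(n, 2)); Y -= Y.mean(axis=0)

# For each permutation coupling, record (E(X1 Y1), E(X2 Y2)); the
# section of the covariance set is the convex hull of these points.
points = np.array([
    [(X[:, 0] * Y[p, 0]).mean(), (X[:, 1] * Y[p, 1]).mean()]
    for p in (list(q) for q in permutations(range(n)))
])

# The independence coupling projects on (0, 0), since the marginals
# are centred.
indep = (np.outer(X[:, 0], Y[:, 0]).mean(),
         np.outer(X[:, 1], Y[:, 1]).mean())
print(points.shape, indep)
```

Plotting the convex hull of `points` reproduces the shape of the boundary curve; the rightmost abscissa corresponds to the comonotone pairing of the first components.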
Definition 2.
A coupling (X, Y) ∼ π ∈ Π(P, Q) has extreme dependence if and only if (E_π(X_i Y_j))_{ij} lies on the boundary of the covariance set F(P, Q).

The cross-covariance matrix between X and Y, σ_{X,Y}, satisfies:
Tr(M′σ_{X,Y}) = E(X′MY), for all M ∈ M_{I,J}(R),   (4)

which allows to reformulate the notion of extreme dependence as follows:

Theorem 1.
The following conditions are equivalent:

i) (X, Y) ∼ π ∈ Π(P, Q) have extreme dependence;

ii) there exists M ∈ M_{I,J}(R) \ {0} such that Tr(M′σ_π) = sup_{π̃ ∈ Π(P,Q)} Tr(M′σ_{π̃}), or equivalently

E_π(X′MY) = sup_{π̃ ∈ Π(P,Q)} E_{π̃}(X′MY);   (5)

iii) there exists M ∈ M_{I,J}(R) \ {0} and a convex function u on R^I such that MY ∈ ∂u(X) holds almost surely.

This theorem is a corollary of the characterization of optimal couplings proved in S. T. Rachev and Rüschendorf, 1990 and Brenier, 1991.

Thus a coupling (
X, Y) is extreme if and only if there exists a linear transform, namely a nontrivial matrix M, such that (X, MY) is a maximum correlation coupling. In dimension 1, the interpretation is obvious: two real random variables have extreme dependence iff there exist a scalar m ≠ 0 and a nondecreasing function u such that mY = u(X). According to the classic terminology, X and Y are said to be comonotonic if m > 0, and anti-comonotonic otherwise. When M = Id in (5), the optimal coupling is the optimal transport coupling for the quadratic cost (it solves problem (2)).

4. Positive extreme dependence
The aim of this section is to propose a generalization of the concept of the Fréchet copula of upper dependence to the multivariate case. As already mentioned, copula theory fails to handle this problem. Indeed, if C_P and C_Q are two copulas, the first in dimension I (associated with distribution P) and the second in dimension J (associated with distribution Q), a natural candidate for a copula modeling positive extreme dependence would be C_π(x, y) = min(C_P(x_1, ..., x_I), C_Q(y_1, ..., y_J)). But according to an 'impossibility theorem' due to Schweizer and Sklar, 1983, C_π is a copula function if and only if C_P and C_Q are themselves upper Fréchet copulas. We thus depart from the copula approach and aim at characterizing positive extreme dependence through the cross-covariance matrix of X and Y. Starting from the observation that in the univariate case positive extreme dependence attains maximum covariance between X and Y over all the couplings of P and Q, we introduce a conic order on the cross-covariance matrices σ_{X,Y} and define positive extreme dependent couplings as the couplings whose cross-covariance matrices are maximal with respect to this order.

For a given compact convex set B ⊂ M_{I,J}(R) such that 0 ∉ B (such a set is called a compact basis), a closed convex cone in M_{I,J}(R) is defined by setting:

K(B) = {y ∈ M_{I,J}(R) | x · y ≥ 0, ∀x ∈ B}.   (6)

Considering cones of this form might seem restrictive (appendix A gives more details on such cones), yet we provide below some examples showing that classic cones can be defined in such a manner. Let M_1, M_2 be two matrices in M_{I,J}(R). A strict conic order on M_{I,J}(R) is defined by M_1 ≻_{K(B)} M_2 if M_1 − M_2 ∈ Int(K(B)). The interior of K(B) is {y ∈ M_{I,J}(R) | x · y > 0, ∀x ∈ B}.

Definition 3.
Let B be a compact basis. A coupling ( X, Y ) such that σ X,Y is a maximal elementin F ( P, Q ) with respect to the strict conic order ≻ K ( B ) is said to have positive extreme dependence with respect to ≻ K ( B ) . The following results fully characterize couplings with positive extreme dependence in terms ofmaximal correlation couplings.
Theorem 2.
The following conditions are equivalent:

i) (X, Y) ∼ π ∈ Π(P, Q) have extreme positive dependence with respect to ≻_{K(B)};

ii) there exists M ∈ B such that Tr(M′σ_π) = sup_{π̃ ∈ Π(P,Q)} Tr(M′σ_{π̃}), or equivalently

E_π(X′MY) = sup_{π̃ ∈ Π(P,Q)} E_{π̃}(X′MY);   (7)

iii) there exists M ∈ B and a convex function u such that MY ∈ ∂u(X) holds almost surely.

Hence, σ_{X,Y} is maximal if and only if there exists M ∈ B such that X and MY are maximally correlated. Obviously, this result is a close parallel to Theorem 1, except that M is constrained to belong to B. As a consequence, the positive extreme couplings are a particular case of extreme couplings. The interpretation in dimension 1 is again straightforward: X and Y have positive extreme dependence (w.r.t. the usual order on R) iff they are comonotonic.

Figure 2.
Shaded region: location of the couplings dominating the coupling materialized by the square dot.

To better understand the relation between Definition 2 and Definition 3, let us go back to the two-dimensional section of the covariance set discussed in the previous section, and consider that K(B) is the positive orthant of R × R. The shaded region in Figure 2 is the set of couplings dominating the coupling that projects on the square dot, with respect to the orthant order; as a consequence this coupling cannot have positive extreme dependence. This intuitively explains why maximal elements should be on the boundary of the covariance set, hence why positive extreme couplings should be extreme couplings.

Figure 3. Maximal couplings on the boundary.

Maximal elements are represented by the bold curve in Figure 3. Consequently, the couplings exhibiting positive extreme dependence project on this bold portion of the boundary of the covariance set. They form only a small part of the couplings with extreme dependence.

To demonstrate the applicability of this approach, here are three examples of partial orders on covariance matrices.
Example 1.
Orthant order. Let M⁺_{I,J}(R) (resp. M⁺⁺_{I,J}(R)) denote the set of real I × J matrices with nonnegative coefficients (resp. positive coefficients). The set B = M⁺_{I,J}(R) ∩ {Σ_{i,j} M_{ij} = 1} is a compact basis. K(B) is easily seen to be the set M⁺_{I,J}(R), and its interior is M⁺⁺_{I,J}(R). Eventually, M_1 ≻ M_2 iff M_1 − M_2 has only positive coefficients: this is the (strict) orthant order on matrices.

Example 2.
Loewner order. Let S⁺_n and S⁺⁺_n denote respectively the set of positive semi-definite matrices in S_n and the set of positive definite matrices in S_n. If B = {S ∈ S⁺_n(R) | Tr(S) = 1} is the set of positive semi-definite matrices with unit trace, B is a convex compact subset of M_n(R), and K(B) = {M ∈ M_n(R) | Tr(M′S) ≥ 0, ∀S ∈ B} is the set of matrices M whose symmetric part, M + M′, is positive semi-definite. The strict order ≻_{K(B)} is then defined as: M_1 ≻ M_2 iff the symmetric part of M_1 − M_2 is positive definite. This is an extension to M_n(R) of the classic Loewner order on symmetric matrices.
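Both orders are straightforward to test numerically. The sketch below (our illustration, with hypothetical matrices) checks strict dominance for the orthant order of Example 1 and for the extended Loewner order of Example 2:

```python
import numpy as np

def orthant_dominates(M1, M2):
    # Example 1: M1 dominates M2 iff every coefficient of M1 - M2
    # is (strictly) positive.
    return bool(np.all(M1 - M2 > 0))

def loewner_dominates(M1, M2):
    # Example 2: M1 dominates M2 iff the symmetric part of M1 - M2
    # is positive definite, i.e. all eigenvalues of (D + D')/2 > 0.
    D = M1 - M2
    return bool(np.all(np.linalg.eigvalsh((D + D.T) / 2.0) > 0))

M1 = np.array([[2.0, 1.0],
               [0.0, 2.0]])
M2 = 0.5 * np.eye(2)
print(orthant_dominates(M1, M2), loewner_dominates(M1, M2))  # False True
```

Note that the two orders genuinely differ: here M1 − M2 has a zero coefficient, so strict orthant dominance fails, while its symmetric part is positive definite, so Loewner dominance holds.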
The following trivial example shows that the ordering induced by Example 2 allows various positive extreme couplings. A first remark is that the maximum correlation coupling is indeed positive extreme, by setting M = Id in Theorem 2. Consider P = N(0, I), the bivariate standard normal distribution, and Q = N(0, 1) ⊗ U(0, 1), the distribution of a vector whose first component is standard normal and whose second component is uniform on (0, 1). Take X ∼ P and Y = (X_1, U)′, with U ∼ U(0, 1) independent from (X_1, X_2), so that Y ∼ Q. This coupling does not have maximum correlation even though X_1 = Y_1. However, it satisfies (7) with a matrix M supported on the first coordinates alone, and thus qualifies as a maximal coupling.

5. An index of dependence
Suppose now we are observing (or simulating) a coupling π̂ ∈ Π(P, Q), thereafter referred to as an empirical coupling. Even if this coupling is supposed to exhibit strong dependence, its cross-covariance matrix will never be exactly located on the boundary of the covariance set. Our problem is then to associate an extreme coupling with π̂; more precisely, we propose to find a continuous sequence of non-deterministic couplings π_T such that π_1 = π̂ and π_0 is an extreme coupling. In other words, we give a means to go smoothly from an empirical coupling to an extreme one by progressively increasing the strength of the dependence between the marginals. This is done by introducing an entropic penalization of (5), so that its solutions project on inner points of the covariance set.

5.1. Entropic relaxation.
Consider the following problem, which is the entropic penalization of (5):

W(M, T) := max_{π ∈ Π(P,Q)} ( E_π(X′MY) + T Ent(π) ).   (8)

The entropy of a coupling π is defined as Ent(π) = −∫ log π(x, y) dπ(x, y) if π ≪ dx ⊗ dy and the integral exists and is finite, and Ent(π) = −∞ otherwise.

The parameter T can be thought of as a 'temperature' parameter which controls the strength of the entropic penalization. Problem (5) corresponds to T = 0, while letting T go to +∞ amounts to maximizing the entropy, in which case the solution of problem (8) is the independence coupling. Let π_{M,T} denote a solution of (8); a proof of its existence can be found in Rüschendorf, 1995 and references therein. We assume furthermore that the entropy of π̂ is finite.

Fixing the temperature at 1, our aim in the first place is to find a matrix M such that π̂ and π_{M,1} have the same cross-covariance matrix: σ_{π̂} = σ_{π_{M,1}}. By a property of the subdifferential of a maximum function, the gradient of W with respect to M is ∇_M W(·, 1) = σ_{π_{M,1}}. This implies that M is the solution of the following variational problem:

min_{M ∈ M_{I,J}(R)} W(M, 1) − σ_{π̂} · M.   (9)

W(·, 1) is a convex function, as a supremum of affine functions of M, and consequently the objective function in (9) is convex as well: this is a classic unconstrained convex minimization problem. Moreover, (9) is bounded below, which yields the existence of a global minimizer. A detailed proof and a discussion of uniqueness in (9) are given in appendix C.

Figure 4 shows the diagonal of σ_{π_{M,T}} in the coordinates (E(X_1 Y_1), E(X_2 Y_2)) for a large number of randomly generated matrices M. This graph is obtained by sampling many matrices M with uniformly distributed coefficients and plotting σ_{π_{M,T}}, with T taken small enough to obtain couplings near the extreme ones. The solution of (8) is computed thanks to the algorithm presented in section 5.2. The bullet point has coordinates (E_{π̂}(X_1 Y_1), E_{π̂}(X_2 Y_2)). One sees that any inner point of the covariance set can be attained by a properly chosen π_{M,T}. This is a noticeable advantage of the entropic relaxation: not only are the optimal couplings solving (8) easily computed (at least when the marginals are discrete, see section 5.2), but changing the temperature parameter allows one to reach any cross-covariance matrix inside the covariance set.

5.2. Numerical solution.
The optimal π_{M,1} in (8) obeys the following equation (see e.g. Rüschendorf, 1995 for a proof):

log π_{M,1}(x, y) = x′My + u(x) + v(y), u ∈ L¹(dP), v ∈ L¹(dQ).

In other words, the log-likelihood of π_{M,1} is the sum of a quadratic term x′My and of an additively separable function of x and y. The solution is found by setting u and v such that π_{M,1} has the marginals P and Q. This is the purpose of the Iterative Proportional Fitting Procedure (Deming and Stephan, 1940, Von Neumann, 1950).
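For discrete marginals, the IPFP iteration can be sketched as follows (our own illustration with hypothetical data and function names, not the authors' implementation): the optimal coupling has the Gibbs form a(x) e^{x′My/T} b(y), and the scaling factors a and b are obtained by alternately fitting the two marginals.

```python
import numpy as np

def ipfp(X, Y, M, p, q, T=1.0, n_iter=500):
    """Entropic coupling of problem (8) for discrete marginals.

    X: (n, I) atoms of P with weights p; Y: (m, J) atoms of Q with
    weights q. Returns the (n, m) matrix of coupling probabilities."""
    K = np.exp(X @ M @ Y.T / T)   # Gibbs kernel exp(x'My / T)
    a = np.ones(len(p))
    for _ in range(n_iter):
        b = q / (K.T @ a)         # rescale to fit the second marginal
        a = p / (K @ b)           # rescale to fit the first marginal
    return a[:, None] * K * b[None, :]

rng = np.random.default_rng(2)
n, m = 8, 8
X, Y = rng.normal(size=(n, 2)), rng.normal(size=(m, 2))
p, q = np.full(n, 1 / n), np.full(m, 1 / m)

pi = ipfp(X, Y, M=np.eye(2), p=p, q=q, T=1.0)
sigma = X.T @ pi @ Y              # cross-covariance E(X_i Y_j) under pi
print(pi.sum(axis=1))             # first marginal is matched
```

Lowering `T` in the call drives `sigma` toward the boundary of the covariance set, as described in section 5.3; equivalently, one may keep T = 1 and replace M by M/T.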
Figure 4.
Projection of various π_M

This algorithm consists in building a sequence π_n such that π_n has first marginal P and π_{n+1} has second marginal Q. It turns out that π_n converges towards a probability π with the correct marginals P and Q. When the marginals P and Q are discrete distributions with atoms P(x) and Q(y) respectively, the algorithm is straightforward, as it consists in solving a series of linear systems:

e^{v_{n+1}(y)} = Q(y) / Σ_x r(x, y) e^{u_n(x)},   e^{u_{n+1}(x)} = P(x) / Σ_y r(x, y) e^{v_{n+1}(y)},

where r(x, y) = e^{x′My} / Σ_{x,y} e^{x′My}.

The convex unconstrained minimization problem (9) can be solved by a quasi-Newton algorithm (we used the BFGS method in the examples below). Of course, this algorithm can be used for any temperature T, by replacing M by M/T in the previous equations.

5.3. Derivation of the extreme coupling.
We recall that our aim is to associate an inner coupling (i.e. a coupling whose cross-covariance matrix is inside the covariance set) with some extreme coupling which projects onto the boundary of the covariance set, by finding a trajectory of couplings that goes smoothly from the inner one to the extreme one. The previous algorithm yields a particular matrix M̂ and a coupling π_{M̂,1} such that σ_{π̂} = σ_{π_{M̂,1}}. This coupling was found by setting the temperature arbitrarily at 1; the entropy penalization was thus effective, and this allowed to reach inner points of the covariance set. This temperature parameter is easily explained. When it goes to +∞, the entropy penalization is predominant in (8). Intuitively, the solution is the coupling with maximal 'disorder': this is the independence coupling. On the contrary, the lower the temperature, the closer (8) is to the non-penalized problem. Hence, the lower T, the closer π_{M̂,T} projects to the boundary of the covariance set. Hence, associating π̂ with an extreme coupling can be done in the following way: once M̂ is found, a sequence of π_{M̂,T_n}, T_n ↓ 0, is computed. Figure 5 summarizes this idea: each point on the curve is the projection of some π_{M̂,T_n}.
Figure 5. A trajectory toward an extreme coupling when the sectors are Health Care and Financials.

As T → +∞, we recover the independence coupling, whose projection is located at (0, 0). When the temperature decreases, the trajectory passes through π̂ at T = 1 and gradually approaches the boundary of the covariance set. The entropy decreases along this trajectory, as Ent(π_T) decreases when T ↓ 0 (by convexity of W(M, T) in T), and thus lowering the temperature corresponds to moving away from the independence coupling (maximal entropy). Thus, the temperature can be seen as a means to control the strength of the dependence. The matrix M̂ can be seen as an affinity matrix: in the limit T → 0, the extreme coupling π_{M̂,0} achieves the supremum of E_π(X′M̂Y). Thus M̂ is the linear transform that makes X the most dependent with M̂Y under π_{M̂,0}.

This can be used to define formally an index of dependence, for π̂ different from the independence coupling: choosing a norm ||·|| over the set of matrices M_{I,J}(R) and using the homogeneity of W, namely W(λM, λT) = λW(M, T) for all λ > 0, we have π_{M̂,1} = π_{M̂/||M̂||, 1/||M̂||}, and the temperature 1/||M̂|| appears as an indicator of the strength of the dependence between the marginals of π̂.

6. Applications
In the financial applications below, we use the technique described in the previous section with time series of linear daily returns on industrial sectors of mainstream indices: the S&P 500 and the DJ Eurostoxx. We consider the Health Care, Financials and Food & Beverage sectors of these indices: P and Q are distributions on R³. The historical data spans 5 years between September 2004 and September 2009. Table 1 gives summary statistics (the first three variables correspond to S&P sectors, the last three to Eurostoxx). In particular, the correlations between sectors belonging to different indices are mild.
Table 1. Summary statistics: mean returns, variances, correlation matrix and cross-covariances of the six sector return series.
6.1. Numerical results. P and Q are discrete distributions with equally weighted atoms in R³, each atom being a vector of the returns at some date on the three sectors:

P = (1/N) Σ_{t=1}^N δ_{r_t^X},   r_t^X = vector of the linear returns at date t on the three sectors of the S&P 500.

The optimal M̂ we find is a 2 × 2 matrix when considering the Construction and Health Care sectors alone, and a 3 × 3 matrix when considering the three sectors. The relative error ||σ_{π_{M̂,1}} − σ_{π̂}|| / ||σ_{π̂}|| is small: σ_{π̂} is the cross-covariance targeted, and σ_{π_{M̂,1}} the covariance matrix of the optimal coupling. They should be perfectly equal in theory, and this percentage measures the convergence of the gradient algorithm.

6.2. Financial applications.
First, we use the trajectory of couplings T ↦ π_{M̂,T} as a continuous family of scenarios of increasing dependence. These scenarios are used to build stress-tests involving multivariate variables, with obvious applications to risk management. By stress-testing, we mean increasing the index of dependence defined above (that is, lowering the temperature parameter), thus shifting away continuously from some coupling π̂ to the extreme coupling π_{M̂,0}. This is to be compared with the method that consists in selecting the maximum correlation coupling as the 'strongest dependence scenario'; indeed, that coupling might be less in line with the cross-covariance structure of the empirical coupling π̂, yielding unexpected and undesired results when managing risky portfolios or options on several assets. Then, we exploit the affinity matrix M̂ further in order to exhibit indices of maximal correlation, based on an analysis of its singular value decomposition.

6.2.1. Portfolio stress-testing.
In order to underline the necessity of accounting properly for themultivariate dependence, the problem of one-period portfolio allocation is considered. Suppose aninvestor chooses to allocate his wealth between assets X , . . . , X n , Y , . . . , Y m . The problem is tostudy the impact of the change of the dependence between X = ( X , . . . , X n ) and Y = ( Y , . . . , Y m )on the investor’s portfolio.In the numerical examples below, the assets are S&P Sector Indices: X is composed of Materials,Construction and Retail indices, while Y is composed of Food and Beverage, Health Care, Financialsand Utilities indices. The corresponding summary statistics are given in table 2. Correlation † AND ALFRED GALICHON ‡ is higher than in the above examples as the sectors are industrial sectors on a single index, theS&P500. Table 2.
Summary statistics of the seven sector indices: mean returns, variances, correlation matrix, and the cross-covariance between X and Y. The correlation matrix (lower triangular part) is:

1
.72  1
.71  .76  1
.69  .86  .65  1
.69  .85  .69  .76  1
.69  .67  .75  .62  .66  1
.70  .76  .60  .72  .74  .56  1
It is assumed the investor chooses his portfolio allocation according to the Markowitz allocation problem (over a one-year horizon), meaning that the weights ω that determine the allocation are chosen by solving the problem max_{Σ_i ω_i = 1} μ·ω − λ ω′Σω, where μ is the vector of expected yearly returns of the stocks, Σ the covariance matrix of the returns, and λ a risk-aversion parameter specific to the investor. We assume that both μ and Σ are the standard empirical estimators computed over a period of one year, the in-sample period. The risk-aversion parameter λ is set at 3. The solution to the Markowitz allocation problem with these parameters is denoted w. The risk of a portfolio is here identified with its variance, and is known as soon as the covariance between the assets is specified. When performing the allocation at time 0, the investor expects a risk of w′Σw. The dependence stress-test consists in considering that the market conditions change after the investment decision: the strength of dependence between X and Y increases.

The affinity matrix is computed with respect to the in-sample data. The whole trajectory of couplings toward the boundary obtains, parameterized by the temperature T. These couplings π_T yield stressed covariance matrices Σ_T = E_{π_T}[(X − E(X))(Y − E(Y))′]. Σ_T represents a scenario where the marginals of X and Y are left unchanged, while the realized dependence between X and Y has increased, compared to the initial covariance matrix Σ.

The unexpected risk the investor might face when the dependence varies is materialized by the variance w′Σ_T w, plotted in Figure 6. The variance obtained at temperature 1 is w′Σw; in the worst case (which corresponds to the lowest temperature considered), the investor would rather have chosen the weights w_T that are optimal for the covariance Σ_T. The opportunity cost μ·w_T − μ·w is the loss on the return when the dependence increases while the investor sticks to the initial allocation w.
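The pipeline just described — compute the coupling at a given temperature, then feed the resulting stressed covariance to the Markowitz allocation — can be sketched numerically. This is a minimal sketch assuming discretized marginals (atoms `x`, `y` with weights `p`, `q`) and a hypothetical affinity matrix `M_hat`; in the paper the entropic coupling is what a Sinkhorn/IPFP scaling computes.

```python
import numpy as np

def coupling_at_temperature(x, y, p, q, M_hat, T, n_iter=1000):
    # Entropic coupling pi_T with affinity matrix M_hat: pi_ij is
    # proportional to exp(x_i' M_hat y_j / T), rescaled to the marginals
    # p and q by Sinkhorn / IPFP iterations. Lowering T moves the coupling
    # toward the extreme coupling while both marginals stay fixed.
    K = np.exp((x @ M_hat @ y.T) / T)
    a, b = np.ones(len(p)), np.ones(len(q))
    for _ in range(n_iter):
        a = p / (K @ b)
        b = q / (K.T @ a)
    return a[:, None] * K * b[None, :]

def markowitz_weights(mu, Sigma, lam=3.0):
    # Closed form for max_{sum w = 1} mu.w - lam * w' Sigma w:
    # w = Sigma^{-1} (mu - eta 1) / (2 lam), with eta the Lagrange
    # multiplier of the budget constraint, fixed so the weights sum to 1.
    ones = np.ones(len(mu))
    Si = np.linalg.inv(Sigma)
    eta = (ones @ Si @ mu - 2.0 * lam) / (ones @ Si @ ones)
    return Si @ (mu - eta * ones) / (2.0 * lam)
```

The stressed cross-covariance Σ_T of the text is then `xc.T @ pi_T @ yc`, with `xc`, `yc` the atoms centered by their marginal means; embedding it in the full covariance of (X, Y) yields the stressed portfolio variance w′Σ_T w.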
This cost becomes increasingly significant as the temperature lowers, reaching 6% in this case. A comparison with the maximum correlation coupling is enlightening. First of all, this coupling is not defined when the dimensions of X and Y differ. Consequently, an asset is removed from Y (namely the Food and Beverage index) and the same computations as above are performed: a covariance matrix Σ_B, which would be the realized covariance if the assets were in maximum correlation dependence, is computed. In this particular example, the variance w′Σ_B w is 60% lower than the expected variance w′Σw. Other examples can yield a significantly higher covariance. This shows that the maximum correlation coupling may not always be suitable as a means of stress-testing the dependence.

A more classical way to stress the dependence is to suppose that the correlation between X_i and Y_j is fixed and equal to some parameter ρ for all i and j; the resulting cross-covariance matrix is denoted Σ_ρ. A problem with this method is that it is known beforehand that, depending on the marginals, Σ_ρ might not be an admissible cross-covariance matrix for P and Q: the resulting variance-covariance matrix of the vector (X, Y) might fail to be positive semi-definite. In this case, the stress-test yields underestimated risks. Indeed, while in our framework the variance w′Σw is at 1.91, this level of variance is attained only when ρ is above 95%, while the mean of the empirical cross-correlation is around 60%. Furthermore, even if ρ is set at 100% (disregarding the admissibility problem evoked above), the resulting variance is still lower than the one obtained with the extreme coupling.

It appears that the trajectory T ↦ π_T provides a coherent sequence of covariance matrices Σ_T that models an increase of the dependence between X and Y. This method respects both marginals and has the advantage of generating admissible matrices, contrary to the usual method of parameterizing correlation matrices by a single parameter. Moreover, the maximum correlation coupling fails in this setting to properly account for the increase of dependence risk, likely because it ignores the cross-correlation effects.

Figure 6. Plot of T ↦ w′Σ_T w.

Figure 7. Opportunity cost as a function of the temperature.

6.2.2. Options pricing.
These couplings with increasing strength of dependence can also be used for the risk management and pricing of rainbow options (options on several underlyings). As a case study, consider the underlyings X_1, …, X_n, Y_1, …, Y_m. It is assumed that each one follows a log-normal martingale diffusion (i.e. we assume a null risk-free rate and write the risk-neutral dynamics):

dX^i_t / X^i_t = σ^X_i dW^i_t,  d⟨W^i, W^j⟩_t = ρ^X_{ij} dt,  X^i_0 = 1,
dY^i_t / Y^i_t = σ^Y_i dB^i_t,  d⟨B^i, B^j⟩_t = ρ^Y_{ij} dt,  Y^i_0 = 1.

The model is fully specified as soon as the correlation matrix between W and B is set. Consider the option that pays Φ = min((max_i X^i_T − K)_+, (max_j Y^j_T − K)_+); it is the minimum between the payoffs of two best-of options, on the X^i on the one hand and on the Y^j on the other hand. It pays when the X^i_T and Y^j_T perform well, but mitigates the gain by selecting the lowest payoff between (max_i X^i_T − K)_+ and (max_j Y^j_T − K)_+.

Suppose an investor has sold this option and knows the distributions of the vectors X and Y. In other words, he has been able to calibrate the volatilities σ^X_i and σ^Y_i, as well as the correlation matrices of (W^1, …, W^n) and of (B^1, …, B^m). The investor may have a guess on the dependence between X and Y (or equivalently between W and B), for instance an empirical estimate of the covariance matrix, but this guess is not sufficient to price the claim Φ in a conservative manner. A way to do so is to compute the price of this claim as the strength of the dependence between X and Y varies from the independence coupling to some extreme coupling, and pick the highest value for the claim.

For the purpose of numerical computations, the terminal distribution of the underlyings is discretized. The atoms of the discretized marginals are respectively denoted x^i_T and y^j_T. For each specification of a cross-covariance matrix A between X and Y, a trajectory π_T(A) is obtained.
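The setup above can be sketched in two short steps: simulating terminal values of the log-normal martingales (unit initial values, zero rate, as assumed in the text), and evaluating the claim Φ under a given discrete coupling. The coupling `pi` below is a placeholder for whatever π_T(A) has been computed.

```python
import numpy as np

def terminal_values(sigma, corr, T=1.0, n=100000, seed=0):
    # Terminal values of the log-normal martingales of the text, started
    # at 1 with zero rate: X_T^i = exp(sigma_i W_T^i - sigma_i^2 T / 2),
    # with W a Brownian motion of correlation matrix `corr`.
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(corr)
    W_T = np.sqrt(T) * rng.standard_normal((n, len(sigma))) @ L.T
    return np.exp(sigma * W_T - 0.5 * sigma ** 2 * T)

def price_claim(xs, ys, pi, K=1.0):
    # E_pi[ min((max_i X_T^i - K)+, (max_j Y_T^j - K)+) ] for a discrete
    # coupling pi over the atoms xs (rows of pi) and ys (columns of pi).
    px = np.maximum(xs.max(axis=1) - K, 0.0)
    py = np.maximum(ys.max(axis=1) - K, 0.0)
    return float(np.sum(pi * np.minimum(px[:, None], py[None, :])))
```

On a toy two-atom example, a comonotone coupling prices the claim higher than the independence coupling, in line with the monotonicity discussed below.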
The claim is priced as the expected value of Φ under the distribution π_T(A):

P_T(A) = E_{π_T(A)}[ min((max_i X^i_T − K)_+, (max_j Y^j_T − K)_+) ]
       = Σ_{x_T, y_T} min((max_i x^i_T − K)_+, (max_j y^j_T − K)_+) π_T(A)(x_T, y_T).

In the following example, X has 3 components and Y has 4, with fixed volatility vectors σ^X and σ^Y. For the sake of the exposition, W and B are standard Brownian motions (ρ^X = Id_n and ρ^Y = Id_m), while the cross-correlation matrix between W and B is randomly generated and held fixed.
The strike is set at 1, i.e. at time 0 the option is at-the-money. As seen in Figure 8, the price increases as the temperature lowers; this is an expected behavior: as the dependence between the assets increases, so does the dependence between their respective maxima, hence the minimum of these maxima tends to be higher, which yields a higher price. In this setting, the stress-test increases the price by more than 30% (i.e. between the price found with the independence coupling and the price found with the extreme coupling). This must be compared to the price obtained when the cross-correlation matrix is taken of the form Σ_ρ, the matrix whose entries are all equal to ρ.
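Whether such a constant-ρ cross-correlation is admissible can be checked directly: the full correlation matrix of (X, Y) must remain positive semi-definite. A minimal sketch, with the marginal correlation blocks passed in (the identity blocks of the usage below match the option example with ρ^X = Id_n and ρ^Y = Id_m):

```python
import numpy as np

def constant_rho_admissible(C_X, C_Y, rho, tol=1e-12):
    # Build the correlation matrix of (X, Y) when every cross-correlation
    # between X_i and Y_j equals rho, and test positive semi-definiteness
    # through its smallest eigenvalue.
    n, m = C_X.shape[0], C_Y.shape[0]
    C = np.block([[C_X, np.full((n, m), rho)],
                  [np.full((m, n), rho), C_Y]])
    return np.linalg.eigvalsh(C).min() >= -tol
```

With identity blocks, the nontrivial eigenvalues are 1 ± ρ√(nm), so admissibility fails beyond ρ = 1/√(nm) ≈ 0.289 for n = 3, m = 4, consistent with the sub-30% threshold discussed in the text.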
Figure 8. Price as a function of the temperature.

As a matter of fact, the stress-test of the cross-correlation fails: the resulting correlation matrix (Id Σ_ρ; Σ_ρ′ Id) is no longer positive definite when ρ exceeds a threshold which is lower than 30%. And even in the limit where ρ approaches this threshold, the price does not reach 0.075, and is still lower than the non-stressed price.

6.2.3. Indices of maximal correlation.
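The construction of this subsection — extracting linear transforms from the singular value decomposition of the affinity matrix — can be sketched as follows, with `M_hat` a hypothetical affinity matrix; the maps √S U′ and √S V′ are the transforms discussed in the text.

```python
import numpy as np

def maximal_correlation_transforms(M_hat):
    # SVD of the affinity matrix, M_hat = U S V'. The maps A = sqrt(S) U'
    # and B = sqrt(S) V' define the transformed vectors X_tilde = A X and
    # Y_tilde = B Y; by construction A' B = M_hat, so that
    # X' M_hat Y = X_tilde' Y_tilde.
    U, s, Vt = np.linalg.svd(M_hat, full_matrices=False)
    sqrtS = np.diag(np.sqrt(s))
    return sqrtS @ U.T, sqrtS @ Vt
```

The identity A′B = M̂ is what makes the transformed vectors maximally covariant under the extreme coupling.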
In order to better understand the link between the extreme coupling π_{M̂,0} and the maximum correlation coupling (the one that corresponds to M = Id in (5)), we use a singular value decomposition of the affinity matrix M̂ of the coupling (X, Y). It writes M̂ = USV′, with U and V two orthogonal matrices and S a diagonal matrix with nonnegative entries. In particular,

E_{π_{M̂,0}}[ (√S U′X)′ (√S V′Y) ] = max_{π ∈ Π(P,Q)} E_π[ (√S U′X)′ (√S V′Y) ].

In other words, if (X̃, Ỹ) = (√S U′X, √S V′Y), then this linear transform of (X, Y) has maximum covariance (under the distribution π_{M̂,0}). Thus, if P̃ is the distribution of √S U′X with X ∼ P, Q̃ is defined likewise from Q, and π̃_{M̂,0} is the distribution of (√S U′X, √S V′Y) where (X, Y) ∼ π_{M̂,0}, then E_{π̃_{M̂,0}}(X′Y) = max_{π ∈ Π(P̃,Q̃)} E_π(X′Y). In sum, the singular value decomposition of the affinity matrix provides linear transforms of the marginals that make the extreme coupling π_{M̂,0} the maximum correlation coupling, after a rescaling of the marginals by these transforms.

As an example, in the case of the three components described in the introduction of Section 6, these transforms produce vectors X̃ and Ỹ whose coordinates are linear combinations of the components of X and of Y, respectively. This result states that X̃ and Ỹ are most correlated to one another under the distribution of the extreme coupling. These two vectors are composed of portfolios involving the components of the original indices and can be viewed as new indices: we speak of indices of maximal correlation. When the strength of dependence is maximal (T = 0), they maximize the correlation E(X̃′Ỹ) among all the couplings with the same marginals.

This analysis can be seen as an analogue, in the case of fixed multivariate marginals, of canonical correlation analysis, which consists, for two random vectors X and Y, in finding vectors a and b such that the correlation between a′X and b′Y is maximal. In the multivariate setting, √S U′ and √S V′ are the analogues of the optimal a and b. The technique described in this section was introduced in the very different context of matching markets by Dupuy and Galichon, 2013, under the name saliency analysis.

7. Conclusion
A recurring complaint in applied statistics is the 'curse of dimensionality': models that have a simple, computationally tractable form in dimension one become very complex, both computationally and conceptually, in higher dimensions. We show here that convex analysis, along with the theory of Optimal Transport, can lead to efficient solutions to the problem of extreme dependence. Building on a natural geometric definition of extreme dependence, we have introduced an index of dependence and used the latter to build stress-tests of dependence between two sets of economic variables. This is particularly relevant in the case of international finance, where the dependence between many economic variables in two countries is of interest.
Acknowledgments
The authors thank Rama Cont for a question which was the starting point of this article, and Guillaume Carlier and Alexander Sokol for helpful conversations.
References
Arrow, K. (1963). Uncertainty and the welfare economics of medical care. Amer. Econom. Rev., 941–973.
Arrow, K. (1970). Essays in the theory of risk-bearing. North-Holland Publishing Co., Amsterdam.
Borch, K. (1962). Equilibrium in a reinsurance market. Econometrica, 424–444.
Brenier, Y. (1991). Polar factorization and monotone rearrangement of vector-valued functions. Communications on Pure and Applied Mathematics, (4), 375–417.
Carlier, G., Dana, R.-A., & Galichon, A. (2012). Pareto efficiency for the concave order and multivariate comonotonicity. Journal of Economic Theory, 207–229.
Dall'Aglio, G. (1956). Sugli estremi dei momenti delle funzioni di ripartizione doppia. Ann. Sc. Norm. Super. Pisa, 35–74.
Deming, W., & Stephan, F. (1940). On a least squares adjustment of a sampled frequency table when the expected marginal totals are known. Annals of Mathematical Statistics, 427–444.
Dupuy, A., & Galichon, A. (2013). Personality traits and the marriage market.
Ekeland, I., Galichon, A., & Henry, M. (2012). Comonotonic measures of multivariate risks. Mathematical Finance, 109–132.
Fan, K. (1951). Fixed-point and minimax theorems in locally convex topological linear spaces. Proceedings of the National Academy of Sciences.
Fréchet, M. (1951). Sur les tableaux de corrélation dont les marges sont données. Ann. Univ. Lyon. Sect. A, 53–77.
Galichon, A., & Henry, M. (2012). Dual theory of choice under multivariate risks. Journal of Economic Theory, (4), 1501–1516.
Galichon, A., & Salanié, B. (2010). Matching with trade-offs: Revealed preferences over competing characteristics.
Hoeffding, W. (1940). Masstabinvariante Korrelationstheorie. Schriften des Mathematischen Instituts und des Instituts für Angewandte Mathematik der Universität Berlin, 179–233.
Kosowsky, J. J., & Yuille, A. L. (1994). The invisible hand algorithm: Solving the assignment problem with statistical physics. Neural Networks, 477–490.
Landsberger, M., & Meilijson, I. (1994). Co-monotone allocations, Bickel–Lehmann dispersion and the Arrow–Pratt measure of risk aversion. Ann. Oper. Res., 97–106.
Malevergne, Y., & Sornette, D. (2006). Extreme financial risks, from dependence to risk management. Springer.
Puccetti, G., & Scarsini, M. (2010). Multivariate comonotonicity. Journal of Multivariate Analysis, 291–304.
Rachev, S. T., & Rüschendorf, L. (1990). A characterization of random variables with minimum L²-distance. Journal of Multivariate Analysis.
Rachev, S. T., & Rüschendorf, L. (1998). Mass transportation problems. Volume I: Theory and Volume II: Applications. Springer, New York.
Rachev, S. (1991). Probability metrics and the stability of stochastic models. John Wiley & Sons Ltd.
Rüschendorf, L. (1990). Fréchet-bounds and their applications. Advances in Probability Distributions with Given Marginals (pp. 151–187). Kluwer Acad. Publ.
Rüschendorf, L. (1995). Convergence of the iterative proportional fitting procedure. The Annals of Statistics, 1160–1174.
Schmeidler, D. (1989). Subjective probability and expected utility without additivity. Econometrica, 571–587.
Schweizer, B., & Sklar, A. (1983). Probabilistic metric spaces. North-Holland, New York.
Tiit, E.-M. (1992). Extremal multivariate distributions having given discrete marginals. Acta et Commentationes Universitatis Tartuensis, 94–113.
Valadier, M. (1969). Sous-différentiels d'une borne supérieure et d'une somme continue de fonctions convexes. C. R. Acad. Sci. Paris, A39–A42.
Villani, C. (2003). Topics in optimal transportation. American Mathematical Society.
Yaari, M. (1987). The dual theory of choice under risk. Econometrica, 95–115.
Zolotarev, V. (1983). Probability metrics. Theory Probab. Appl., 278–302.
Appendix A. Facts on conic orders
In the space M_{I,J}(R), a basis is a convex set B with 0 ∉ B̄ (the closure of B). We assume that B is a compact basis. Let K(B) be the dual cone of the cone generated by B, R₊.B = {λ.b, λ ≥ 0, b ∈ B}, which means that:

K(B) = {Σ ∈ M_{I,J}(R) | Σ · M ≥ 0, ∀M ∈ R₊.B}

Its interior is

Int(K(B)) = {Σ ∈ M_{I,J}(R) | Σ · M > 0, ∀M ∈ R₊.B \ {0}}

It is important to note that in both definitions, R₊.B and R₊.B \ {0} can be replaced by the basis B. A strict partial order is defined on M_{I,J}(R) by setting

M₁ ≻_{K(B)} M₂ ⇔ M₁ − M₂ ∈ Int(K(B)).

If S is a subset of M_{I,J}(R), a maximal element of S for this order is a matrix A ∈ S such that for all M ∈ S, M − A ∉ Int(K(B)): A cannot be 'strictly dominated' by any element in S. The choice of M_{I,J}(R) is arbitrary here, and it could be replaced by any Euclidean space.

Appendix B. Proof of the results
B.1.
Proof of Theorem 1.
Proof.
As the covariance set is a closed convex set, a point x ∈ M_{I,J}(R) lies on its boundary if and only if there exists a non-zero M ∈ M_{I,J}(R) \ {0} such that x maximizes the linear form x̃ ↦ M · x̃ over the set. This translates the fact that there exists a supporting hyperplane at x. Thus σ_π is on the boundary of the covariance set iff there exists M ∈ M_{I,J}(R) \ {0} such that

M · σ_π = sup_{π̃ ∈ Π(P,Q)} M · σ_{π̃}

(recall that M · σ_π = Tr(M′σ_π)). The equivalence between (ii) and (iii) follows from a well-known result in Optimal Transport theory, the Knott–Smith optimality criterion (see Villani, 2003, Th. 2.12). □

B.2.
Proof of Theorem 2.
Before we give the proof of the theorem, we state and prove a number of auxiliary results which are of interest per se. Let B be a compact basis; we have a crucial, although technical, variational characterization of the maximality of σ_π with respect to ≻_{K(B)}:

Proposition 1 (Variational characterization of maximality). σ_π is maximal iff

inf_{M ∈ B} sup_{π̃ ∈ Π(P,Q)} (σ_{π̃} − σ_π) · M = 0.

In other words, a coupling is maximal whenever there exists M ∈ B such that σ_π maximizes σ_{π̃} · M.

Proof of proposition 1.
Note that for every π ∈ Π(P,Q), the function f : (π̃, M) ∈ Π(P,Q) × B ↦ (σ_{π̃} − σ_π) · M exhibits a saddlepoint (π̄, M̄):

max_{π̃ ∈ Π(P,Q)} min_{M ∈ B} f(π̃, M) = f(π̄, M̄) = min_{M ∈ B} max_{π̃ ∈ Π(P,Q)} f(π̃, M)   (10)

This is a consequence of a classical minmax theorem by Fan, 1951: a continuous function over a product of compact convex sets embedded in normed linear spaces, which is linear in both arguments, exhibits a saddlepoint. Both Π(P,Q) and B are compact and convex. The compactness of B is a hypothesis, and it is a well-known fact that Π(P,Q) is compact, see Villani, 2003. Moreover, f is linear in M and π̃, and continuous in both arguments. Finally, Π(P,Q) can be embedded in the space of Radon measures over R^I × R^J endowed with the bounded Lipschitz norm. We refer to Villani, op. cit., Chapter 7 for more details on this point: the important thing is that Π(P,Q) is a compact subset (for this norm) of this space.

Back to the proof of the proposition. If σ_π is maximal, then for all σ_{π̃} one has σ_{π̃} − σ_π ∉ Int(K(B)), which means that for some M ∈ B, (σ_{π̃} − σ_π) · M ≤ 0, hence

sup_{π̃ ∈ Π(P,Q)} inf_{M ∈ B} (σ_{π̃} − σ_π) · M ≤ 0,

and the value 0 is attained at π̃ = π. Thanks to the compactness of B and Π(P,Q), the minmax theorem applies and yields that the infimum of the supremum is zero.

On the contrary, if σ_π is not maximal, then there exists some coupling π̃ such that σ_{π̃} − σ_π ∈ Int(K(B)). Thus, for all M ∈ B, sup_{π̃ ∈ Π(P,Q)} (σ_{π̃} − σ_π) · M > 0, and thanks to the compactness of B,

inf_{M ∈ B} sup_{π̃ ∈ Π(P,Q)} (σ_{π̃} − σ_π) · M > 0. □

As a consequence, we are now ready to prove theorem 2.
Proof of theorem 2. (ii) ⇒ (i): If for some M ∈ B, a coupling π satisfies

E_π(X · MY) = sup_{π̃ ∈ Π(P,Q)} E_{π̃}(X · MY),

then sup_{π̃ ∈ Π(P,Q)} (σ_{π̃} − σ_π) · M = 0, and so inf_{M ∈ B} sup_{π̃ ∈ Π(P,Q)} (σ_{π̃} − σ_π) · M ≤ 0. But this is an infimum of quantities that are greater than or equal to zero, so the 'inf sup' is zero, and σ_π is maximal by Proposition 1.

(i) ⇒ (ii): if σ_π is maximal, then Proposition 1 entails inf_{M ∈ B} sup_{π̃ ∈ Π(P,Q)} (σ_{π̃} − σ_π) · M = 0. Due to the compactness of B, there exists a matrix M ∈ B such that the supremum is zero, which concludes the proof of this implication. □

Appendix C. More details on problem (9)

The objective function of problem (9) is convex in M, because it is the sum of a linear function of M, namely −σ_{π̂} · M, and of W(M,1), which is a supremum over π ∈ Π(P,Q) of functions affine in M, namely σ_π · M + Ent(π). Moreover, assuming that the entropy of the coupling π̂ is finite, then W(M,1) ≥ σ_{π̂} · M + Ent(π̂). Thus W(M,1) − σ_{π̂} · M ≥ Ent(π̂) > −∞. A convex function which is bounded below admits a global minimizer.

Moreover, the objective function is differentiable, as W(M,1) is differentiable and ∇_M W(M,1) = σ_{π(M,1)}. This is a consequence of a property of subdifferentials, see e.g. Valadier, 1969. A global minimizer is necessarily a critical point, proving that the solution M of problem (9) satisfies σ_{π(M,1)} = σ_{π̂}. Nevertheless, depending on the marginal distributions, this minimizer might not be unique: for instance if P is the law of a vector (X,