Simulation Methods for Robust Risk Assessment and the Distorted Mix Approach
Sojung Kim Stefan Weber
Leibniz Universität Hannover
September 9, 2020∗

Abstract
Uncertainty requires suitable techniques for risk assessment. Combining stochastic approximation and stochastic average approximation, we propose an efficient algorithm to compute the worst case average value at risk in the face of tail uncertainty. Dependence is modelled by the distorted mix method that flexibly assigns different copulas to different regions of multivariate distributions. We illustrate the application of our approach in the context of financial markets and cyber risk.
Keywords:
Uncertainty; average value at risk; distorted mix method; stochastic approximation; stochastic average approximation; financial markets; cyber risk.
1 Introduction

Capital requirements are an instrument to limit the downside risk of financial companies. They constitute an important part of banking and insurance regulation, for example, in the context of Basel III, Solvency II, and the Swiss Solvency Test. Their purpose is to provide a buffer to protect policy holders, customers, and creditors. Within complex financial networks, capital requirements also mitigate systemic risks.

The quantitative assessment of the downside risk of financial portfolios is a fundamental, but arduous task. The difficulty of estimating downside risk stems from the fact that extreme events are rare; in addition, portfolio downside risk is largely governed by the tail dependence of positions which can hardly be estimated from data and is typically unknown. Tail dependence is a major source of model uncertainty when assessing the downside risk.

In practice, when extracting information from data, various statistical tools are applied for fitting both the marginals and the copulas – either (semi-)parametrically or empirically. The selection of a copula is frequently made upon mathematical convenience; typical examples include Archimedean copulas, meta-elliptical copulas, extreme value copulas, or the empirical copula, see e.g. McNeil, Frey & Embrechts (2015). The statistical analysis and verification is based on the available data and is center-focused due to limited observations from tail events. This approach is necessarily associated with substantial uncertainty. The induced model risk thus affects the computation of monetary risk measures, the mathematical basis of capital requirements. These functionals are highly sensitive to tail events by their nature – leading to substantial misspecification errors of unknown size.

∗ Institut für Mathematische Stochastik & House of Insurance, Leibniz Universität Hannover, Welfengarten 1, 30167 Hannover, Germany. e-mail: [email protected], [email protected].
In this paper, we suggest a novel approach to deal with this problem. We focus on the downside risk of portfolios. Realistically, we assume that the marginal distributions of individual positions and their copula in the central area can be estimated sufficiently well. We suppose, however, that a satisfactory estimation of the dependence structure in the tail area is infeasible. Instead, we assume that practitioners who deal with the estimation problem share viewpoints on a collection of copulas that potentially capture extremal dependence. However, practitioners are uncertain about the appropriate choice among the available candidates.

The family of copulas that describes tail dependence translates into a family of joint distributions of all positions and thus a collection of portfolio distributions. To combine the ingredients to joint distributions, we take a particularly elegant approach: the Distorted Mix (DM) method developed by Li, Yuen & Yang (2014) constructs a family of joint distributions from the marginal distributions, the copula in the central area and several candidate tail copulas. A DM copula is capable of handling the dependence in the center and in the tail separately. We use the DM method as the starting point for a construction of a convex family of copulas and a corresponding set of joint distributions.

Once a family of joint distributions of the positions is given, downside risk in the face of uncertainty can be computed employing a classical worst case approach. To quantify downside risk, we focus on robust average value at risk (AV@R). The risk measure AV@R is the basis for the computation of capital requirements in both the Swiss Solvency Test and Basel III. As revealed by the axiomatic theory of risk measures, AV@R has many desirable properties such as coherence and sensitivity to tail events, see Föllmer & Schied (2004). In addition, AV@R is m-concave on the level of distributions, see Bellini & Bignozzi (2015), and admits the application of well-known optimization techniques as described in Rockafellar & Uryasev (2000) and Rockafellar & Uryasev (2002). Our model setup leads to a continuous stochastic optimization problem to which we apply a combination of stochastic approximation and sample average approximation. We explain how these techniques may be used to reduce the dimension of the mixture space of copulas. We discuss the solution technique in detail and illustrate its applicability in several examples.

The main contributions of the paper are:

I. For a given family of copulas modelling tail dependence, we describe a DM framework that conveniently allows worst case risk assessment.

II. We provide an efficient algorithm that a) numerically computes the worst case risk and b) identifies worst case copulas in a lower-dimensional mixture space.

III. We apply our framework to financial markets and cyber risk:

(a) Our results indicate that tail risk can be captured very well by the DM method, if a sufficient amount of tail data is available.

(b) If only few data are available and uncertainty about tail dependence is high, the suggested algorithm efficiently characterizes the worst case within the chosen class of copulas.

The paper is structured as follows.
Section 2 explains the DM approach to model uncertainty and formulates the optimization problem associated to the computation of robust AV@R. In Section 3, we develop an optimization solver combining stochastic approximation (i.e., the projected stochastic gradient method) and sample average approximation: stochastic approximation identifies candidate copulas and a good approximation of the worst-case risk; in many cases, risk is insensitive to certain directions in the mixture space of copulas, enabling us to use sample average approximation to identify worst-case solutions in lower dimensions. Section 4 discusses two applications of our framework, namely to financial markets and cyber risk. Section 5 concludes with a discussion of potential future research directions.

Literature
The concept of model uncertainty or robustness is a challenging topic in practice that has also been intensively discussed in the academic literature, see e.g. Cont, Deguest & Scandolo (2010), Hu & Hong (2013), Glasserman & Xu (2014), Krätschmer, Schied & Zähle (2014), Breuer & Csiszár (2016), Blanchet & Murthy (2019), and Bartl, Drapeau & Tangpi (2020). In the current paper, we focus on worst-case AV@R in a multi-factor model. This is closely related to papers that derive bounds with partial information, cf. Embrechts, Puccetti & Rüschendorf (2013), Bernard, Jiang & Wang (2014), Bernard & Vanduffel (2015), Rüschendorf (2017), Puccetti, Rüschendorf, Small & Vanduffel (2017), Li, H. Shao & Yang (2018), Embrechts, Liu & Wang (2018), Weber (2018), and Hamm, Knispel & Weber (2020). In contrast to these contributions, we propose an algorithmic DM approach that is based on candidate copulas and is very flexible in terms of the marginal distributions and copulas that are considered. This is closely related to the simpler setting of mixtures as studied in Zhu & Fukushima (2009) and Kakouris & Rustem (2014).

Our algorithm builds on sampling-based stochastic optimization techniques. Applications of stochastic approximation and stochastic average approximation to the evaluation of risk measures were investigated by Rockafellar & Uryasev (2000), Rockafellar & Uryasev (2002), Dunkel & Weber (2007), Bardou, Frikha & Pagès (2009), Dunkel & Weber (2010), Meng, Sun & Goh (2010), Sun, Xu & Wang (2014), Bardou, Frikha & Pagès (2016), and Ghosh & Lam (2019). The techniques are also discussed in Kushner & Yin (2003), Shapiro (2003), Fu (2006), Bhatnagar, Prasad & Prashanth (2013), and Kim, Pasupathy & Henderson (2015).
Letting (Ω, F, P) be an atomless probability space, we consider the family of random variables X = L(Ω, F, P). The task consists in computing the risk ρ(X) of an aggregate loss random variable X ∈ X for a risk measure ρ. A finite distribution-based monetary risk measure ρ: X → R is a functional with the following three properties:

• Monotonicity: X ≤ Y ⇒ ρ(X) ≤ ρ(Y) for all X, Y ∈ X

• Cash-invariance: ρ(X + m) = ρ(X) + m for all X ∈ X, m ∈ R

• Distribution-invariance: P ∘ X⁻¹ = P ∘ Y⁻¹ ⇒ ρ(X) = ρ(Y) for all X, Y ∈ X
We consider a specific factor structure of aggregate losses. We assume that X = Ψ(X_1, ..., X_d) ∈ X, where X = (X_1, ..., X_d) is a d-dimensional random vector and Ψ: R^d → R is some measurable function. The individual components X_i may depict different business lines, risk factors, or sub-portfolios, and the function Ψ: R^d → R summarizes the quantity of interest. Frequently used aggregations are the total loss X = Σ_{i=1}^d X_i and the excess of loss treaty X = Σ_{i=1}^d (X_i − k_i)^+ for thresholds k_i ∈ R_+.

Computing the risk measure ρ(X) requires a complete model of the random vector X = (X_1, ..., X_d). Let F(x_1, ..., x_d) be its unknown d-dimensional joint distribution which we aim to understand. By Sklar's theorem, any multivariate distribution F can be written as the composition of a copula C and the marginal distributions F_i of its components:

F(x_1, ..., x_d) = C(F_1(x_1), ..., F_d(x_d)).

The typical situation in practice is as follows:

• The marginals F_1(x_1), ..., F_d(x_d) and the dependence structure in the central area, denoted by the copula C_0, can be estimated from available data. Typical examples of C_0 may include the Gaussian copula, the t-copula, or the empirical copula.

• However, due to limited observations in the tail, the copula C_0 might not capture the characteristics of the extreme area very well. Instead, in the face of tail uncertainty, extreme dependence should be captured by a collection of copulas instead of a single copula. This will be explained in Section 2.2.

Before we describe our approach to model uncertainty in the next section, we introduce an important tool for combining different copulas in order to handle the central and tail parts separately, the Distorted Mix (DM) method, see Li et al. (2014). A DM copula C is constructed from m + 1 component copulas: C_0 for the typical area, and C_1, ..., C_m for the extreme area.

Definition 1 (Distorted mix copula)
Let D_ij: [0,1] → [0,1] be continuous distortion functions, i.e. continuous, increasing functions with D_ij(0) = 0, D_ij(1) = 1, and let α_i ≥ 0, i = 0, ..., m, j = 1, ..., d, be such that

Σ_{i=0}^m α_i = 1,   Σ_{i=0}^m α_i D_ij(v) = v   for all v ∈ [0,1], j = 1, ..., d.   (1)

For any collection of copulas C_0, C_1, ..., C_m, the corresponding distorted mix copula C: [0,1]^d → [0,1] is defined by

C(u_1, ..., u_d) = Σ_{i=0}^m α_i C_i(D_i1(u_1), ..., D_id(u_d)).   (2)

Remark 1
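To make the construction concrete, the following minimal Python sketch evaluates a DM copula according to eq. (2). The component copulas and distortions in the illustrative call are simple placeholder choices satisfying (1), not the specifications used later in the paper.

import numpy as np

def dm_copula(u, alphas, distortions, copulas):
    # Evaluate the distorted mix copula of eq. (2) at a point u in [0,1]^d.
    # alphas      : weights alpha_0, ..., alpha_m summing to one
    # distortions : distortions[i][j] is the distortion function D_ij
    # copulas     : copulas[i] is the component copula C_i, a map [0,1]^d -> [0,1]
    total = 0.0
    for i, (alpha, cop) in enumerate(zip(alphas, copulas)):
        v = [distortions[i][j](u_j) for j, u_j in enumerate(u)]
        total += alpha * cop(v)
    return total

# Illustrative ingredients (not from the paper): two components, d = 2,
# identity distortions (which satisfy constraint (1)), independence and comonotonic copulas.
independence = lambda v: float(np.prod(v))
comonotone   = lambda v: float(np.min(v))
identity     = lambda x: x

value = dm_copula(
    u=[0.7, 0.4],
    alphas=[0.5, 0.5],
    distortions=[[identity, identity], [identity, identity]],
    copulas=[independence, comonotone],
)
print(value)  # 0.5 * 0.28 + 0.5 * 0.4 = 0.34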
A copula captures the dependence structure of a multivariate random vector with marginal distribution functions F_1, F_2, ..., F_d as a function of u_1 = F_1(x_1), u_2 = F_2(x_2), ..., u_d = F_d(x_d) with x_1, x_2, ..., x_d ∈ R. The argument x_j is a quantile of F_j at level u_j, j = 1, 2, ..., d: levels close to 0 correspond to the lower tail of (X_1, X_2, ..., X_d), levels close to 1 to the upper tail, and other levels to the center of the distribution.

In equation (2), for i = 0, ..., m, the parameter α_i defines the probability fraction of the total dependence that is governed by copula C_i, which is distorted by the distortion functions D_i1, D_i2, ..., D_id. These distortion functions describe how the arguments (or levels) of copula C are mapped to the arguments (or levels) of the ingredient copulas C_i. We illustrate these features in the following example.

Example 1
Let d = m = 2 and α_0 = α_1 = α_2 = 1/3. We suppose that C_1 and C_2 are the comonotonic copulas, i.e. C_1(u_1, u_2) = C_2(u_1, u_2) = min(u_1, u_2), and that C_0 is the countermonotonic copula, i.e. C_0(u_1, u_2) = max(u_1 + u_2 − 1, 0). We set D_ij(u_j) = max{3 · (u_j − a_i), 0} ∧ 1 with a_1 = 0, a_2 = 2/3, a_0 = 1/3, j = 1, 2. Obviously, the lower and upper tails are governed by the comonotonic copulas C_1 and C_2, respectively, and the central part is countermonotonic according to C_0. In this particular example, the dependence structure in each part is exclusively controlled by one of the copulas C_0, C_1, and C_2.

In this section, we explain our approach to risk assessment in the face of tail uncertainty. As described in the previous section, we assume that the marginals of the random vector X = (X_1, X_2, ..., X_d) are given. Its copula is unknown, but possesses the following DM structure:

• Let D = {D_ik : i = 0, ..., m, k = 1, ..., d} be a collection of distortion functions and α = (α_0, ..., α_m) ∈ [0,1]^{m+1} satisfying assumption (1).

• In addition, we fix a copula C_0 and a set C̃ of copulas.

We assume that the copula of X = (X_1, X_2, ..., X_d) belongs to the following family:

DM_C̃ = { α_0 C_0(D_01(u_1), ..., D_0d(u_d)) + Σ_{i=1}^m α_i C̃_i(D_i1(u_1), ..., D_id(u_d)) : C̃_i ∈ C̃ for all i = 1, ..., m }.

The worst-case risk assessment over all feasible distributions of X = (X_1, X_2, ..., X_d) is equal to

max_{C ∈ DM_C̃} ρ(X)   (3)

where X = Ψ(X_1, ..., X_d) and X = (X_1, ..., X_d) has a copula C ∈ DM_C̃ with the given marginals.

Remark 2
Our approach assumes that the marginals of X and the copula C_0 for the central area are known. However, tail dependence is uncertain and captured by some family C̃ of copulas. The key structural assumption is that X possesses a DM copula and that the distortions and associated probability fractions are fixed. These determine the composition of the copula of X. The distortions and probability fraction associated to the copula C_0 of the central area cannot be varied; for all other distortions and associated probability fractions the corresponding copulas may flexibly be chosen from the collection C̃.

Remark 3
One possible approach would be to choose C̃ as a finite collection of K ≥ m candidate copulas. In this case, the number of DM copulas is either (K choose m) × m! or K^m if we allow duplicate components. This approach has two disadvantages: First, from a technical point of view, the corresponding discrete optimization problem involves a very high number of permutations. Computing the value function for each of them is expensive, and the Ranking and Selection (R&S) method would not be efficient in this case. Second, with finitely many candidate copulas also their mixtures seem to be plausible ingredients to the DM method and should not be excluded a priori.
For a given collection C = {C_1, C_2, ..., C_K} of K candidate copulas, we consider the family C̃ of their mixtures. That is, any element of C̃ can be expressed as a convex combination of elements of C:

C̃_γ = Σ_{j=1}^K γ_j C_j,   γ = (γ_1, ..., γ_K)ᵀ ∈ △^{K−1} = { γ ∈ R^K : Σ_{j=1}^K γ_j = 1 and γ_j ≥ 0 for all j },

where △^{K−1} is the standard (K−1)-simplex. The K vertices of the simplex are the points e_j ∈ R^K, where e_1 = (1, 0, ..., 0)ᵀ, e_2 = (0, 1, 0, ..., 0)ᵀ, ..., e_K = (0, ..., 0, 1)ᵀ. With this notation, our K candidate copulas C_j ∈ C can be written as C_j = C̃_{e_j}.

Any element in DM_C̃ can now be represented by some ¯γ = (γ^1, ..., γ^m) ∈ R^{K×m} with γ^1, ..., γ^m ∈ △^{K−1} according to the following formula:

Ĉ_{γ^1, ..., γ^m}(u_1, ..., u_d) = α_0 C_0(D_01(u_1), ..., D_0d(u_d)) + Σ_{i=1}^m α_i C̃_{γ^i}(D_i1(u_1), ..., D_id(u_d)).   (4)

With this notation, the optimization problem (3) can be rewritten as

max_{¯γ = (γ^1, ..., γ^m) ∈ (△^{K−1})^m} ρ(X^{¯γ})   (5)

where X^{¯γ} represents the aggregate loss Ψ(X_1, ..., X_d) with (X_1, ..., X_d) having copula Ĉ_{γ^1, ..., γ^m} and the given marginals. We call Ĉ_{γ^1, ..., γ^m} in (4) a robust DM copula if it attains the optimal solution of (5). Optimization problem (5) enables us to search for the solution inside the product of simplexes and paves the way for a gradient approach.

We will now construct and explore a sampling-based optimization solver. For this purpose, we focus on one particular risk measure, the average value at risk (AV@R), also called conditional value at risk or expected shortfall. This risk measure forms the basis of Basel III and the Swiss Solvency Test. If p ∈ (0,1) is the level of the AV@R, a number close to 1, the corresponding AV@R of the losses X^{¯γ} is defined as

c_p(¯γ) = (1/(1−p)) ∫_p^1 q_t^−(F_{X^{¯γ}}) dt,

where q_t^−(F) = inf{x ∈ R | F(x) ≥ t} for a distribution F, and F_{X^{¯γ}} stands for the distribution of X^{¯γ}. Accordingly, we denote the VaR by v_p(¯γ) = q_p^−(F_{X^{¯γ}}).

With this notation, our optimization problem is

max_{γ^1, ..., γ^m ∈ △^{K−1}} c_p(γ^1, ..., γ^m) = max_{¯γ ∈ (△^{K−1})^m} c_p(¯γ).   (6)

The factor structure of DM copulas provides the basis for adequate simulation methods (see Proposition 1, Li et al. (2014)). Samples of the copula Ĉ_{γ^1, ..., γ^m} defined in eq. (4) can be generated according to the following Algorithm 1.

Algorithm 1
Sampling algorithm of the DM copula (4) generated by ¯γ

procedure RobustDMC(α, ¯γ, C, D)
    sample a random variable Z_1, distributed discretely as P(Z_1 = i) = α_i for i = 0, ..., m
    if Z_1 ≠ 0 then
        sample a random variable Z_2, distributed discretely as P(Z_2 = j | Z_1) = γ_{Z_1 j} for j = 1, ..., K
    else
        set Z_2 = 0
    sample a random vector V = (V_1, ..., V_d) from the joint distribution C_{Z_2}
    for k = 1 to d do
        U_k = D_{Z_1 k}^{−1}(V_k)
    return U = (U_1, ..., U_d)

Samples of (X_1, ..., X_d) with copula Ĉ_{γ^1, ..., γ^m} and arbitrary marginal distributions F_1, F_2, ..., F_d can be generated according to the quantile transformation (X_1, ..., X_d) =_d (F_1^−(U_1), ..., F_d^−(U_d)).

2.3.2 Aggregate Loss

The simulation of the aggregate losses X^{¯γ} is now based on a simple transformation. Setting

A(s) = {(u_1, ..., u_d) : Ψ(F_1^−(u_1), ..., F_d^−(u_d)) ≤ s},

we define distribution functions

G_0(s) = ∫_{A(s)} dC_0(D_01(u_1), ..., D_0d(u_d)),
G_ij(s) = ∫_{A(s)} dC_j(D_i1(u_1), ..., D_id(u_d)),   i = 1, ..., m, j = 1, ..., K,

and note that

F_{X^{¯γ}}(s) := P[X^{¯γ} ≤ s] = α_0 G_0(s) + Σ_{i=1}^m α_i Σ_{j=1}^K γ_ij G_ij(s).   (7)

If Ψ_0 and Ψ_ij are distributed according to G_0 and G_ij, i = 1, ..., m, j = 1, ..., K, and independent of Z_1 and Z_2 defined in Algorithm 1, then

X^{¯γ} =_d 1_{[Z_1 = 0]} Ψ_0 + Σ_{i=1}^m Σ_{j=1}^K 1_{[Z_1 = i, Z_2 = j]} Ψ_ij.

This representation will be instrumental for our simulation algorithms. For later use, we denote the density functions of G_0(s), G_ij(s), and F_{X^{¯γ}}(s) by g_0(s), g_ij(s), and f_{X^{¯γ}}(s), i = 1, ..., m, j = 1, ..., K, respectively, provided that they exist.
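A hedged Python sketch of the two-stage sampling scheme of Algorithm 1, including the optional quantile transformation, could look as follows. The copula samplers, inverse distortions and marginal quantile functions are assumed to be supplied by the user; the sketch is an illustration rather than the authors' implementation.

import numpy as np

rng = np.random.default_rng(0)

def sample_robust_dm(alphas, gamma_bar, sample_C0, sample_Cj, inv_distortions,
                     marginal_quantiles=None):
    # One draw U = (U_1, ..., U_d) from the DM copula (4), following Algorithm 1.
    # alphas            : weights alpha_0, ..., alpha_m
    # gamma_bar         : (m, K) array, row i is the mixture weight vector gamma^i
    # sample_C0         : callable returning one sample in [0,1]^d from the central copula C_0
    # sample_Cj         : list of K callables, sample_Cj[j]() samples from candidate copula C_{j+1}
    # inv_distortions   : inv_distortions[i][k] is the inverse distortion D_{ik}^{-1}
    # marginal_quantiles: optional list of marginal quantile functions F_k^{-1}
    alphas = np.asarray(alphas, dtype=float)
    gamma_bar = np.asarray(gamma_bar, dtype=float)
    m = alphas.size - 1
    Z1 = rng.choice(m + 1, p=alphas)                 # choose the region / distortion block
    if Z1 != 0:
        Z2 = rng.choice(gamma_bar.shape[1], p=gamma_bar[Z1 - 1])
        V = np.asarray(sample_Cj[Z2]())              # sample from a candidate tail copula
    else:
        V = np.asarray(sample_C0())                  # sample from the central copula
    U = np.array([inv_distortions[Z1][k](V[k]) for k in range(V.size)])
    if marginal_quantiles is None:
        return U                                     # a sample of the DM copula
    return np.array([q(u) for q, u in zip(marginal_quantiles, U)])  # quantile transform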
In this section, we develop an algorithm solving problem (6) that builds on two classical approaches: Stochastic Approximation (SA) and Sample Average Approximation (SAA). While SA is an iterative optimization algorithm that is based on noisy observations, SAA first estimates the whole objective function and transforms the optimization into a deterministic problem. We combine both approaches.

The standard stochastic gradient algorithm of SA quickly approximates the worst-case risk, but the convergence to a worst-case copula is slow. It turns out that in many cases the risk is insensitive to certain directions in the mixture space of copulas. We exploit this observation in order to reduce the dimension of the problem and identify a suitable subset of C that excludes copulas whose contribution to the worst-case risk is small. We then determine a solution in the corresponding simplex, relying on SAA, which is computationally efficient in lower dimensions only, but provides a good global solution to optimization problems, even if stochastic gradient algorithms are noisy and slow.

Our method thus first applies SA to estimate the worst-case risk together with a candidate mixture from which a lower-dimensional problem is constructed. Second, SAA is used, but only in the lower-dimensional mixture space – utilizing a large sample set that reduces noise.

Step 1 – Sampling.
We generate N independent copies of the m × K + 1 random variables Ψ_0 and Ψ_ij according to the distribution functions G_0 and G_ij, i = 1, ..., m, j = 1, ..., K, respectively.

Step 2 – SA Algorithm. The PSG-RobustAV@R Algorithm 2 discussed in Section 3.1 seeks a candidate solution and terminates after a small number of iterations. We design a stopping rule that determines when to move to the next step.

Step 3 – SAA Algorithm.
From the solution obtained in Step 2 we construct a lower-dimensional simplex in which we search for a solution. We apply SAA on a suitable grid. The SAA-RobustAV@R Algorithm 4 is discussed in Section 3.2.
SA is a recursive procedure evaluating noisy observations of the objective function and its subgradient. The algorithm moves in the gradient direction, approaching a local optimum by a first-order approach (minimization and maximization require, of course, opposite signs).
Algorithm 2
The projected stochastic gradient algorithm for the robust AV@R

procedure PSG-RobustAV@R
    Input: the level p of AV@R, the step sizes {δ_t = t^{−a}}_{t ≥ 1}, the sample size sequence {N_t}_{t ≥ 1}, the number of iterations M, the PDF of X^{¯γ_t} at iteration t denoted by f_{X^{¯γ_t}}(s), and the PDFs g_0(s) and g_ij(s) for i = 1, ..., m and j = 1, ..., K
    Initialization: set a starting state ¯γ_1 = (γ^1_1, ..., γ^m_1) with γ^i_1 ∈ △^{K−1}, i = 1, ..., m
    while terminal conditions are not met do
        for t = 1 to M do
            Simulation:    ⊲ generate (L_1, ..., L_{N_t}), N_t i.i.d. observations of X^{¯γ_t}
            for l = 1 to N_t do
                sample (U_{l1}, ..., U_{ld}) from Algorithm 1 with ¯γ_t = (γ^1_t, ..., γ^m_t)
                set L_l = Ψ(F_1^−(U_{l1}), ..., F_d^−(U_{ld}))
            VaR and AV@R Estimation:
            set v̂_p^{N_t} = L_{⌈N_t p⌉ : N_t}
            set ĉ_p^{N_t} = v̂_p^{N_t} + (1/(N_t(1−p))) Σ_{i=1}^{N_t} (L_i − v̂_p^{N_t})^+
            AV@R Gradient Estimation:
            set f_{X^{¯γ_t}}(s) = α_0 g_0(s) + Σ_{i=1}^m α_i Σ_{j=1}^K γ_{ij,t} g_ij(s)    ⊲ γ_{ij,t} is the j-th component of γ^i_t
            for i = 1 to m and j = 1 to K do
                set ∆_{i,j}(ĉ_p^{N_t}) = (1/(N_t(1−p))) Σ_{l=1}^{N_t} (α_i g_ij(L_l)/f_{X^{¯γ_t}}(L_l)) (L_l − v̂_p^{N_t}) 1_{[L_l ≥ v̂_p^{N_t}]}
            Parameter Update – Multiple Simplexes Projection:
            for i = 1 to m do
                set ∆^i_t = (∆_{i,1}, ..., ∆_{i,K})
                update γ^i_{t+1} = Π_{△^{K−1}}(γ^i_t + δ_t ∆^i_t) by Algorithm 3
    Output: ĉ_p^N and γ^1_t, ..., γ^m_t
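For illustration, a minimal Python sketch of the estimation step of Algorithm 2 – the empirical VaR/AV@R estimators and the likelihood-ratio gradient (8) – might read as follows. The densities g_0 and g_ij are assumed to be available as callables (e.g., kernel estimates); this is an assumption of the sketch, not part of the pseudocode above.

import numpy as np

def avar_and_gradient(L, p, alphas, gamma_bar, g0, g_ij):
    # L        : (N,) simulated losses under the current gamma_bar
    # p        : AV@R level
    # alphas   : weights alpha_0, ..., alpha_m
    # gamma_bar: (m, K) current mixture weights
    # g0, g_ij : callables returning g_0(L) with shape (N,) and g_ij(L) with shape (m, K, N)
    L = np.asarray(L, dtype=float)
    alphas = np.asarray(alphas, dtype=float)
    gamma_bar = np.asarray(gamma_bar, dtype=float)
    N = L.size
    v_hat = np.sort(L)[int(np.ceil(N * p)) - 1]                          # empirical VaR
    c_hat = v_hat + np.maximum(L - v_hat, 0.0).sum() / (N * (1.0 - p))   # empirical AV@R
    G = np.asarray(g_ij(L))                                              # (m, K, N)
    f = alphas[0] * np.asarray(g0(L)) + np.einsum('i,ik,ikn->n', alphas[1:], gamma_bar, G)
    w = (L >= v_hat) * (L - v_hat) / (N * (1.0 - p))                     # (N,)
    grad = (alphas[1:, None, None] * G / f * w).sum(axis=-1)             # LR gradient (8), shape (m, K)
    return v_hat, c_hat, grad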
Algorithm 3

Euclidean projection of a vector y onto the simplex

procedure ProjS(y)    ⊲ y ∈ R^K
    sort y into u: u_1 ≥ u_2 ≥ ··· ≥ u_K
    find τ = max{1 ≤ j ≤ K : u_j + (1/j)(1 − Σ_{k=1}^j u_k) > 0}
    define λ = (1/τ)(1 − Σ_{k=1}^τ u_k)
    return x s.t. x_i = max(y_i + λ, 0), i = 1, ..., K    ⊲ x = Π_{△^{K−1}}(y) ∈ R^K

Algorithm 2 seeks to solve the optimization problem (6). At each iteration t, the SA algorithm first generates loss samples L_1, ..., L_{N_t} of X^{¯γ_t} according to Algorithm 1. SA then estimates the VaR and AV@R as follows:

v̂_p^{N_t} = L_{⌈N_t p⌉ : N_t},   ĉ_p^{N_t} = v̂_p^{N_t} + (1/(N_t(1−p))) Σ_{i=1}^{N_t} (L_i − v̂_p^{N_t})^+.

Here, ⌈a⌉ denotes the smallest integer larger than or equal to a, and L_{s:N} is the s-th order statistic from the N observations, L_{1:N} ≤ L_{2:N} ≤ ··· ≤ L_{N:N}.

Second, SA computes the gradients ∆^i_t = (∆_{i,1}, ..., ∆_{i,K}) of c_p at γ^i_t from

∆_{i,j}(ĉ_p^N) = (1/(N(1−p))) Σ_{l=1}^N (∂ log f_{X^{¯γ}}(L_l)/∂γ_ij) (L_l − v̂_p^N) 1_{[L_l ≥ v̂_p^N]}
             = (1/(N(1−p))) Σ_{l=1}^N (α_i g_ij(L_l)/f_{X^{¯γ}}(L_l)) (L_l − v̂_p^N) 1_{[L_l ≥ v̂_p^N]}   (8)

for every i = 1, ..., m.

Third, parameter updates are computed for each i:

γ^i_{t+1} = Π_{△^{K−1}}(γ^i_t + δ_t ∆^i_t),   (9)

where Π_{△^{K−1}}(x) = arg min_y {||x − y|| : y ∈ △^{K−1}} is the Euclidean projection of x onto the simplex, and {δ_t}_{t ≥ 1} is the step size multiplier. This type of algorithm is called the projected gradient descent algorithm.

Algorithm 2, a projection onto multiple high-dimensional simplexes, applies methods described in Condat (2016). In contrast to these, the simple, classical projection Algorithm 3 that we included for illustration possesses the larger complexity O(K log K).
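The sort-based projection of Algorithm 3 and the projected update (9) can be sketched in Python as follows; this is a standard implementation of the Euclidean projection onto the simplex, not the faster method of Condat (2016).

import numpy as np

def project_simplex(y):
    # Euclidean projection of y onto the standard simplex (Algorithm 3).
    y = np.asarray(y, dtype=float)
    u = np.sort(y)[::-1]                         # sort in decreasing order
    css = np.cumsum(u)
    j = np.arange(1, y.size + 1)
    cond = u + (1.0 - css) / j > 0.0
    tau = j[cond][-1]                            # largest index satisfying the condition
    lam = (1.0 - css[tau - 1]) / tau
    return np.maximum(y + lam, 0.0)

def sa_update(gamma_bar, grad, step):
    # Projected stochastic gradient step (9), applied row by row (one row per simplex).
    return np.array([project_simplex(g + step * d) for g, d in zip(gamma_bar, grad)])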
The convergence of Algorithm 2 is guaranteed by Assumptions 1 & 2 below, see Theorem 5.2.1 in Kushner & Yin (2003).

Assumption 1

(1) The random variable X^{¯γ} is a continuous random variable with a finite support for all γ_ij, i = 1, ..., m and j = 1, ..., K.

(2) For all γ_ij, the gradients ∂v_p(¯γ)/∂γ_ij and ∂c_p(¯γ)/∂γ_ij are well defined and bounded.

(3) X^{¯γ} has a positive and continuously differentiable density f_{X^{¯γ}}, and ∂ log f_{X^{¯γ}}(s)/∂γ_ij exists and is bounded for all s and γ_ij.

Assumption 2

(1) The step size sequence {δ_t}_{t ≥ 1} satisfies

Σ_{t=1}^∞ δ_t = ∞,   δ_t ≥ 0,   Σ_{t=1}^∞ δ_t² < ∞.

(2) ∂c_p(¯γ)/∂γ_ij is continuous, and

Σ_{t=1}^∞ δ_t | E[∆_{i,j}(ĉ_p^{N_t})] − ∂c_p(¯γ_t)/∂γ_ij | < ∞

with probability 1 for each i and j.
In specific applications, these sufficient conditions for convergence are not always satisfied. However, the SA algorithm might still produce estimates that approach a solution of the problem. We will impose a switching condition that determines when we move from SA to SAA. SAA is explained in the next section.
Remark 4 (Concavity of AV@R)
Algorithm 2 converges to a local maximum. But any local maximum of problem (6) is in fact a global maximum, since c_p is a concave function of ¯γ, see Acciaio & Svindland (2013). This property is closely related to the m-concavity of AV@R, a concavity property on the level of distributions, see Weber (2006) and Bellini & Bignozzi (2015).

Remark 5 (Differentiability of AV@R)
The gradient estimate (8) of AV@R in Algorithm 2 is a Likelihood Ratio (LR) method due to Tamar, Glassner & Mannor (2015). An LR approach is appropriate, since the distribution of the argument X^{¯γ} of the AV@R depends on ¯γ.

(a) The computation (8) needs the g_ij as inputs. If their computation is not analytically tractable, an empirical estimator can be chosen. Other options are AEP (Arbenz, Embrechts & Puccetti (2011)) and GAEP (Arbenz, Embrechts & Puccetti (2012)).

(b) An alternative to LR gradient estimation are finite differences, as applied in algorithms of Kiefer–Wolfowitz type (Kiefer & Wolfowitz (1952)). Properties of such algorithms are discussed in Broadie, Cicek & Zeevi (2011). Finite differences require less regularity in order to be applicable, but typically exhibit a worse performance.

AV@R belongs to the class of divergence risk measures that coincide, up to a sign change, with optimized certainty equivalents. These admit a representation as the solution of a one-dimensional optimization problem, see Ben-Tal & Teboulle (2007). The minimizer can be characterized by a first order condition. For the specific case of AV@R this representation was previously described in Pflug (2000), Rockafellar & Uryasev (2000), and Rockafellar & Uryasev (2002), and implies the following identity:

c_p(¯γ) = min_{u ∈ R} { u + (1/(1−p)) ∫ (L − u)^+ dF_{X^{¯γ}}(L) }.

The mixture representation (7) of the distribution function of X^{¯γ} provides a reformulation of the original problem (6):

max_{¯γ ∈ (△^{K−1})^m} min_{u ∈ R}  u + (α_0/(1−p)) E[(Ψ_0 − u)^+] + Σ_{i=1}^m Σ_{j=1}^K (α_i γ_ij/(1−p)) E[(Ψ_ij − u)^+]   (10)

where Ψ_0 and Ψ_ij are random variables with distributions G_0 and G_ij, respectively.

SAA algorithmically solves the stochastic optimization problem (6) by first approximating the objective function by its sample average estimate and then solving the auxiliary deterministic problem. Eq. (10) suggests the following SAA for (6):

max_{¯γ ∈ (△^{K−1})^m} min_{u ∈ R}  u + (α_0/((1−p)N)) Σ_{k=1}^N (Ψ_{0k} − u)^+ + Σ_{i=1}^m Σ_{j=1}^K (α_i γ_ij/((1−p)N)) Σ_{k=1}^N (Ψ_{ijk} − u)^+   (11)

The SAA procedure is described in Algorithm 4.

Algorithm 4   SAA for the robust AV@R

procedure SAA-RobustAV@R
    Input: the probability level p for AV@R; {Ψ_{0k}, Ψ_{ijk}}_{k=1,...,N}, N realizations of Ψ_0 and Ψ_ij for i = 1, ..., m and j = 1, ..., K; α; a grid of ¯γ = (γ^1, ..., γ^m)
    for every ¯γ = (γ^1, ..., γ^m) on the grid do
        Initialization:
        Initialize the lower bound u_l with p̄_N(u_l) < p, and the upper bound u_u with p̄_N(u_u) > p
        Set p_u = p̄_N(u_u) and p_l = p̄_N(u_l)
        Bisection method:
        while |p_u − p| > ε and |p_l − p| > ε do
            u_m = (u_u + u_l)/2 and evaluate p_m = p̄_N(u_m)
            if p_m > p then set u_u = u_m and p_u = p_m
            else set u_l = u_m and p_l = p_m
        if |p_u − p| ≤ ε then return ǔ_N(¯γ) = u_u
        else return ǔ_N(¯γ) = u_l
        AV@R computation:
        Compute č_p^N(¯γ) = ǔ_N + (α_0/((1−p)N)) Σ_{k=1}^N (Ψ_{0k} − ǔ_N)^+ + Σ_{i=1}^m Σ_{j=1}^K (α_i γ_ij/((1−p)N)) Σ_{k=1}^N (Ψ_{ijk} − ǔ_N)^+
    Output: max of č_p^N(¯γ) over the grid
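A minimal Python sketch of the inner part of Algorithm 4 – bisection for ǔ_N(¯γ) and evaluation of č_p^N(¯γ) – is given below. The bracketing of the root and the tolerance handling are simplified relative to the pseudocode, so this should be read as an illustration under those simplifications.

import numpy as np

def saa_avar(gamma_bar, p, alphas, Psi0, Psi, tol=1e-4, max_iter=200):
    # SAA evaluation of the robust AV@R objective at one grid point gamma_bar.
    # Psi0: (N,) samples from G_0;  Psi: (m, K, N) samples from the G_ij.
    alphas = np.asarray(alphas, dtype=float)
    gamma_bar = np.asarray(gamma_bar, dtype=float)
    Psi0 = np.asarray(Psi0, dtype=float); Psi = np.asarray(Psi, dtype=float)
    w = alphas[1:, None] * gamma_bar                     # (m, K) effective weights

    def p_bar(u):                                        # sample first-order condition
        return alphas[0] * np.mean(Psi0 <= u) + np.sum(w * np.mean(Psi <= u, axis=-1))

    lo, hi = min(Psi0.min(), Psi.min()), max(Psi0.max(), Psi.max())
    u = 0.5 * (lo + hi)
    for _ in range(max_iter):                            # bisection for p_bar(u) = p
        if abs(p_bar(u) - p) < tol:
            break
        if p_bar(u) > p:
            hi = u
        else:
            lo = u
        u = 0.5 * (lo + hi)
    c = (u + alphas[0] * np.mean(np.maximum(Psi0 - u, 0.0)) / (1.0 - p)
           + np.sum(w * np.mean(np.maximum(Psi - u, 0.0), axis=-1)) / (1.0 - p))
    return u, c

The outer maximization then simply evaluates saa_avar for every ¯γ on the grid and returns the largest AV@R value.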
The inner minimization in (10) can numerically be solved on the basis of first order conditions that are specified in the following lemma.

Lemma 1

Let ζ(u, ¯γ) = u + (α_0/(1−p)) E[(Ψ_0 − u)^+] + Σ_{i=1}^m Σ_{j=1}^K (α_i γ_ij/(1−p)) E[(Ψ_ij − u)^+]. Then

∂⁺ζ/∂u (u, ¯γ) = −p/(1−p) + (α_0/(1−p)) P(Ψ_0 ≤ u) + Σ_{i,j} (α_i γ_ij/(1−p)) P(Ψ_ij ≤ u),
∂⁻ζ/∂u (u, ¯γ) = −p/(1−p) + (α_0/(1−p)) P(Ψ_0 < u) + Σ_{i,j} (α_i γ_ij/(1−p)) P(Ψ_ij < u).

The minima of the function u ↦ ζ(u, ¯γ) are attained, and any minimizer z satisfies

∂⁻ζ/∂u (z, ¯γ) ≤ 0 ≤ ∂⁺ζ/∂u (z, ¯γ).   (12)

If the distribution functions of Ψ_0 and Ψ_ij are continuous, the first order condition (12) becomes

p = α_0 P(Ψ_0 ≤ u) + Σ_{i,j} α_i γ_ij P(Ψ_ij ≤ u).   (13)
Proof. See Appendix A2.

Replacing P(Ψ_0 ≤ u) and P(Ψ_ij ≤ u) in (12) and (13) by the empirical probabilities, we obtain an SAA approach to solve the root finding problems posed by the first order conditions. The sample version of (13) is

p̄_N(u) = (α_0/N) Σ_{k=1}^N 1_{[Ψ_{0k} ≤ u]} + Σ_{i=1}^m Σ_{j=1}^K (α_i γ_ij/N) Σ_{k=1}^N 1_{[Ψ_{ijk} ≤ u]}.

Utilizing a simple bisection method, one can determine the root ǔ_N(¯γ) that solves p̄_N(u) = p.

3.2.2 Outer maximization

The sample version of the outer maximization in (10) is

max_{¯γ}  ǔ_N(¯γ) + (α_0/((1−p)N)) Σ_{k=1}^N (Ψ_{0k} − ǔ_N(¯γ))^+ + Σ_{i=1}^m Σ_{j=1}^K (α_i γ_ij/((1−p)N)) Σ_{k=1}^N (Ψ_{ijk} − ǔ_N(¯γ))^+  =: max_{¯γ} č_p^N(¯γ).

Algorithm 4 evaluates č_p^N(¯γ) for all ¯γ on a grid, compares the values of the function and thereby determines an approximate solution.

The outer maximization requires the computation at many grid points and is expensive in high dimensions. We propose to identify a suitable lower-dimensional subsimplex in the space of copulas on the basis of SA, before we switch to SAA. This is justified by the fact that the worst-case risk is typically insensitive to contributions of some of the copulas in C. Before we summarize the full procedure, we address the switching condition from SA to SAA.

SA produces a random sequence (¯γ_t)_{t=1,2,...}. We choose a certain burn-in period t_min and a maximal number of SA iterations t_max to construct a stopping time t* ∈ {t_min, t_min + 1, ..., t_max}. We stop at t when two consecutive matrices ¯γ_{t−1} and ¯γ_t are close to each other according to some metric. In the examples below, we implement the 1-norm ||A||_1 = Σ_i Σ_j |A_ij| and a fixed threshold level. Moreover, we choose t_min = 10 and t_max = 50.

When switching to SAA, the dimension of the problem is reduced as follows. To simplify the notation, we write

¯γ = ( γ_11  γ_21  ···  γ_m1
       γ_12  γ_22  ···  γ_m2
       ...
       γ_1K  γ_2K  ···  γ_mK )

instead of ¯γ_{t*}, where t* is the stopping time described above. Recall that the index j = 1, 2, ..., K enumerates the copulas in C, while i = 1, 2, ..., m labels the weights α_i and corresponding distortions D_i1, D_i2, ..., D_id in eq. (2) or eq. (4), respectively. We assume that the weights are equal, i.e., α_i = α for all i = 1, 2, ..., m; this assumption ensures that the probability fraction of the total dependence that is governed by each column of ¯γ is equal for all columns.

We first select the number of copulas K* < K we wish to select from C for the application of SAA. We distinguish the cases K* ≤ m and K* > m. In the first case, we identify the largest entry of ¯γ and select the copula corresponding to it. We remove the corresponding row and the corresponding column from ¯γ, identify the largest entry of the remaining matrix, and remove again the corresponding row and column. We proceed iteratively until K* copulas are selected. In the second case, i.e., K* > m, all rows and columns are removed from ¯γ after m copulas were selected. In this case, we proceed with selecting copulas m + 1, m + 2, ... as follows. We remove all rows corresponding to the m copulas that were already selected, and then proceed in the same manner as described above to select the remaining copulas.

Remark 6
For each i = 1, 2, ..., m, the mixture copula corresponding to γ^i governs a probability fraction α of the overall dependence structure in a region determined by the distortions D_i1, D_i2, ..., D_id. The algorithm consecutively selects, for different i, the most important element from the copulas in C that were not previously selected (if there is a tie, we select the one with the larger gradient). This guarantees that the contributions of the vectors of distortion functions corresponding to different values of i are taken into consideration when the K* copulas are chosen.
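One possible reading of the selection rule of Section 3.2.3 is sketched below in Python; the tie-breaking by the larger gradient mentioned in Remark 6 is omitted for brevity, so this is an approximation of the described procedure.

import numpy as np

def select_copulas(gamma_bar, K_star):
    # gamma_bar: (K, m) matrix of mixture weights (rows = copulas in C, columns = regions i).
    # Greedy selection: pick the largest remaining entry, drop its row and column, repeat;
    # once all columns have been used, make all columns available again (case K* > m).
    gb = np.asarray(gamma_bar, dtype=float)
    K, m = gb.shape
    rows, cols = set(range(K)), set(range(m))
    selected = []
    while len(selected) < K_star and rows:
        if not cols:
            cols = set(range(m))
        value, r, c = max((gb[r, c], r, c) for r in rows for c in cols)
        selected.append(r)                # index of the selected candidate copula
        rows.discard(r)
        cols.discard(c)
    return selected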
We finally present a brief summary of our proposed algorithm.

Step 1 – Sampling.
1. Generate N independent copies of the random variables Ψ_0, Ψ_ij as described in Section 2.3.
2. Use the samples to estimate the densities g_0(s), g_ij(s), and f_{X^{¯γ}}(s). The values are stored for inter- and extrapolation.
3. If necessary, generate new samples according to an importance sampling density h.

Step 2 – SA Algorithm.
4. Apply PSG-RobustAV@R Algorithm 2. Importance sampling techniques can be adopted as illustrated in Section 3.4 below – if applicable.
5. If the switching condition described in Section 3.2.3 is met, terminate the algorithm and determine a selection of the most important copulas in order to reduce the dimensionality of the problem.
Step 3 – SAA Algorithm.
6. Construct a suitable grid on the lower-dimensional simplex. Adaptively refine the grid in a smaller domain on the basis of the results of the application of the algorithm specified in the next step, and apply the algorithm again on the new grid.
7. Apply SAA-RobustAV@R Algorithm 4 to find the worst case over grid points. The worst case is the estimated solution to the original problem (6) on the lower-dimensional mixture space of copulas chosen in Task 5.
Before we discuss applications to financial markets and cyber risk, we illustrate our procedure in the context of a simple example motivated by Li et al. (2014).
Example 2 (m = 2, K = 5, d = 2)

Consider aggregate losses X = X_1 + X_2 with individual losses X_1, X_2 ∈ L. The distributions of the individual positions are inverse Gaussian with density

f(x) = √(λ/(2πx³)) exp( −λ(x − µ)²/(2µ²x) ),   x > 0.

The dependence of the positions is uncertain, and we would like to evaluate the worst-case AV@R at level p ∈ (0,1). Letting α_0 = 1 − 2α and α_1 = α_2 = α, we assume that D_i1 = D_i2 = ··· = D_id for all i = 0, 1, 2, and choose the distortion functions

D_1(x) = x − αx α + (1 − α)x,   D_2(x) = αx α + (1 − α)(1 − x),   D_0(x) = (x − α D_1(x) − α D_2(x))/(1 − 2α).   (14)

The copula capturing dependence in the typical area is modeled by a d-dimensional Gaussian copula

C_0 = C^Ga_Σ = Φ_{0,Σ}(Φ^{−1}(u_1), ..., Φ^{−1}(u_d));

here, Φ and Φ_{0,Σ} signify the standard univariate normal distribution function and the multivariate Gaussian distribution function with covariance matrix Σ, respectively.

The family of copulas C = {C_1, C_2, C_3, C_4, C_5} is specified as follows:

C_1: a t-copula C^t_{ν,P}(u_1, ..., u_d) = t_{ν,P}(t_ν^{−1}(u_1), ..., t_ν^{−1}(u_d)), where t_ν is a standard univariate t distribution with ν degrees of freedom and t_{ν,P} is the joint distribution with a correlation matrix P;

C_2: a Clayton copula C^Cl(u_1, ..., u_d) = (Σ_{i=1}^d u_i^{−θ} − d + 1)^{−1/θ}, 0 < θ < ∞;

C_3: a Gumbel copula C^Gu(u_1, ..., u_d) = exp{ −[Σ_{i=1}^d (−log u_i)^θ]^{1/θ} }, 1 ≤ θ < ∞;

C_4: a Frank copula C^Fr(u_1, ..., u_d) = −(1/θ) log{ 1 + Π_{i=1}^d (e^{−θu_i} − 1)/(e^{−θ} − 1)^{d−1} }, θ ≥ 0;

C_5: the independence copula Π(u_1, ..., u_d) = Π_{i=1}^d u_i.
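For reference, the candidate copulas C_2–C_5 admit closed-form CDFs that are straightforward to code. The following Python sketch uses standard parametrizations (in particular a common form of the d-dimensional Frank copula); the parameter values in the example call are placeholders rather than the calibrated values of the example.

import numpy as np

def clayton_cdf(u, theta):
    # d-dimensional Clayton copula C_2, theta > 0.
    u = np.asarray(u, dtype=float)
    return (np.sum(u ** (-theta)) - u.size + 1.0) ** (-1.0 / theta)

def gumbel_cdf(u, theta):
    # d-dimensional Gumbel copula C_3, theta >= 1.
    u = np.asarray(u, dtype=float)
    return np.exp(-np.sum((-np.log(u)) ** theta) ** (1.0 / theta))

def frank_cdf(u, theta):
    # d-dimensional Frank copula C_4, theta > 0 (one common parametrization).
    u = np.asarray(u, dtype=float)
    num = np.prod(np.expm1(-theta * u))
    return -np.log1p(num / np.expm1(-theta) ** (u.size - 1)) / theta

def independence_cdf(u):
    # Independence copula C_5.
    return float(np.prod(u))

# Illustrative evaluation at an upper-tail point with placeholder parameters.
point = [0.95, 0.9]
print(clayton_cdf(point, theta=0.5), gumbel_cdf(point, theta=1.5),
      frank_cdf(point, theta=2.0), independence_cdf(point))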
SA Algorithm

Step 2 in the full procedure summarized in Section 3.3 is the SA Algorithm 2. Its step size is given by t^{−a} for 0.5 < a ≤ 1. Figure 1 illustrates the dynamics of the corresponding weights in the random sequence (¯γ_t)_t for the five copulas in C and the distortions D_1 and D_2 for a specific numerical example. We vary the step size and compare a = 0.6, 0.7, 0.8, 0.9 for the first 200 iterations. The approximation becomes faster for smaller a.

The downside risk measured by AV@R is mainly governed by the upper tail of the losses whose dependence structure is encoded by the distortion function D_2. This is captured by the second column in Figure 1 which shows that the weights of the copulas C_2 (Clayton copula), C_4 (Frank copula), and C_5 (independence copula) decrease quickly to zero. The maximal AV@R is mainly determined by the weight of the t-copula C_1 and the weight of the Gumbel copula C_3 for the upper tail. These observations suggest that dimension reduction as described in Section 3.2.3 can successfully be implemented for this example.

For a = 0.6, the initial AV@R at p = 0.95 for uniform ¯γ (i.e., all entries of the matrix are equal) increases after only five iterations to a value that is hardly distinguishable from the estimated optimal value found by SAA later on. We observe that AV@R values become insensitive to changes in ¯γ_t after just a few iterations. This observation provides further motivation for the suggested approach to reduce the dimension of the problem (see Section 3.2.3).

Importance sampling
We explore the potential to reduce the variance of the estimators by an application of importance sampling. Recall the notation introduced in Section 2.3.2. If h is a density that dominates f_{X^{¯γ}}, we may sample from h and modify Algorithm 2 to obtain importance sampling estimators replacing (i) the VaR v̂_p^N, (ii) the AV@R ĉ_p^N, and (iii) the AV@R gradient ∆_{i,j}(ĉ_p^N).

Letting L = f_{X^{¯γ}}/h be the likelihood ratio, we estimate the corresponding IS empirical distribution F̃^IS_{X^{¯γ}}(s) by

F̃^IS_{X^{¯γ}}(s) = (1/N) Σ_{l=1}^N L(y_l) 1_{[y_l ≤ s]},   s ∈ R,
Figure 1: SA results for varying a ∈ {0.6, 0.7, 0.8, 0.9} with step sizes {δ_t} = {t^{−a}}_{t ≥ 1} in Example 2. The off-diagonal elements of Σ are equal and the diagonal elements equal 1. We set ν = 1 for C_1 and fix the parameters θ of C_2, C_3, and C_4. The IG parameters are µ_1 = µ_2 = 1 with fixed λ_1 and λ_2. The sample size N_t is fixed across iterations t, and a kernel density estimate at equally spaced points is used.

with y_l drawn i.i.d. from h. The corresponding IS estimators are

ṽ_p^N = inf{ s : F̃^IS_{X^{¯γ}}(s) ≥ p };
c̃_p^N = ṽ_p^N + (1/(N(1−p))) Σ_{l=1}^N (y_l − ṽ_p^N)^+ L(y_l);
∆_{i,j}(c̃_p^N) = (1/(N(1−p))) Σ_{l=1}^N (α_i g_ij(y_l)/f_{X^{¯γ}}(y_l)) (y_l − ṽ_p^N) L(y_l) 1_{[y_l ≥ ṽ_p^N]}.

Motivated by eq. (7), we propose to define the IS density h as a mixture that relies on measure changes of the distribution functions G_0, G_ij with densities g_0, g_ij, i = 1, 2, ..., m, j = 1, 2, ..., K:

h(x) = α_0 h_0(x) + Σ_{i=1}^m α_i Σ_{j=1}^K γ_ij h_ij(x).

For simplicity, we modify only two ingredients: we replace the central copula C_0 by an importance sampling copula C̃_0 and the marginal distributions F_i by importance sampling distributions F̃_i; all other ingredients of the family of joint distributions of X_1, X_2, ..., X_d in Example 2, in particular the collection C, are not changed. We thus obtain the following identities:

h_0(x) = ∂/∂x ∫_{Ã(x)} dC̃_0(D_01(u_1), ..., D_0d(u_d));
h_ij(x) = ∂/∂x ∫_{Ã(x)} dC_j(D_i1(u_1), ..., D_id(u_d))   for all i, j;
Ã(x) = {(u_1, ..., u_d) : Ψ(F̃_1^−(u_1), ..., F̃_d^−(u_d)) ≤ x}.

Many other strategies to design IS distributions are, of course, possible. However, good IS methodologies for copulas are challenging. At the same time, the total computational effort must be estimated in order to evaluate the overall efficiency of competing algorithms. These issues constitute an interesting topic for future research.

To illustrate the potential of IS, we consider Example 2. As suggested by Huang, Subramanian & Xu (2010), we shift the mean vector of the Gaussian copula C_0 to obtain C̃_0. On the marginal distributions, we utilize for each i = 1, 2, ..., m an Esscher measure change with parameter w_i that transforms an inverse Gaussian distribution IG(µ_i, λ_i) to a shifted IG distribution

F̃_i ∼ IG( µ_i √λ_i / √(λ_i − 2µ_i² w_i), λ_i )   with w_i ≤ λ_i/(2µ_i²).

Numerical results display significant variance reduction. In a typical case study, the variance of the crude MC estimator of robust AV@R is substantially larger than the importance sampling variances reported for the historical likelihood estimator and the kernel estimator. We observe sizeable average variance reduction ratios across samples under exponential tilting of the marginals and a shifted drift of the Gaussian copula.

Switching to SAA
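A minimal Python sketch of the IS estimators for VaR and AV@R based on draws y_l from h and likelihood ratios L(y_l) is given below; it assumes the likelihood ratios are supplied, e.g., from the mixture density h described above.

import numpy as np

def is_var_avar(y, lr, p):
    # y  : (N,) losses sampled from the IS density h
    # lr : (N,) likelihood ratios L(y_l) = f_{X^gamma}(y_l) / h(y_l)
    # p  : AV@R level
    y = np.asarray(y, dtype=float); lr = np.asarray(lr, dtype=float)
    order = np.argsort(y)
    y_s, lr_s = y[order], lr[order]
    F_tilde = np.cumsum(lr_s) / y.size                      # weighted empirical CDF at y_(l)
    idx = min(int(np.searchsorted(F_tilde, p)), y.size - 1)
    v_tilde = y_s[idx]                                      # IS VaR: first point with F_tilde >= p
    c_tilde = v_tilde + np.sum(np.maximum(y - v_tilde, 0.0) * lr) / (y.size * (1.0 - p))
    return v_tilde, c_tilde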
We apply the methodology described in Section 3.2.3. Setting t_min = 10 and t_max = 50, we run SA with uniform initial values, i.e., all entries of ¯γ are 1/5, a constant sample size N_t, and a fixed step-size exponent a. Recall Algorithm 2 for a description of the parameters. The initial choice of ¯γ corresponds to an AV@R at level 0.95 of 13.8046. This result differs slightly from the initial value reported in Figure 1 due to sampling error.

The stopping time equals t* = 17. At the stopping time, the increments of the sequence (¯γ_t)_{t=1,2,...} are already small, and the estimated AV@R at level 0.95 has a small empirical standard deviation computed from the last ten iterations; a comparison with later iterations shows only marginal changes of the estimate. These observations indicate that SA quickly approximates the worst-case AV@R. However, the precision improves only very slowly afterwards. The convergence to the optimal value of ¯γ is slow for some components.

In order to reduce the dimension of the problem according to Section 3.2.3, we set K* = 2 and select for the application of SAA the copulas C_1 (t-copula) and C_3 (Gumbel copula) on the basis of the estimate ¯γ_{t*}. Thus, SAA needs to be applied to a two-dimensional grid with one weight per selected copula and tail region, each pair of weights summing to one and being non-negative. On the basis of SAA one observes that the worst-case risk is insensitive to dependence in the lower tail. The worst-case risk is attained for a weight γ = 1 in the upper tail. This is illustrated in Figure 2.

Figure 2: Color map of AV@R for the weight parameters of the reduced problem.

In summary, in this case study SA is capable of quickly computing a reasonable estimate of the worst-case risk. Suitable worst-case copulas, encoded by the matrix ¯γ, in a lower-dimensional mixture space can be determined by a combination of SA, the copula selection method described in Section 3.2.3, and SAA.

4.1 Financial markets

We apply our methodology to a data set spanning the time interval 2005/01/01 to 2019/12/31 that contains the daily closing values of the following stock indices:

i   Index
1   S&P 500
2   NASDAQ Composite
3   Dow Jones Industrial Average
4   DAX Performance Index
5   EURONEXT 100
6   KOSPI Composite Index
7   Nikkei 225

The data period includes extreme events during the 2007-2008 global financial crisis. The 7-dimensional time series is labelled by trading days t = 1, 2, ..., and quoted in US$:

Price_t = (Price_{1,t}, Price_{2,t}, Price_{3,t}, Price_{4,t}, Price_{5,t}, Price_{6,t}, Price_{7,t})

where Price_{i,t} is the time-t US$ price of index i, i = 1, 2, ..., 7, according to the table above. We consider a 7-dimensional random vector X = (X_1, X_2, ..., X_7) that models the negative of the terminal US$ value of the indices over a 10-day horizon, if the initial investment into each of the indices is one US$. The corresponding time series is given by
x_t = ( (−1) · Price_{i,t+10} / Price_{i,t} )_{i=1,2,...,7},   t = 1, 2, ..., D.

We investigate the robust AV@R at level 0.95 over a 10-day horizon of a portfolio that invests an equal dollar amount into each index. To be more specific, we consider the robust AV@R of the losses X = Σ_{i=1}^7 (X_i + 1).

We apply a semi-parametric approach to the seven marginal distributions. For the central part of the distributions we linearly interpolate the empirical distribution. Less data are available in the tail, and we fit Generalized Pareto Distributions (GPD) to the data which allow a convenient extrapolation of samples.

To be specific, for any i, let (x_{i,(t)})_{t=1,...,D} be the ordered sample of (x_{i,t})_{t=1,...,D} for x_t = (x_{1,t}, ..., x_{7,t}) such that x_{i,(1)} ≤ ··· ≤ x_{i,(D)} for all i, and denote by p_l = p_u = 0.1 the lower and upper probabilities. We set x_{i,l} = x_{i,(t_l)}, t_l = ⌊D · p_l⌋, and x_{i,u} = x_{i,(t_u)}, t_u = ⌈D · (1 − p_u)⌉, where ⌊·⌋ represents the greatest integer less than or equal to a given number, and ⌈·⌉ denotes the smallest integer greater than or equal to a given number.

The CDF of a GPD with two parameters ξ and ϑ is given by

G_{ξ,ϑ}(x) = 1 − (1 + ξx/ϑ)^{−1/ξ}   if ξ ≠ 0,
G_{ξ,ϑ}(x) = 1 − exp(−x/ϑ)           if ξ = 0.

The GPD is supported on x ≥ 0 if ξ ≥ 0 and on 0 ≤ x ≤ −ϑ/ξ if ξ < 0.

We estimate the two parameters (ξ_{i,l}, ϑ_{i,l}) and (ξ_{i,u}, ϑ_{i,u}) by maximum likelihood estimation based on the lower and upper 10% excess data (x_{i,l} − x_{i,(1)}, ..., x_{i,l} − x_{i,(t_l−1)}) and (x_{i,(t_u+1)} − x_{i,u}, ..., x_{i,(D)} − x_{i,u}), respectively. The estimated parameters as well as the upper and lower boundaries are summarized in Table 4.1.1 (for each index i: lower-tail boundary x_{i,l}, shape ξ_{i,l}, scale ϑ_{i,l}; upper-tail boundary x_{i,u}, shape ξ_{i,u}, scale ϑ_{i,u}).

The linearly interpolated empirical distribution on [x_{i,l}, x_{i,u}] for index i is

H_i(x) = 0   if x < x_{i,l},
H_i(x) = (k − t_l)/(t_u − t_l) + (x − x_{i,(k)}) / ((t_u − t_l)(x_{i,(k+1)} − x_{i,(k)}))   if x_{i,(k)} ≤ x < x_{i,(k+1)},
H_i(x) = 1   if x ≥ x_{i,u}.

The distribution of X_i is finally modeled by

F_i(x) = p_l (1 − G_{ξ_{i,l},ϑ_{i,l}}(x_{i,l} − x))         if x ≤ x_{i,l},
F_i(x) = p_l + (1 − p_l − p_u) H_i(x)                       if x_{i,l} < x ≤ x_{i,u},
F_i(x) = (1 − p_u) + p_u G_{ξ_{i,u},ϑ_{i,u}}(x − x_{i,u})   if x > x_{i,u}.   (15)

Figure 3 illustrates the tails of the CDF of X_1, comparing the fitted distribution to the empirical CDF.

Figure 3: The estimated CDF as well as the empirical CDF of X_1 in the lower and upper tails.

As the AV@R focuses on the upper tails, we consider the following distortion functions with parameters α_1 = α_2 = 0.04 and α_0 = 1 − α_1 − α_2:

D_1(x) = x/α_1 if x ≤ α_1, and 1 if x > α_1;
D_2(x) = 0 if x ≤ α_1, (x − α_1)/α_2 if α_1 < x ≤ α_1 + α_2, and 1 if x > α_1 + α_2;
D_0(x) = 0 if x ≤ α_1 + α_2, and (x − α_1 − α_2)/α_0 if α_1 + α_2 < x ≤ 1.   (16)

We split the data into three parts: extreme upper tail (0%-4%), upper tail (4%-8%), and the remaining center and lower tail (8%-100%) according to the aggregate loss function {Σ_{i=1}^7 (x_{i,t} + 1)}_{t=1,...,D}. To illustrate the procedure of partitioning the data, we draw a scatter plot of (X_1, X_2), ..., (X_1, X_7) in Figure 4.

Dependence in the central part is modeled by a Gaussian copula whose correlation matrix consists of the estimated linear correlations. The estimated correlation matrix based on the 92% data can be found in Section A.2.
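A hedged Python sketch of the semi-parametric marginal model (15), with maximum likelihood GPD tail fits via scipy's genpareto, is shown below. The central part is approximated by a clipped step function rather than the linear interpolation H_i, so it is an approximation of the construction above for reasonably large samples.

import numpy as np
from scipy import stats

def fit_semiparametric_cdf(x, p_l=0.1, p_u=0.1):
    # Semi-parametric marginal CDF in the spirit of (15): GPD tails fitted by ML,
    # empirical body (step-function approximation of the interpolated H_i).
    x = np.sort(np.asarray(x, dtype=float))
    D = x.size
    x_l = x[int(np.floor(D * p_l)) - 1]                     # lower boundary x_{i,l}
    x_u = x[int(np.ceil(D * (1.0 - p_u))) - 1]              # upper boundary x_{i,u}
    xi_l, _, sc_l = stats.genpareto.fit(x_l - x[x < x_l], floc=0.0)   # lower-tail excesses
    xi_u, _, sc_u = stats.genpareto.fit(x[x > x_u] - x_u, floc=0.0)   # upper-tail excesses

    def cdf(s):
        if s <= x_l:
            return p_l * (1.0 - stats.genpareto.cdf(x_l - s, xi_l, scale=sc_l))
        if s > x_u:
            return 1.0 - p_u + p_u * stats.genpareto.cdf(s - x_u, xi_u, scale=sc_u)
        return float(np.clip(np.searchsorted(x, s, side='right') / D, p_l, 1.0 - p_u))
    return cdf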
In the tail parts we consider K = 16 candidate copulas:

• Copulas C_1, C_2 are Gaussian, matching the linear correlation or Kendall's tau estimated from the upper (4%-8%) data, respectively.

• The copulas C_3, C_4, ..., C_8 are t-copulas whose parameters are calibrated on the basis of the upper (4%-8%) data:

– the multivariate meta t-copulas (C_3, C_4, C_5) with parameters (ν^l, P^l), l = 1, 2, 3; the superscript l indicates the estimation method explained below;

– the grouped t-copulas (C_6, C_7, C_8) that allow different subsets of the random variates to have different degrees of freedom parameters; we divide the indices into the three subgroups US, Europe, and Asia; the vector ν^l = (ν^l_1, ν^l_2, ν^l_3), l = 1, 2, 3, specifies the degrees of freedom for the subgroups, and the superscript l refers again to the estimation method.

The first method (l = 1) is ML estimation. The second method (l = 2) exploits the approximated log-likelihood for the degrees of freedom parameter, which increases the speed of the estimation. The third method (l = 3) estimates the correlation matrix P by Kendall's tau for each pair and then estimates the scalar degree of freedom by ML given the fixed P. This method is useful when the dimension of the data is large, because the numerical optimization quickly becomes infeasible otherwise. The estimated correlation matrices as well as the degrees of freedom are not identical and sometimes very different.

• The copulas C_9, C_10, ..., C_16 are constructed analogously to C_1, C_2, ..., C_8, but based on the extreme upper (0%-4%) data.

The calibration results can be found in Section A.2.

Figure 4: Scatter plot of X_i against X_1 for i = 2, ..., 7. The green triangles depict the data points in the extreme upper tail (0%-4%), the red crosses depict the data points in the upper tail (4%-8%), and the blue circles depict the remaining data points (8%-100%).

As a benchmark, we compute the AV@R at level 0.95 when dependence is modeled by a single Gaussian copula estimated from the entire data set. The correlation matrix is given in Section A.2.

We compare the benchmark to the algorithm based on DM copulas described in Section 3. With an equal initial weight of 1/16, a constant sample size N_t, and a fixed step-size exponent a, the initial AV@R at level 0.95 corresponds to 0.652239 and is substantially higher than the benchmark. The DM method with copulas fitted to tail data provides a much better methodology for assessing downside risk than a single Gaussian copula. In fact, even if a DM method combines only Gaussian copulas for central and tail areas, the estimation results are reasonable. For the considered data, results are quite insensitive to the considered copulas, as long as they are fitted to different parts of the distribution and a DM copula is used.

When running SA, the stopping time t* = t_min equals 10, with an estimated AV@R at level 0.95 and an empirical standard deviation computed from the last ten iterations. The increment ¯γ_{t*} − ¯γ_{t*−1} of the corresponding weights ¯γ has components of very small modulus (roughly less than 1/1000). Now we switch to SAA.
Setting K* = 3, our procedure selects for the application of SAA, on the basis of the estimate ¯γ_{t*}, a grouped t-copula estimated from the upper (4%-8%) data using Kendall's tau and ML, a t-copula estimated from the extreme upper (0%-4%) data using approximate ML, and a grouped t-copula estimated from the extreme upper (0%-4%) data using ML. SAA applied to the corresponding three-dimensional grid picks only one copula, the t-copula estimated from the extreme upper (0%-4%) data using approximate ML, associated with the worst-case AV@R. The numerical analysis confirms that the AV@R values are insensitive to ¯γ near the solution.

In summary, when computing AV@R at level 0.95, in the current example a reasonable amount of tail data is available to estimate the dependence of the factors in different parts of the distribution. In contrast to a single Gaussian copula, the DM method provides solutions in the considered family that are not very sensitive to the choice of the estimated component copulas. But instead of making ad hoc assumptions that select a specific component copula a priori, our algorithm demonstrates explicitly the strength of the DM method, identifies and substantiates the insensitivity to components a posteriori, and finally reduces the dimensionality of the problem in the worst-case analysis.

4.2 Cyber risk

In an application to cyber risk, we study cyber incidents in the USA from Privacy Rights Clearinghouse (https://privacyrights.org/) collected from January 2005 until October 2019. For a time window ending in 2016 the data set was also analyzed by Eling & Jung (2018). We consider loss records in periods of two months and rearrange the data accordingly. This reduces the number of zero entries and admits a tractable analysis that does not separate zero entries from strictly positive losses.

The data records contain information on the period, the number of events in each period and the corresponding losses. We consider five types of breaches (descriptions obtained from https://privacyrights.org/):

i   Data Breach Type (Number of zero data): description
1   DISC (0): Unintended Disclosure Not Involving Hacking, Intentional Breach or Physical Loss
2   HACK (0): Hacked by an Outside Party or Infected by Malware
3   INSD (17): Insider – employee, contractor or customer
4   PHYS (5): Physical – paper documents that are lost, discarded or stolen
5   PORT (16): Portable Device – lost, discarded or stolen laptop, smartphone, memory stick, etc.

The data set contains 89 two-month periods. The number of dates with observations of zero losses or no incidents is provided in parentheses. In order to simplify the analysis, we replace all zero entries by a uniform (0,1) random variable; since the severity for non-zero losses is typically of a much larger order of magnitude, this approach does not substantially modify the data, but admits a simplified model with strictly positive marginal densities.

We assume that the two-month breach records can be modeled by a 5-dimensional random vector L = (L_1, L_2, ..., L_5). We employ a loss distribution approach to the marginal distributions, i.e.,

L_i = Σ_{j=1}^{N_i} R_ij,

where R_ij, j = 1, ..., N_i, are i.i.d. random variables representing the severity of individual loss records and N_i signifies the random number of losses. The dependence among L is captured by a copula which will be modelled as a DM copula. The details of selection and calibration will be given below. In general, one is interested in measuring the risk of some functional of L. As an illustrative example, we focus on the AV@R at level 0.95 of the sum of its components, X = Σ_{i=1}^5 L_i.
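The loss distribution approach for a single breach type can be sketched in Python as follows. The negative binomial/lognormal specification anticipates the frequency–severity model fitted below; the parameter values in the example call are placeholders, not the estimates of Table 2.

import numpy as np

rng = np.random.default_rng(0)

def simulate_period_losses(n_periods, r, p, mu, sigma):
    # Simulate two-month losses L_i = sum_{j=1}^{N_i} R_ij for one breach type,
    # with N_i ~ negative binomial(r, p) and R_ij ~ lognormal(mu, sigma).
    counts = rng.negative_binomial(r, p, size=n_periods)
    return np.array([rng.lognormal(mu, sigma, size=n).sum() for n in counts])

# Illustrative run with made-up parameters (not the fitted values of Table 2).
losses = simulate_period_losses(n_periods=89, r=5.0, p=0.3, mu=10.0, sigma=2.0)
print(losses.mean())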
Motivated by Eling & Jung (2018), we model the frequency and the severity of loss records separately, choosing a lognormal distribution for the severity and a negative binomial distribution for the number of losses in each period. (In contrast to our simplified approach, Eling & Jung (2018) build their analysis on a methodology described in Erhardt & Czado (2012) that expresses the joint probability function by copulas with discrete and continuous margins. Our algorithmic approach can also be applied to their methodology. The statistical estimation is, however, more difficult in this case.) We estimate the parameters of the distributions and summarize the results in Table 2 (negative binomial parameters r and p, lognormal parameters µ and σ for L_1, ..., L_5); for the negative binomial we applied MLE, for the lognormal unbiased estimates of mean and variance of the log-data. For the implementation of our algorithm, we finally generate and store samples of each distribution.

Since the AV@R focuses on the upper tails, we continue to use the distortion functions in (16), again choosing α_1 = α_2 as before and α_0 = 1 − α_1 − α_2. If AV@R at level 0.95 is computed by a corresponding DM copula, the influence of the central and lower part (captured by D_0) is very low; this is confirmed in numerical experiments. For this reason, we focus on a particularly simple approach and use a Gaussian copula for this part with linear correlation estimated from the data. The correlation matrix Σ_0 is given in Section A.3 in the appendix.

For the upper tails (captured by D_1 and D_2), we consider K = 8 candidate copulas, namely two Gaussian copulas, two t-copulas, two Gumbel copulas, and two vine copulas; for the latter we refer to Dißmann, Brechmann, Czado & Kurowicka (2013) for further information. More specifically, the copulas are estimated as follows; the corresponding parameters for the Gaussian, t and vine copulas are given in Section A.3:

• C_1: Gaussian copula with the estimated linear correlation Σ_1;
• C_2: Gaussian copula that matches the estimated Kendall's tau with corresponding correlation matrix Σ_2;
• C_3: t-copula with parameters ν and P estimated by MLE;
• C_4: t-copula with parameters ν and P, with P matching Kendall's tau and ν estimated by MLE;
• C_5: Gumbel copula estimated by MLE;
• C_6: Gumbel copula estimated on the basis of a minimal Cramér–von Mises distance according to Hofert, Mächler & McNeil (2013);
• C_7: regular vine copula estimated according to AIC;
• C_8: regular vine copula estimated according to BIC.

As a benchmark, we compute the AV@R at level 0.95 when dependence is modeled by the single Gaussian copula C_1 estimated from the entire data set; the unit for the reported AV@R values is always one million. If we use the copulas C_2, C_3, ..., C_8 instead, point estimates range from about 45.2 to 53.1 with significant sampling error. As in Section 4.1, we compare this benchmark to the result of the algorithm with DM copulas that was described in Section 3. The DM approach provides a more sophisticated analysis of the worst case.

With an equal initial weight of 1/8, a constant sample size N_t and a fixed step-size exponent a, we run SA. Stopping SA according to our stopping rule at t* = 10, we obtain an estimated AV@R at level 0.95 with an empirical standard deviation computed from the last ten iterations.
The weight vector γ̄_{t*} obtained from the SA run exhibits increments (γ̄_{t*} - γ̄_{t*-1})^⊤ of an order of 1/500 or less. Setting K* = 3 and switching to SAA, our algorithm selects the copulas C5 (Gumbel copula estimated by MLE), C6 (Gumbel copula estimated via the Cramér-von Mises distance), and C8 (regular vine copula according to BIC, Table 4) on the basis of the estimate γ̄_{t*}. Thus, SAA needs to be applied to a three-dimensional grid on γ_{i5} + γ_{i6} + γ_{i8} = 1, γ_{ij} ≥ 0, i = 1, 2, j = 5, 6, 8. With the chosen sample size and grid size, SAA selects one of the Gumbel copulas for D_1 and one for D_2, together with the corresponding worst-case AV@R at level 0.95. For one of the two distortions, the alternative Gumbel copula leads to almost the same result, i.e., the sensitivity of the AV@R with respect to the choice between C5 and C6 is almost zero there.

In summary, when computing the AV@R at level 0.95, DM methods provide an excellent tool for identifying the relevant low-dimensional dependence structures when many data are available, as illustrated in Section 4.1. In the current example on cyber risk, data are scarce and the tail copulas are chosen ad hoc. In this case, our algorithm easily identifies the worst-case dependence and reduces the dimensionality at the same time. If only few data are available in the tail, the choice of tail copulas is restricted by only few constraints, and the sensitivities of the AV@R within this class are more significant. In all cases, the worst-case AV@R on the basis of the DM copula provides a substantially better understanding of downside risk than single copulas fitted to the whole data.

5 Conclusion

Uncertainty requires suitable techniques for risk assessment. In this paper, we combined stochastic approximation and sample average approximation to develop an efficient algorithm for computing the worst-case average value at risk in the face of tail uncertainty. Dependence was modelled by the distorted mix method that flexibly assigns different copulas to different regions of multivariate distributions. The method is computationally efficient and at the same time allows us to identify copulas in a lower-dimensional mixture space that capture the worst case with high precision. We illustrated the application of our approach in the context of financial markets and cyber risk. Distorted mix copulas can flexibly adjust the dependence structure in different regions of a multivariate distribution. Our research indicates that they provide a powerful and flexible tool for capturing dependence in both the central area and the tails of distributions.
References
Acciaio, B. & G. Svindland (2013), 'Are law-invariant risk functions concave on distributions?', Dependence Modeling, 54–64.
Arbenz, P., P. Embrechts & G. Puccetti (2011), 'The AEP algorithm for the fast computation of the distribution of the sum of dependent random variables', Bernoulli (2), 562–591.
Arbenz, P., P. Embrechts & G. Puccetti (2012), 'The GAEP algorithm for the fast computation of the distribution of a function of dependent random variables', Stochastics (5-6), 569–597.
Bardou, O.A., N. Frikha & G. Pagès (2009), 'Computing VaR and CVaR using stochastic approximation and adaptive unconstrained importance sampling', Monte Carlo Methods and Applications (3), 173–210.
Bardou, O.A., N. Frikha & G. Pagès (2016), 'CVaR hedging using quantization-based stochastic approximation algorithm', Mathematical Finance (1), 184–229.
Bartl, D., S. Drapeau & L. Tangpi (2020), 'Computational aspects of robust optimized certainty equivalents and option pricing', Mathematical Finance (1), 287–309.
Bellini, F. & V. Bignozzi (2015), 'On elicitable risk measures', Quantitative Finance (5), 725–733.
Ben-Tal, A. & M. Teboulle (2007), 'An old-new concept of convex risk measures: The optimized certainty equivalent', Mathematical Finance (3), 449–476.
Bernard, C. & S. Vanduffel (2015), 'A new approach to assessing model risk in high dimensions', Journal of Banking & Finance, 166–178.
Bernard, C., X. Jiang & R. Wang (2014), 'Risk aggregation with dependence uncertainty', Insurance: Mathematics and Economics, 93–108.
Bhatnagar, S., H.L. Prasad & L.A. Prashanth (2013), Stochastic Recursive Algorithms for Optimization: Simultaneous Perturbation Methods, Springer.
Blanchet, J. & K. Murthy (2019), 'Quantifying distributional model risk via optimal transport', Mathematics of Operations Research (2), 565–600.
Breuer, T. & I. Csiszár (2016), 'Measuring distribution model risk', Mathematical Finance (2), 395–411.
Broadie, M., D. Cicek & A. Zeevi (2011), 'General bounds and finite-time improvement for the Kiefer-Wolfowitz stochastic approximation algorithm', Operations Research (5), 1211–1224.
Condat, L. (2016), 'Fast projection onto the simplex and the l1-ball', Mathematical Programming (1-2), 575–585.
Cont, R., R. Deguest & G. Scandolo (2010), 'Robustness and sensitivity analysis of risk measurement procedures', Quantitative Finance (6), 593–606.
Dißmann, J., E.C. Brechmann, C. Czado & D. Kurowicka (2013), 'Selecting and estimating regular vine copulae and application to financial returns', Computational Statistics & Data Analysis, 52–69.
Dunkel, J. & S. Weber (2007), 'Efficient Monte Carlo methods for convex risk measures in portfolio credit risk models', WSC '07, IEEE Press, pp. 958–966.
Dunkel, J. & S. Weber (2010), 'Stochastic root finding and efficient estimation of convex risk measures', Operations Research (5), 1505–1521.
Eling, M. & K. Jung (2018), 'Copula approaches for modeling cross-sectional dependence of data breach losses', Insurance: Mathematics and Economics, 167–180.
Embrechts, P., G. Puccetti & L. Rüschendorf (2013), 'Model uncertainty and VaR aggregation', Journal of Banking & Finance (8), 2750–2764.
Embrechts, P., H. Liu & R. Wang (2018), 'Quantile-based risk sharing', Operations Research (4), 936–949.
Erhardt, V. & C. Czado (2012), 'Modeling dependent yearly claim totals including zero claims in private health insurance', Scandinavian Actuarial Journal (2), 106–129.
Fu, M.C. (2006), 'Chapter 19: Gradient estimation', in S.G. Henderson & B.L. Nelson, eds, 'Simulation', Vol. 13 of Handbooks in Operations Research and Management Science, Elsevier, pp. 575–616.
Föllmer, H. & A. Schied (2004), Stochastic Finance: An Introduction in Discrete Time, Walter de Gruyter, Berlin.
Ghosh, S. & H. Lam (2019), 'Robust analysis in stochastic simulation: Computation and performance guarantees', Operations Research (1), 232–249.
Glasserman, P. & X. Xu (2014), 'Robust risk measurement and model risk', Quantitative Finance (1), 29–58.
Hamm, A., T. Knispel & S. Weber (2020), 'Optimal risk sharing in insurance networks', European Actuarial Journal (1), 203–234.
Hofert, M., M. Mächler & A.J. McNeil (2013), 'Archimedean copulas in high dimensions: Estimators and numerical challenges motivated by financial applications', Journal de la Société Française de Statistique (1), 25–63.
Hu, Z. & L.J. Hong (2013), 'Kullback-Leibler divergence constrained distributionally robust optimization', Available at Optimization Online.
Huang, P., D. Subramanian & J. Xu (2010), 'An importance sampling method for portfolio CVaR estimation with Gaussian copula models', Proceedings of the 2010 Winter Simulation Conference (WSC), pp. 2790–2800.
Kakouris, I. & B. Rustem (2014), 'Robust portfolio optimization with copulas', European Journal of Operational Research (1), 28–37.
Kiefer, J. & J. Wolfowitz (1952), 'Stochastic estimation of the maximum of a regression function', The Annals of Mathematical Statistics (3), 462–466.
Kim, S., R. Pasupathy & S.G. Henderson (2015), 'A guide to sample average approximation', in M.C. Fu, ed., 'Handbook of Simulation Optimization. International Series in Operations Research & Management Science', Vol. 216, Springer, New York.
Krätschmer, V., A. Schied & H. Zähle (2014), 'Comparative and qualitative robustness for law-invariant risk measures', Finance and Stochastics, 271–295.
Kushner, H.J. & G.G. Yin (2003), Stochastic Approximation Algorithms and Applications, Springer-Verlag.
Li, L., K.C. Yuen & J. Yang (2014), 'Distorted mix method for constructing copulas with tail dependence', Insurance: Mathematics and Economics, 77–89.
Li, L., H. Shao, R. Wang & J. Yang (2018), 'Worst-case range value-at-risk with partial information', SIAM Journal on Financial Mathematics (1), 190–218.
McNeil, A.J., R. Frey & P. Embrechts (2015), Quantitative Risk Management: Concepts, Techniques and Tools, Princeton University Press, Princeton, NJ, USA.
Meng, F.W., J. Sun & M. Goh (2010), 'Stochastic optimization problems with CVaR risk measure and their sample average approximation', Journal of Optimization Theory and Applications (2), 399–418.
Pflug, G.C. (2000), 'Some remarks on the value-at-risk and the conditional value-at-risk', in S. Uryasev, ed., 'Probabilistic Constrained Optimization. Nonconvex Optimization and Its Applications', Vol. 49, Springer, Boston, MA, pp. 272–281.
Puccetti, G., L. Rüschendorf, D. Small & S. Vanduffel (2017), 'Reduction of value-at-risk bounds via independence and variance information', Scandinavian Actuarial Journal (3), 245–266.
Rockafellar, R.T. & S. Uryasev (2000), 'Optimization of conditional value-at-risk', Journal of Risk (3), 21–41.
Rockafellar, R.T. & S. Uryasev (2002), 'Conditional value-at-risk for general loss distributions', Journal of Banking & Finance, 1443–1471.
Rüschendorf, L. (2017), 'Risk bounds and partial dependence information', in D. Ferger, W. González Manteiga, T. Schmidt & J.-L. Wang, eds, 'From Statistics to Mathematical Finance', Springer, Cham, pp. 345–366.
Shapiro, A. (2003), 'Monte Carlo sampling methods', Handbooks in Operations Research and Management Science, 353–425.
Sun, H., H. Xu & Y. Wang (2014), 'Asymptotic analysis of sample average approximation for stochastic optimization problems with joint chance constraints via conditional value at risk and difference of convex functions', Journal of Optimization Theory and Applications, 257–284.
Tamar, A., Y. Glassner & S. Mannor (2015), 'Optimizing the CVaR via sampling', Proceedings of the 29th AAAI Conference on Artificial Intelligence, pp. 2993–2999.
Weber, S. (2006), 'Distribution-invariant risk measures, information, and dynamic consistency', Mathematical Finance (2), 419–441.
Weber, S. (2018), 'Solvency II, or how to sweep the downside risk under the carpet', Insurance: Mathematics and Economics, 191–200.
Zhu, S. & M. Fukushima (2009), 'Worst-case conditional value-at-risk with application to robust portfolio management', Operations Research (5), 1155–1168.

Appendix
A.1 Proof of Lemma 1
Define
\[
\zeta(u, \bar\gamma) = u + \frac{\alpha_0}{1-p}\, E[\Psi_0 - u]^+ + \sum_{i=1}^{m} \sum_{j=1}^{K} \frac{\alpha_i \gamma_{ij}}{1-p}\, E[\Psi_{ij} - u]^+,
\]
where Ψ_0 and Ψ_{ij} are random variables with the distributions G_0 and G_{ij} in (7), respectively. The finiteness of the function ζ is guaranteed by the existence of the AV@R, or equivalently by E|Ψ_0| < ∞ and E|Ψ_{ij}| < ∞ for each i and j. Moreover, the convex function ζ(·, γ̄) has finite right and left derivatives for any γ̄. Observe that
\[
\frac{\zeta(u', \bar\gamma) - \zeta(u, \bar\gamma)}{u' - u}
= 1 + \frac{\alpha_0}{1-p}\, \frac{E[\Psi_0 - u']^+ - E[\Psi_0 - u]^+}{u' - u}
+ \sum_{i=1}^{m} \sum_{j=1}^{K} \frac{\alpha_i \gamma_{ij}}{1-p}\, \frac{E[\Psi_{ij} - u']^+ - E[\Psi_{ij} - u]^+}{u' - u}.
\]
When u' > u, for a generic random variable Ψ we have
\[
\frac{[\Psi - u']^+ - [\Psi - u]^+}{u' - u} =
\begin{cases}
-1 & \text{if } \Psi \ge u', \\
0 & \text{if } \Psi \le u, \\
\frac{-\Psi + u}{u' - u} \in (-1, 0) & \text{if } u < \Psi < u'.
\end{cases}
\]
Then there exists ρ(u, u') ∈ [0, 1] for which
\[
\frac{E[\Psi - u']^+ - E[\Psi - u]^+}{u' - u} = -\bigl(1 - P(\Psi \le u')\bigr) - \rho(u, u')\,\bigl(P(\Psi \le u') - P(\Psi \le u)\bigr).
\]
Letting u' ↓ u, P(Ψ ≤ u') converges to P(Ψ ≤ u), which yields
\[
\lim_{u' \downarrow u} \frac{E[\Psi - u']^+ - E[\Psi - u]^+}{u' - u} = P(\Psi \le u) - 1.
\]
Similarly, we compute
\[
\lim_{u' \downarrow u} \frac{\zeta(u', \bar\gamma) - \zeta(u, \bar\gamma)}{u' - u}
= 1 + \frac{\alpha_0}{1-p}\,\bigl(P(\Psi_0 \le u) - 1\bigr) + \sum_{i=1}^{m} \sum_{j=1}^{K} \frac{\alpha_i \gamma_{ij}}{1-p}\,\bigl(P(\Psi_{ij} \le u) - 1\bigr)
= 1 - \frac{1}{1-p} + \frac{\alpha_0}{1-p}\, P(\Psi_0 \le u) + \sum_{i=1}^{m} \sum_{j=1}^{K} \frac{\alpha_i \gamma_{ij}}{1-p}\, P(\Psi_{ij} \le u),
\]
where the last equality uses that the weights α_0 and α_i γ_{ij} sum to one. This limit is \(\frac{\partial_+ \zeta}{\partial u}(u, \bar\gamma)\). Analogously, we can compute \(\frac{\partial_- \zeta}{\partial u}(u, \bar\gamma)\). The remaining results are now straightforward.
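The one-sided derivative formula can be verified numerically. The following minimal sketch (not part of the original analysis) evaluates ζ and the right derivative by Monte Carlo for a toy configuration with m = 1, K = 2, normally distributed Ψ_0 and Ψ_{1j}, and illustrative weights α_0, α_1, γ_{1j}; it compares a forward finite difference of ζ with the formula derived above.

```python
import numpy as np

rng = np.random.default_rng(1)

p = 0.95
alpha0, alpha1 = 0.9, 0.1          # illustrative weights with alpha0 + alpha1 = 1
gamma = np.array([0.4, 0.6])       # mixture weights summing to 1 (m = 1, K = 2)

psi0 = rng.normal(0.0, 1.0, 500_000)                  # toy sample of Psi_0
psi1 = [rng.normal(1.0, 2.0, 500_000),                # toy samples of Psi_11, Psi_12
        rng.normal(2.0, 1.5, 500_000)]

def zeta(u):
    """Monte Carlo evaluation of zeta(u, gamma) as defined in the proof."""
    val = u + alpha0 / (1 - p) * np.mean(np.maximum(psi0 - u, 0.0))
    for j in range(2):
        val += alpha1 * gamma[j] / (1 - p) * np.mean(np.maximum(psi1[j] - u, 0.0))
    return val

def right_derivative(u):
    """1 - 1/(1-p) + alpha0/(1-p) F_0(u) + sum_j alpha1 gamma_1j/(1-p) F_1j(u)."""
    d = 1.0 - 1.0 / (1 - p) + alpha0 / (1 - p) * np.mean(psi0 <= u)
    for j in range(2):
        d += alpha1 * gamma[j] / (1 - p) * np.mean(psi1[j] <= u)
    return d

u, h = 2.0, 1e-3
print("forward difference:", (zeta(u + h) - zeta(u)) / h)
print("derivative formula:", right_derivative(u))
```

Because both evaluations use the same stored samples, the Monte Carlo noise largely cancels and the two numbers agree closely.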
A.2 Data in Section 4.1

Dependence in the central part
Dependence in the central part is modeled by the Gaussian copula whose correlation matrix consists of the linear correlations estimated from the central 92% of the data.
Dependence in the upper tail part

We consider K = 16 candidate copulas in the tail parts. The calibration is summarized in the following list of candidate copulas and estimation methods:

• C1: Gaussian copula matching the estimated linear correlation in the upper (4%-8%) data;
• C2: Gaussian copula matching the estimated Kendall's tau in the upper (4%-8%) data;
• C3: t-copula estimated from the upper (4%-8%) data using ML;
• C4: t-copula estimated from the upper (4%-8%) data using approximate ML;
• C5: t-copula estimated from the upper (4%-8%) data using Kendall's tau and ML;
• C6: grouped t-copula estimated from the upper (4%-8%) data using ML;
• C7: grouped t-copula estimated from the upper (4%-8%) data using approximate ML;
• C8: grouped t-copula estimated from the upper (4%-8%) data using Kendall's tau and ML;
• C9: Gaussian copula matching the estimated linear correlation in the extreme upper (-4%) data;
• C10: Gaussian copula matching the estimated Kendall's tau in the extreme upper (-4%) data;
• C11: t-copula estimated from the extreme upper (-4%) data using ML;
• C12: t-copula estimated from the extreme upper (-4%) data using approximate ML;
• C13: t-copula estimated from the extreme upper (-4%) data using Kendall's tau and ML;
• C14: grouped t-copula estimated from the extreme upper (-4%) data using ML;
• C15: grouped t-copula estimated from the extreme upper (-4%) data using approximate ML;
• C16: grouped t-copula estimated from the extreme upper (-4%) data using Kendall's tau and ML.
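Several of the candidates above are calibrated by matching the empirical Kendall's tau. For elliptical (Gaussian and t) copulas, a pairwise tau estimate translates into a correlation parameter via ρ = sin(πτ/2). The following sketch illustrates this standard inversion on hypothetical tail observations; the data, dimension, and the eigenvalue clipping used to restore positive semidefiniteness are illustrative assumptions, not the procedure of the paper.

```python
import numpy as np
from scipy.stats import kendalltau

def kendall_correlation(data):
    """Correlation matrix implied by pairwise Kendall's tau via rho = sin(pi * tau / 2),
    the standard inversion for elliptical (Gaussian and t) copulas."""
    d = data.shape[1]
    rho = np.eye(d)
    for i in range(d):
        for j in range(i + 1, d):
            tau, _ = kendalltau(data[:, i], data[:, j])
            rho[i, j] = rho[j, i] = np.sin(0.5 * np.pi * tau)
    # The pairwise inversion need not yield a positive semidefinite matrix;
    # clip negative eigenvalues and rescale to unit diagonal as a simple repair.
    w, v = np.linalg.eigh(rho)
    rho = v @ np.diag(np.clip(w, 1e-8, None)) @ v.T
    scale = np.sqrt(np.diag(rho))
    return rho / np.outer(scale, scale)

# Illustrative use on hypothetical joint tail observations (one row per observation).
rng = np.random.default_rng(2)
tail_data = rng.multivariate_normal(np.zeros(5),
                                    0.4 * np.ones((5, 5)) + 0.6 * np.eye(5),
                                    size=300)
print(np.round(kendall_correlation(tail_data), 2))
```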
A.3 Data in Section 4.2

Dependence in the central part
Dependence in the central part is modeled by the Gaussian copula whose correlation matrix Σ consists of the estimated linear correlations.

Dependence in the tail parts

For the candidate copulas in the upper tails, the following parameters were estimated:

• the correlation matrix Σ matching the estimated Kendall's tau;
• the parameters ν and P of the t-copula estimated by MLE;
• the degrees of freedom ν of the t-copula with P matching Kendall's tau, for which P = Σ;
• C7: the regular vine copula estimated according to AIC. For more information, we refer to Dißmann et al. (2013). The estimation was conducted with the vine copula package in R (https://cran.r-project.org/web/packages/VineCopula/VineCopula.pdf). The selected trees, pair copulas and the estimated parameters are given in Table 3.
• C8: the regular vine copula estimated according to BIC. For more information, we refer to Dißmann et al. (2013). The estimation was conducted with the vine copula package in R (https://cran.r-project.org/web/packages/VineCopula/VineCopula.pdf). The selected trees, pair copulas and the estimated parameters are provided in Table 4.

Tree | pair        | copula                    | parameters
1    | 3,4         | Frank                     | 3.63
1    | 5,2         | Frank                     | 6.12
1    | 5,1         | Frank                     | 4.39
1    | 5,3         | Tawn type 2 180 degrees   | 3.57, 0.46
2    | 5,4 ; 3     | Tawn type 2               | 4.12, 0.37
2    | 1,2 ; 5     | Tawn type 2 180 degrees   | 1.78, 0.41
2    | 3,1 ; 5     | Survival BB8              | 6, 0.17
3    | 1,4 ; 5,3   | Tawn type 1               | 3.61, 0.39
3    | 3,2 ; 1,5   | Joe                       | 1.11
4    | 2,4 ; 1,5,3 | Tawn type 2 180 degrees   | 1.6, 0.31

Table 3: The structure, pair copulas, and parameters of the regular vine copula C7 estimated according to AIC.

Tree | pair        | copula                    | parameters
1    | 3,4         | Frank                     | 3.63
1    | 5,2         | Frank                     | 6.12
1    | 5,1         | Frank                     | 4.39
1    | 5,3         | Tawn type 2 180 degrees   | 3.57, 0.46
2    | 5,4 ; 3     | Tawn type 2               | 4.12, 0.37
2    | 1,2 ; 5     | Survival Joe              | 1.55
2    | 3,1 ; 5     | Independence              | –
3    | 1,4 ; 5,3   | Tawn type 1               | 3.49, 0.39
3    | 3,2 ; 1,5   | Independence              | –
4    | 2,4 ; 1,5,3 | Clayton                   | 0.52

Table 4: The structure, pair copulas, and parameters for the regular vine copula C8 estimated according to BIC.