Ordering and Inequalities for Mixtures on Risk Aggregation
arXiv preprint [q-fin.RM]
Yuyu Chen ∗ Peng Liu † Yang Liu ‡ Ruodu Wang § July 27, 2020
Abstract
Aggregation sets, which represent model uncertainty due to unknown dependence, are an important object in the study of robust risk aggregation. In this paper, we investigate ordering relations between two aggregation sets for which the sets of marginals are related by two simple operations: distribution mixtures and quantile mixtures. Intuitively, these operations "homogenize" marginal distributions by making them similar. As a general conclusion from our results, more "homogeneous" marginals lead to a larger aggregation set, and thus more severe model uncertainty, although the situation for quantile mixtures is much more complicated than that for distribution mixtures. We proceed to study inequalities on the worst-case values of risk measures in risk aggregation, which represent conservative calculation of regulatory capital. Among other results, we obtain an order relation on VaR under quantile mixture for marginal distributions with monotone densities. Numerical results are presented to visualize the theoretical results and further inspire some conjectures. Finally, we discuss the connection of our results to joint mixability and to merging p-values in multiple hypothesis testing.
Keywords: aggregation set; distribution mixture; quantile mixture; risk measure; joint mixability

∗ Department of Statistics and Actuarial Science, University of Waterloo, Canada. Email: [email protected]
† Department of Mathematical Sciences, University of Essex, UK. Email: [email protected]
‡ Department of Mathematical Sciences, Tsinghua University, China. Email: [email protected]
§ Department of Statistics and Actuarial Science, University of Waterloo, Canada. Email: [email protected]

1 Introduction
Robust risk aggregation has been studied extensively with applications in banking and insurance. A typical problem in this area is to compute the worst-case values of some risk measures for an aggregate loss with unknown dependence structure. Two popular regulatory risk measures used in industry are Value-at-Risk (VaR) and Expected Shortfall (ES); see McNeil et al. (2015) and the references therein. The worst-case value of ES in risk aggregation is explicit since ES is a coherent risk measure (Artzner et al. (1999)), whereas the worst-case value of VaR in risk aggregation generally does not admit analytical formulas, which is a known challenging problem (see e.g., Embrechts et al. (2013, 2015)). See Cai et al. (2018) on robust risk aggregation for general risk measures, and Eckstein et al. (2020) on computation of robust risk aggregation using neural networks.

The above robust risk aggregation problem takes a supremum over an aggregation set, the most important object of this paper. Fix an atomless probability space (Ω, F, P) and let M be the set of cdfs on R. For F ∈ M, X ∼ F means that the cdf of a random variable X is F. Moreover, let M_1 denote the set of cdfs on R with finite mean. For F = (F_1, ..., F_n) ∈ M^n, the aggregation set (Bernard et al. (2014)) is defined as

D_n(F) = {cdf of X_1 + · · · + X_n : X_i ∼ F_i, i = 1, ..., n}.

The obvious interpretation is that D_n(F) fully describes the model uncertainty associated with known marginal distributions F_1, ..., F_n but an unknown dependence structure. The separate modeling of marginals and dependence is a standard practice in quantitative risk modeling, often involving copula techniques; see e.g., McNeil et al. (2015). An analytical characterization of D_n(F) for a given F is very difficult. The only available analytical results are in Mao et al.
(2019) for standard uniform marginals.

The main objective of this paper is to compare the model uncertainty of risk aggregation for F, G ∈ M^n, which represent two possible models of marginals. The strongest form of comparison is set inclusion between the two aggregation sets D_n(F) and D_n(G). It turns out that such a strong relation may be achievable if F, G ∈ M^n are related by the simple operations of distribution mixtures and quantile mixtures. Both types of operations are common in statistics and risk management, as they correspond to simple operations on the parameters in statistical models or on portfolio construction. Moreover, if G is obtained from F via a distribution or quantile mixture, then the mean (assumed to be finite) of any element of D_n(G) is the same as that of any element of D_n(F), making the comparison fair. To the best of our knowledge, this paper is the first systematic study of the order relation between D_n(F) and D_n(G) for different F and G, thus comparing model uncertainty at the level of all possible distributions.

In some cases, a strong comparison via set inclusion is not possible, but we can compare values of a chosen risk measure. For a law-invariant risk measure ρ : M → R, we denote by ρ̄(F) the worst-case value of ρ in risk aggregation for F ∈ M^n, that is,

ρ̄(F) = sup{ρ(G) : G ∈ D_n(F)}.

(In this paper, we treat probability measures on B(R) and cdfs on R as equivalent objects. We also conveniently treat law-invariant risk measures as mappings on M, although it is conventional to treat them as mappings on a space of random variables; the two settings are equivalent for law-invariant risk measures.) We shall compare ρ̄(F) with ρ̄(G), and thus the worst-case values of a risk measure under model uncertainty, which usually represent conservative calculation of regulatory risk capital (e.g., Embrechts et al.
(2013)). Certainly, D_n(F) ⊂ D_n(G) implies ρ̄(F) ≤ ρ̄(G) for all risk measures ρ, so the first comparison is stronger than the second one.

Our study brings insights to two relevant problems in risk management. First, suppose that F and G are two possible statistical models for the marginal distributions in a risk aggregation setting. Our results allow for a comparison of the model uncertainty associated with the two models, regardless of the choice of risk measure. Although a completely unknown dependence structure is sometimes unrealistic, it is commonly agreed that the dependence structure in a risk model is difficult to specify accurately (e.g., Embrechts et al. (2013) and Bernard et al. (2017)). Hence, a comparison of the magnitude of model uncertainty is an important practical issue. On the other hand, the general conclusions remain valid even if the marginal distributions are not completely specified (see the discussion in Section 9 on the presence of marginal uncertainty), and thus the assumption of known marginal distributions in our study is not harmful.

Second, our results provide an analytical way to establish inequalities on the worst-case risk measures in the form ρ̄(F) ≤ ρ̄(G). Sometimes the worst-case risk measure is difficult to calculate for F, but it may be easier to calculate for G. For instance, formulas on worst-case VaR are available for some homogeneous marginal distributions in Wang et al. (2013) and Puccetti and Rüschendorf (2013), but explicit results for heterogeneous marginal distributions are limited (see Blanchet et al. (2020) for a recent treatment). Therefore, we can use the analytical formula for ρ̄(G), if available, as an upper bound on ρ̄(F), and this leads to interesting applications in other fields; see Section 7 for an application in multiple hypothesis testing and Section 8 for a connection to joint mixability.

Our theoretical contributions are briefly summarized below.
In Sections 2 and 3, we analyze general relations on distribution and quantile mixtures. The general message of our results is that the more "homogeneous" the distribution tuple is, the larger its corresponding aggregation set D_n is. In particular, the set inclusion is established for any tuples connected by distribution mixtures in Theorem 1; that is, D_n(F) ⊂ D_n(G) if G is a distribution mixture of F. The problem for quantile mixtures is much more challenging. The set inclusion is established for uniform marginals in Proposition 2. For other families of distributions, such a general relationship does not hold, as discussed with some examples.

In Section 4, we obtain inequalities between the worst-case values of some risk measure ρ in risk aggregation with marginals related by distribution or quantile mixtures. Although quantile mixtures do not satisfy the relationship D_n(F) ⊂ D_n(G) in general, we can prove an order property between ρ̄(G) and ρ̄(F) for commonly used risk measures. Most remarkably, in Theorem 3, we show that under a monotone density assumption on the marginal distributions, the worst-case value of VaR is dominated by that of the corresponding quantile mixture.

(In this paper, the set inclusion "⊂" is non-strict; the strict set inclusion is "⊊". Similarly, the terms "increasing" and "decreasing" are in the non-strict sense.)

2 Distribution mixtures

We first summarize some simple properties of D_n that will be useful. For these properties, see Theorem 2.1 and Remark 2.2 of Bernard et al. (2014).

Lemma 1.
For F, G ∈ M^n, λ ∈ [0, 1], and an n-permutation σ, the following hold.

(i) D_n(F) = D_n(σ(F)).
(ii) λD_n(F) + (1 − λ)D_n(G) ⊂ D_n(λF + (1 − λ)G). In particular,
  (a) λD_n(F) + (1 − λ)D_n(F) = D_n(F);
  (b) D_n(F) ∩ D_n(G) ⊂ D_n(λF + (1 − λ)G).

We briefly fix some notation and conventions. Let ∆_n be the standard simplex given by ∆_n = {(λ_1, ..., λ_n) ∈ [0, 1]^n : Σ_{i=1}^n λ_i = 1}. Recall that a doubly stochastic matrix is a square matrix of nonnegative real numbers, each of whose rows and columns sums to 1 (i.e., each row or column is in ∆_n). Denote by Q_n the set of n × n doubly stochastic matrices. All vectors are treated as column vectors. For λ = (λ_1, ..., λ_n) ∈ ∆_n and F = (F_1, ..., F_n) ∈ M^n, their dot product is λ · F = Σ_{i=1}^n λ_i F_i ∈ M. For a matrix Λ = (λ_1, ..., λ_n)^⊤ ∈ Q_n and F ∈ M^n, their product is ΛF = (λ_1 · F, ..., λ_n · F) ∈ M^n.

The vector ΛF is a distribution mixture of F, and we will call it the Λ-mixture of F to emphasize the reliance on Λ. Indeed, ΛF can be seen as a vector of weighted averages of F. In particular, by choosing Λ = (1/n)_{n×n}, we get the vector (F̄, ..., F̄), where F̄ is the average of the components of F. Note that if F ∈ M_1^n, then the mean of any element of D_n(F) is the same as that of any element of D_n(ΛF).

The first result below shows that the aggregation set for a tuple of distributions is smaller than that for its weighted averages. The proof is elementary, but the result allows us to observe the important phenomenon that more homogeneous marginals lead to a larger aggregation set.

Theorem 1.
For F ∈ M^n and Λ ∈ Q_n, D_n(F) ⊂ D_n(ΛF).

Proof. Let Π_1, ..., Π_{n!} be all the different n-permutation matrices, i.e., each Π_k F is a permutation of F. By Birkhoff's Theorem (Theorem 2.A.2 of Marshall et al. (2011)), the set Q_n of doubly stochastic matrices is the convex hull of the permutation matrices; that is, for any Λ ∈ Q_n, there exists (λ_1, ..., λ_{n!}) ∈ ∆_{n!} such that

Λ = Σ_{k=1}^{n!} λ_k Π_k.

Note that D_n(F) = D_n(Π_k F) for k = 1, ..., n! by Lemma 1 (i). Further, by Lemma 1 (ii-b), we have

D_n(F) = ∩_{k=1}^{n!} D_n(Π_k F) ⊂ D_n( Σ_{k=1}^{n!} λ_k Π_k F ) = D_n(ΛF).

This completes the proof.
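The Birkhoff decomposition used in the proof is easy to compute for small n. Below is a minimal sketch (the function name and the greedy strategy are our own choices, not from the paper): it repeatedly extracts a permutation matrix supported on the positive entries of the remainder and subtracts it with the largest feasible weight.

```python
import itertools
import numpy as np

def birkhoff_decompose(L, tol=1e-9):
    """Greedy Birkhoff-von Neumann decomposition of a doubly stochastic
    matrix L into a convex combination of permutation matrices.
    Brute-force permutation search, so only suitable for small n."""
    n = L.shape[0]
    R = L.astype(float).copy()
    terms = []
    while R.max() > tol:
        # find a permutation supported on the positive entries of R
        for perm in itertools.permutations(range(n)):
            if all(R[i, perm[i]] > tol for i in range(n)):
                break
        else:
            raise ValueError("input is not doubly stochastic")
        w = min(R[i, perm[i]] for i in range(n))
        P = np.zeros((n, n))
        P[range(n), perm] = 1.0
        terms.append((w, P))
        R -= w * P
    return terms

L = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.5, 0.3],
              [0.3, 0.2, 0.5]])
terms = birkhoff_decompose(L)
weights = [w for w, _ in terms]
recon = sum(w * P for w, P in terms)
```

By Birkhoff's theorem the remainder keeps equal row and column sums after each subtraction, so a feasible permutation always exists while the remainder is nonzero, and the loop terminates after at most n² − n + 1 extractions.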
Corollary 1.
For F = (F_1, ..., F_n) ∈ M^n and Λ ∈ Q_n, D_n(ΛF) ⊂ D_n(F̄, ..., F̄), where F̄ = (1/n) Σ_{i=1}^n F_i.

By taking Λ as the identity matrix in Corollary 1, we obtain the set inclusion D_n(F) ⊂ D_n(F̄, ..., F̄), which was given in Theorem 3.5 of Bernard et al. (2014) to find bounds on VaR for heterogeneous marginal distributions.

The doubly stochastic matrices are closely related to majorization order. For λ, γ ∈ R^n, we say that λ dominates γ in majorization order, denoted by γ ≺ λ, if Σ_{i=1}^n φ(γ_i) ≤ Σ_{i=1}^n φ(λ_i) for all continuous convex functions φ. There are several equivalent conditions for this order; see Section 1.A.3 of Marshall et al. (2011). One equivalent condition that is relevant to Theorem 1 is that γ ≺ λ if and only if there exists Λ ∈ Q_n such that γ = Λλ. We can similarly define a majorization order between F, G ∈ M^n, denoted by G ≺ F, if G = ΛF for some Λ ∈ Q_n. Then, we have the following corollary.

Corollary 2.
For F, G ∈ M^n, if G ≺ F, then D_n(F) ⊂ D_n(G).

Example 1 (Bernoulli distributions). We apply Theorem 1 to Bernoulli distributions. Let B_p be the Bernoulli distribution with parameter p ∈ [0, 1]. For p = (p_1, ..., p_n) ∈ [0, 1]^n and Λ ∈ Q_n, we have Λ(B_{p_1}, ..., B_{p_n}) = (B_{q_1}, ..., B_{q_n}), where q = (q_1, ..., q_n) = Λp. Therefore, for any p, q ∈ [0, 1]^n with q ≺ p, we have D_n(B_{p_1}, ..., B_{p_n}) ⊂ D_n(B_{q_1}, ..., B_{q_n}). This result will be used later to discuss joint mixability (see Section 8) of Bernoulli distributions.

Next, we discuss how Λ-mixtures affect the lower sets with respect to convex order. A distribution F ∈ M_1 is called smaller than a distribution G ∈ M_1 in convex order, denoted by F ≺_cx G, if

∫ φ dF ≤ ∫ φ dG for all convex φ : R → R,   (1)

provided that both integrals exist (finite or infinite); see Müller and Stoyan (2002) and Shaked and Shanthikumar (2007) for an overview of convex order and the related notion of second-order stochastic dominance. For a given distribution F ∈ M_1, denote by C(F) the set of all distributions in M_1 dominated by F in convex order, that is, C(F) = {G ∈ M_1 : G ≺_cx F}. For F, G ∈ M, we denote by F ⊕ G the distribution with quantile function F^{-1} + G^{-1}. Moreover, define

C_n(F_1, ..., F_n) = C(F_1 ⊕ · · · ⊕ F_n).

The following lemma gives a simple link between the sets D_n and C_n; see e.g., Lemma 1 of Mao et al. (2019).

Lemma 2.
For F ∈ M^n, D_n(F) ⊂ C_n(F).

Similarly to the set D_n(F) in Theorem 1, C_n(F) also satisfies an ordering with respect to the Λ-mixture.

Theorem 2.
For F ∈ M^n and Λ ∈ Q_n, we have C_n(F) ⊂ C_n(ΛF).

Proof. Note that F_1 ⊕ · · · ⊕ F_n ∈ D_n(F), since F_1 ⊕ · · · ⊕ F_n corresponds to the sum of comonotonic random variables with respective distributions F_1, ..., F_n. Using Theorem 1 and Lemma 2, we have D_n(F) ⊂ D_n(ΛF) ⊂ C_n(ΛF). Hence F_1 ⊕ · · · ⊕ F_n ∈ C_n(ΛF), and since C_n(ΛF) is a lower set with respect to convex order, C_n(F) = C(F_1 ⊕ · · · ⊕ F_n) ⊂ C_n(ΛF).

3 Quantile mixtures

In Section 2, we have seen a set inclusion between D_n(F) and D_n(G), where G is a distribution mixture of F. The general message from Theorem 1 is that distribution mixtures enlarge the aggregation set. As a distribution mixture corresponds to the arithmetic average of distribution functions, it is then of interest to see whether a "harmonic average" of F_1, ..., F_n gives similar properties. By a "harmonic average" of F_1, ..., F_n, we mean the distribution F̄ with F̄^{-1} = (1/n) Σ_{i=1}^n F_i^{-1}, i.e., the average of the quantile functions. We shall call this type of average a quantile mixture.

In many statistical applications, the marginal distributions of a multi-dimensional object are modelled in the same location-scale family (such as a Gaussian, elliptical, or uniform family). The quantile mixture of such distributions is still in the same family, whereas the distribution mixture is typically no longer in the family. Moreover, a quantile mixture also corresponds to a combination of comonotonic random variables (such as combining an asset price with a call option on it), and hence finds its natural position in finance. As such, it is rather important and practical to consider quantile mixtures.

Remark. The two types of mixtures are both basic operations on distributions and often lead to qualitatively very different mathematical results.
As a famous example in decision theory, the axiom of linearity in distribution mixtures leads to the classic von Neumann–Morgenstern expected utility theory, whereas the axiom of linearity in quantile mixtures leads to the dual utility theory of Yaari (1987).

For a matrix Λ with non-negative elements (not necessarily in Q_n) and F ∈ M^n, let Λ ⊗ F be the vector of distributions G such that, componentwise, G^{-1} is equal to ΛF^{-1}. If Λ ∈ Q_n, we call G = Λ ⊗ F the Λ-quantile mixture of F. If F ∈ M_1^n, then the mean of any element of D_n(F) is the same as that of any element of D_n(Λ ⊗ F). (Recall that F ⊕ G is the distribution of the sum of two comonotonic random variables with respective distributions F and G. Two random variables X and Y are said to be comonotonic if there exist a random variable U and two increasing functions f, g such that X = f(U) and Y = g(U) almost surely. Such a U can be chosen to be uniformly distributed on [0, 1], and f and g can be chosen as the inverse distribution functions of X and Y, respectively.)

It is natural to compare D_n(F) with D_n(Λ ⊗ F), just like what we did in Section 2 for the distribution mixture. The first natural candidates to look at are D_n(F_1, ..., F_n) and D_n(F̄, ..., F̄), where F̄^{-1} = (1/n) Σ_{i=1}^n F_i^{-1}; this is the quantile version of Corollary 1. Unfortunately, the sets D_n(F_1, ..., F_n) and D_n(F̄, ..., F̄) are not necessarily comparable, as seen from the following example.
Example 2.
Take F_1 as a binary uniform distribution (with probability 1/2 on each support point) on { , }, and F_2 as a binary uniform distribution on { , }. Clearly, F̄ is a binary uniform distribution on { , }. D_2(F_1, F_2) contains distributions supported on { , , , }, and D_2(F̄, F̄) contains distributions supported on { , , }. Therefore, these two sets do not have a relation of set inclusion.

On the other hand, as a trivial example, if F_1, ..., F_n are point masses (without loss of generality, we assume that they are point masses at 0), then F̄ is also the point mass at 0. In this case, D_n(F_1, ..., F_n) = {F̄} ⊂ D_n(F̄, ..., F̄) holds trivially. Therefore, we can expect that the inclusion D_n(F_1, ..., F_n) ⊂ D_n(F̄, ..., F̄) may hold in some special settings.

Below, we note that both D_n(F) and D_n(Λ ⊗ F) have the same convex-order maximal element. This is in sharp contrast to the case of distribution mixtures in Theorem 2. Proposition 1 can be verified directly from the definitions.

Proposition 1.
For F ∈ M^n and Λ ∈ Q_n, we have C_n(F) = C_n(Λ ⊗ F).

As we see from Example 2, D_n(F) and D_n(Λ ⊗ F) are not necessarily comparable. In Mao et al. (2019), a non-trivial result is established for the aggregation of standard uniform distributions, which leads to an interesting observation along this direction.

Proposition 2.
Suppose that F_1, ..., F_n are uniform distributions, n > 1, and Λ = (1/n)_{n×n}. Then D_n(F) ⊂ D_n(Λ ⊗ F).

Proof. Note that the components of Λ ⊗ F are uniform distributions on intervals of equal length. By Theorem 5 of Mao et al. (2019), we have D_n(Λ ⊗ F) = C_n(Λ ⊗ F). Using Proposition 1, we have C_n(F) = C_n(Λ ⊗ F). Lemma 2 further yields D_n(F) ⊂ C_n(F). Putting the above results together, we obtain D_n(F) ⊂ D_n(Λ ⊗ F).

It is unclear whether D_n(F) ⊂ D_n(Λ ⊗ F) holds under some other conditions, similarly to Proposition 2. Note that the set inclusion D_n(F) ⊂ D_n(Λ ⊗ F) would help us to obtain semi-explicit formulas for bounds on risk measures (such as VaR), since by choosing Λ = (1/n)_{n×n}, the marginal distributions of Λ ⊗ F are all the same, and the formulas for VaR bounds in e.g., Wang et al. (2013) and Bernard et al. (2014) are applicable; see Section 4.

There are several sharp contrasts between distribution and quantile mixtures. In addition to the contrast on order relations that we see from Theorem 1 and Example 2, the two notions also treat location shifts of the marginal distributions very differently. This point will be explained in Section 8.1.

4 Bounds on the worst-case values of risk measures
This section is dedicated to exploring inequalities between the worst-case values of risk measures in risk aggregation with different marginal distribution tuples. Our main results in Sections 2 and 3 will help to establish the inequalities in Proposition 5.
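As a quick numerical reminder of why the choice of dependence structure matters (a toy example of our own, not from the paper): with two standard normal marginals fixed, three different couplings give very different 95% quantiles of the aggregate loss.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)
q = np.vectorize(NormalDist().inv_cdf)     # standard normal quantile function

u = rng.uniform(1e-12, 1 - 1e-12, size=100_000)
x = q(u)                                   # X_1 = F^{-1}(U)

s_com = x + q(u)                           # comonotonic: X_2 = F^{-1}(U)
s_ctr = x + q(1 - u)                       # countermonotonic: X_2 = F^{-1}(1-U), sum = 0
s_ind = x + q(rng.uniform(1e-12, 1 - 1e-12, size=u.size))  # independent coupling

p95 = {name: np.quantile(s, 0.95)
       for name, s in [("com", s_com), ("ctr", s_ctr), ("ind", s_ind)]}
# comonotonic ~ 2 * 1.645, independent ~ sqrt(2) * 1.645, countermonotonic ~ 0
```

The supremum defining the worst-case value below ranges over all such couplings at once, which is why it is in general much harder than any single-model computation.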
We pay particular attention to the popular regulatory risk measure VaR, which is a quantile functional. For p ∈ (0, 1), VaR_p : M → R is defined as

VaR_p(F) = F^{-1}(p) = inf{x ∈ R : F(x) ≥ p}.

Another popular regulatory risk measure is ES_p : M → R for p ∈ (0, 1), defined as

ES_p(F) = (1/(1 − p)) ∫_p^1 F^{-1}(u) du.

Given marginals F, the worst-case value of VaR in risk aggregation with unknown dependence structure is then defined as

VaR̄_p(F) = sup{VaR_p(G) : G ∈ D_n(F)}.

In other words, VaR̄_p(F) is the largest value of VaR_p of the aggregate risk X_1 + · · · + X_n over all possible dependence structures among X_i ∼ F_i, i = 1, ..., n. Similarly, the worst-case value of ES in risk aggregation is defined as ES̄_p(F) = sup{ES_p(G) : G ∈ D_n(F)}.

The worst-case value of ES in risk aggregation is easy to calculate since ES is consistent with convex order. On the other hand, the worst-case value of VaR in risk aggregation generally does not admit an analytical formula, which is a challenging problem; results in some specific cases are given in Wang et al. (2013), Puccetti and Rüschendorf (2013) and Bernard et al. (2014). To obtain approximations of VaR̄_p(F), one may use the asymptotic equivalence between VaR and ES in Embrechts et al. (2015) and then directly apply ES bounds, or use a numerical algorithm such as the rearrangement algorithm of Puccetti and Rüschendorf (2012) and Embrechts et al. (2013).

We now discuss a general relationship on risk measures for different aggregation sets. A risk measure is a functional ρ : M_ρ → R, where M_ρ ⊂ M is the set of distributions of some financial losses. For instance, if ρ is the mean, then M_ρ is naturally chosen as the set of distributions with finite mean. We denote by ρ̄(F) the worst-case value of ρ in risk aggregation for F ∈ M^n, that is, assuming D_n(F) ⊂ M_ρ,

ρ̄(F) = sup{ρ(G) : G ∈ D_n(F)}.
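The rearrangement algorithm mentioned above admits a compact implementation. The sketch below (the discretization and stopping rule are our own simplifications of the algorithm of Puccetti and Rüschendorf (2012)) approximates the worst-case VaR by oppositely ordering columns of a matrix of tail quantiles until no column changes.

```python
import numpy as np

def worst_var_ra(qfuns, p, N=2000, max_iter=200):
    """Approximate the worst-case VaR_p of X_1+...+X_n by the
    rearrangement algorithm: discretize each marginal's tail quantiles,
    then repeatedly sort each column oppositely to the sum of the others."""
    u = p + (1 - p) * (np.arange(N) + 0.5) / N        # grid in (p, 1)
    X = np.column_stack([q(u) for q in qfuns])
    for _ in range(max_iter):
        X_prev = X.copy()
        for j in range(X.shape[1]):
            others = X.sum(axis=1) - X[:, j]
            # largest value of column j goes to the row with smallest "others"
            X[np.argsort(others), j] = np.sort(X[:, j])[::-1]
        if np.array_equal(X, X_prev):
            break
    return X.sum(axis=1).min()

# three Pareto(alpha=2, theta=1) marginals: quantile theta * (1-u)^(-1/alpha)
pareto_q = lambda u: (1 - u) ** (-1 / 2.0)
w = worst_var_ra([pareto_q] * 3, p=0.95)
```

A sanity check on the output: the worst-case VaR is at least the comonotonic value Σ_i VaR_p(F_i), since the comonotonic coupling is one admissible dependence structure.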
4.2 Inequalities implied by stochastic dominance

Quite obviously, one can compare the worst-case values of some risk measures for two tuples of distributions satisfying some form of stochastic dominance, which we briefly discuss here.

A distribution F ∈ M is smaller than a distribution G in stochastic order (also called first-order stochastic dominance), denoted by F ≺_st G, if F ≥ G pointwise. For F, G ∈ M^n, we say that F is smaller than G in stochastic order, denoted by F ≺_st G, if F_i ≺_st G_i for i = 1, ..., n. Analogously, for F, G ∈ M_1^n, we say that F is smaller than G in convex order, denoted by F ≺_cx G, if F_i ≺_cx G_i for i = 1, ..., n.

We define two relevant common properties of risk measures. A risk measure ρ is monotone if ρ(F) ≤ ρ(G) whenever F ≺_st G; it is consistent with convex order if ρ(F) ≤ ρ(G) whenever F ≺_cx G. Almost all risk measures used in practice are monotone; ES is consistent with convex order whereas VaR is not. Monetary risk measures (see Föllmer and Schied (2016)) that are consistent with convex order are characterized by Mao and Wang (2020), and they admit an ES-based representation. In particular, all lower semi-continuous convex risk measures, including ES and expectiles (e.g., Ziegel (2016) and Delbaen et al. (2016)), are consistent with convex order; we refer to Föllmer and Schied (2016) for an overview of risk measures.

Now we state in Proposition 3 that one can compare the worst-case values of some risk measures for F and G if F is smaller than G in stochastic order or convex order.

Proposition 3.
Let ρ be a risk measure and F, G ∈ M^n with D_n(F), D_n(G) ⊂ M_ρ.

(i) If ρ is monotone and F ≺_st G, then ρ̄(F) ≤ ρ̄(G).
(ii) If ρ is consistent with convex order and F ≺_cx G with F, G ∈ M_1^n, then ρ̄(F) ≤ ρ̄(G).

Proof. (i) is straightforward to verify. We next focus on (ii). Since F_1 ⊕ · · · ⊕ F_n is the largest distribution in D_n(F) with respect to convex order and ρ is consistent with convex order, we have ρ̄(F) = ρ(F_1 ⊕ · · · ⊕ F_n). Similarly, ρ̄(G) = ρ(G_1 ⊕ · · · ⊕ G_n). Note that F ≺_cx G means F_i ≺_cx G_i, i = 1, ..., n. For all p ∈ (0, 1), we have

ES_p(F_1 ⊕ · · · ⊕ F_n) = Σ_{i=1}^n ES_p(F_i) ≤ Σ_{i=1}^n ES_p(G_i) = ES_p(G_1 ⊕ · · · ⊕ G_n),

which gives F_1 ⊕ · · · ⊕ F_n ≺_cx G_1 ⊕ · · · ⊕ G_n (see e.g., Theorem 3.A.5 of Shaked and Shanthikumar (2007)). Hence ρ(F_1 ⊕ · · · ⊕ F_n) ≤ ρ(G_1 ⊕ · · · ⊕ G_n), which completes the proof.

In the following result, we show that a distribution tuple and its Λ-mixture or Λ-quantile mixture typically do not satisfy stochastic order or convex order, unless the mixture operation is essentially the identity (ΛF = F or Λ ⊗ F = F). The proof of Proposition 4 is put in Appendix A.2.

Proposition 4.
Suppose Λ ∈ Q_n. The statements within each of (i)–(iv) are equivalent.

(i) For F ∈ M^n: (a) ΛF ≺_st F; (b) F ≺_st ΛF; (c) ΛF = F.
(ii) For F ∈ M^n: (a) Λ ⊗ F ≺_st F; (b) F ≺_st Λ ⊗ F; (c) Λ ⊗ F = F.
(iii) For F ∈ M_1^n: (a) Λ ⊗ F ≺_cx F; (b) F ≺_cx Λ ⊗ F; (c) Λ ⊗ F = F.
(iv) For F ∈ M_1^n: (a) ΛF ≺_cx F; (b) ΛF = F.

An implication of Proposition 4 is that the results on stochastic and convex order in Proposition 3 cannot be applied to compare the worst-case values of risk measures for F and ΛF, or for F and Λ ⊗ F. Nevertheless, this comparison can be conducted by applying our findings in Sections 2 and 3 and some other techniques. This will be the task in the next subsection.

4.3 Inequalities implied by mixtures

In the following, we will obtain inequalities between the worst-case values of risk measures for F and ΛF, or F and Λ ⊗ F. First, we apply Theorem 1 and Proposition 1 and immediately obtain the following result.

Proposition 5.
Let ρ be a risk measure and Λ ∈ Q_n.

(i) For F ∈ M^n with D_n(F) ⊂ M_ρ and D_n(ΛF) ⊂ M_ρ, we have ρ̄(F) ≤ ρ̄(ΛF).
(ii) For F ∈ M^n with D_n(F) ⊂ M_ρ and D_n(Λ ⊗ F) ⊂ M_ρ, if ρ is consistent with convex order, then ρ̄(F) = ρ̄(Λ ⊗ F) = ρ(F_1 ⊕ · · · ⊕ F_n).

Note that in Proposition 5, the inequality for the distribution mixture is valid for all risk measures, whereas the equality for the quantile mixture is restricted to risk measures consistent with convex order. As ES_p is a special case of a risk measure consistent with convex order, we immediately get ES̄_p(F) ≤ ES̄_p(ΛF) and ES̄_p(F) = ES̄_p(Λ ⊗ F). Since VaR is not consistent with convex order, (ii) of Proposition 5 cannot be applied to VaR. Nevertheless, using a recent result on VaR in Blanchet et al. (2020), we obtain an inequality between the worst-case VaR for some special marginals and the worst-case VaR of their corresponding quantile mixture. Denote by M_D (respectively, M_I) the set of distributions with decreasing (respectively, increasing) densities on their support.

Theorem 3.
For p ∈ (0, 1), Λ ∈ Q_n, and F ∈ M_D^n ∪ M_I^n, we have VaR̄_p(F) ≤ VaR̄_p(Λ ⊗ F).

Proof.
We start with some preliminaries. Define the upper VaR at level p for a cdf F as

VaR*_p(F) = inf{x ∈ R : F(x) > p}, p ∈ (0, 1).

The worst-case value of the upper VaR in risk aggregation is VaR̄*_p(F) = sup{VaR*_p(G) : G ∈ D_n(F)}. For F ∈ M_D^n ∪ M_I^n and p ∈ (0, 1), by the result of Blanchet et al. (2020) on VaR bounds, we have VaR̄*_p(F) = VaR̄_p(F) and

VaR̄_p(F) = inf_{β ∈ B_n} Σ_{i=1}^n [1/((1 − p)(1 − β̄))] ∫_{p+(1−p)(β̄−β_i)}^{1−(1−p)β_i} VaR_u(F_i) du,   (2)

where β = (β_1, ..., β_n), β̄ = Σ_{i=1}^n β_i, and B_n = {β ∈ [0, 1]^n : β̄ < 1}. Note that Λ ⊗ F ∈ M_D^n ∪ M_I^n if F ∈ M_D^n ∪ M_I^n. Consequently, for p ∈ (0, 1),

VaR̄_p(Λ ⊗ F) = inf_{β ∈ B_n} Σ_{i=1}^n [1/((1 − p)(1 − β̄))] ∫_{p+(1−p)(β̄−β_i)}^{1−(1−p)β_i} ( Σ_{j=1}^n Λ_{i,j} VaR_u(F_j) ) du = inf_{β ∈ B_n} Σ_{i=1}^n Σ_{j=1}^n Λ_{i,j} M_{i,j}(β),

where the function M : B_n → R^{n×n}, mapping an n-dimensional vector to an n × n matrix, is given by

M_{i,j}(β) = [1/((1 − p)(1 − β̄))] ∫_{p+(1−p)(β̄−β_i)}^{1−(1−p)β_i} VaR_u(F_j) du, i, j = 1, ..., n.

We can rewrite (2) as

VaR̄_p(F) = inf_{β ∈ B_n} Σ_{i=1}^n M_{i,i}(β).

Let Π_1, ..., Π_{n!} be all the different n-permutation matrices, i.e., Π_k β is a permutation of β for each k. By Birkhoff's Theorem (Theorem 2.A.2 of Marshall et al. (2011)), for Λ ∈ Q_n, there exists (λ_1, ..., λ_{n!}) ∈ ∆_{n!} such that Λ = Σ_{k=1}^{n!} λ_k Π_k. Hence, writing Π_k β = (β_{k1}, ..., β_{kn}) for each k, we have

Σ_{i=1}^n Σ_{j=1}^n Λ_{i,j} M_{i,j}(β)
= [1/((1 − p)(1 − β̄))] Σ_{i=1}^n ∫_{p+(1−p)(β̄−β_i)}^{1−(1−p)β_i} ( Σ_{j=1}^n Λ_{i,j} VaR_u(F_j) ) du
= [1/((1 − p)(1 − β̄))] Σ_{i=1}^n Σ_{k=1}^{n!} λ_k ∫_{p+(1−p)(β̄−β_{ki})}^{1−(1−p)β_{ki}} VaR_u(F_i) du
= Σ_{k=1}^{n!} λ_k Σ_{i=1}^n [1/((1 − p)(1 − β̄))] ∫_{p+(1−p)(β̄−β_{ki})}^{1−(1−p)β_{ki}} VaR_u(F_i) du
= Σ_{k=1}^{n!} λ_k Σ_{i=1}^n M_{i,i}(Π_k β).

Since B_n and β̄ are invariant under permutations of the coordinates of β, we have inf_{β ∈ B_n} Σ_{i=1}^n M_{i,i}(Π_k β) = inf_{β ∈ B_n} Σ_{i=1}^n M_{i,i}(β) for each k. Therefore,

VaR̄_p(F) = inf_{β ∈ B_n} Σ_{i=1}^n M_{i,i}(β) = Σ_{k=1}^{n!} λ_k inf_{β ∈ B_n} Σ_{i=1}^n M_{i,i}(Π_k β) ≤ inf_{β ∈ B_n} Σ_{k=1}^{n!} λ_k Σ_{i=1}^n M_{i,i}(Π_k β) = inf_{β ∈ B_n} Σ_{i=1}^n Σ_{j=1}^n Λ_{i,j} M_{i,j}(β) = VaR̄_p(Λ ⊗ F).
This completes the proof of the theorem.

Inspired by Theorem 3, for Λ ∈ Q_n, one may expect ρ̄(F) ≤ ρ̄(Λ ⊗ F) for other risk measures ρ, and D_n(F) ⊂ D_n(Λ ⊗ F) for some more general marginal distributions F ∈ M_D^n beyond the uniform distributions in Proposition 2. Unfortunately, we are unable to prove such relationships in general. Some related open questions are listed in Section 9.

Next, we study location-scale distribution families of the form, for a given G ∈ M_D ∪ M_I,

{ G_{θ,η} : G_{θ,η}(·) = G((· − η)/θ), θ ∈ (0, ∞), η ∈ R }.

For θ ∈ (0, ∞)^n and η ∈ R^n, we write G_{θ,η} = (G_{θ_1,η_1}, ..., G_{θ_n,η_n}). As an immediate consequence of Theorem 3, we obtain the following result.

Corollary 3.
For G ∈ M_D ∪ M_I, p ∈ (0, 1), Λ ∈ Q_n, θ ∈ (0, ∞)^n and η ∈ R^n, we have VaR̄_p(G_{θ,η}) ≤ VaR̄_p(G_{Λθ,Λη}).

Proof.
Note that (G_{θ,η})^{-1} = η + θG^{-1} for θ > 0 and η ∈ R. It follows that

(Λ ⊗ G_{θ,η})^{-1} = Λ(G_{θ,η})^{-1} = (G_{Λθ,Λη})^{-1},

which means Λ ⊗ G_{θ,η} = G_{Λθ,Λη}. Applying Theorem 3, we get the statement in the corollary.

Corollary 3 can be made slightly more general by considering both location and scale transforms (see Section 8.1 for more discussion of location shifts). Let T_x(F) be a shift of F ∈ M by adding a constant x ∈ R to its location; that is, T_x(F) is the distribution of X + x for X ∼ F. For x = (x_1, ..., x_n) ∈ R^n and F = (F_1, ..., F_n) ∈ M^n, we use the notation T_x(F) = (T_{x_1}(F_1), ..., T_{x_n}(F_n)). Moreover, for λ > 0, we denote by F^λ the distribution of λX for X ∼ F, and for λ = (λ_1, ..., λ_n) ∈ R_+^n we write F^λ = (F_1^{λ_1}, ..., F_n^{λ_n}).

Corollary 4.
For p ∈ (0, 1), F ∈ M_D ∪ M_I, λ, γ ∈ R_+^n, and x, y ∈ R^n, if γ ≺ λ and Σ_{i=1}^n x_i ≤ Σ_{i=1}^n y_i, then

VaR̄_p(T_x(F^λ)) ≤ VaR̄_p(T_y(F^γ)).   (3)

Proof.
By Section 1.A.3 of Marshall et al. (2011), γ ≺ λ if and only if there exists Λ ∈ Q_n such that γ = Λλ. This implies F^γ = Λ ⊗ F^λ. By Corollary 3, it follows that VaR̄_p(F^λ) ≤ VaR̄_p(F^γ). Moreover, observe that

VaR̄_p(T_x(F^λ)) = VaR̄_p(F^λ) + Σ_{i=1}^n x_i and VaR̄_p(T_y(F^γ)) = VaR̄_p(F^γ) + Σ_{i=1}^n y_i.

By the fact that Σ_{i=1}^n x_i ≤ Σ_{i=1}^n y_i, we obtain (3).

5 Aggregation of Pareto risks

In this section we study the worst-case risk measure for a portfolio of Pareto risks, where the risk measure is not necessarily consistent with convex order. Throughout this section, we assume that ρ is a monotone risk measure, such as VaR.

One particular situation of interest for risk aggregation with non-convex risk measures is when the risks in the portfolio do not have a finite mean. Note that for a portfolio without finite mean, any non-constant risk measure that is consistent with convex order (including convex risk measures) will have an infinite worst-case value. Therefore, one has to use a non-convex risk measure such as VaR to assess risks in this situation.

Arguably, the most important class of heavy-tailed risk distributions is the class of Pareto distributions, due to their regularly varying tails and their prominent appearance in extreme value theory; see e.g., Embrechts et al. (1997). A common parameterization of Pareto distributions is given by, for θ, α > 0,

P_{α,θ}(x) = 1 − (θ/x)^α, x > θ.

Note that if X ∼ P_{α,1}, then θX ∼ P_{α,θ}, and thus θ is a scale parameter. Moreover, the mean of P_{α,θ} is infinite if and only if α ∈ (0, 1]. Below we consider Pareto risks with a common parameter α and possibly different θ.

For α > 0 and θ = (θ_1, ..., θ_n) ∈ (0, ∞)^n, let P_{α,θ} = (P_{α,θ_1}, ..., P_{α,θ_n}). We are interested in the worst-case value ρ̄(P_{α,θ}). We first note some simple properties of this quantity, which are straightforward to check (a simple proof is put in Appendix A.3).

Proposition 6.
Let ρ be a monotone risk measure on M. For α > 0 and θ ∈ (0, ∞)^n,

(i) Λ ⊗ P_{α,θ} = P_{α,Λθ} for all Λ ∈ (0, ∞)^{n×n};
(ii) ρ̄(P_{α,θ}) is decreasing in α;
(iii) ρ̄(P_{α,θ}) is increasing in each component of θ.

The next result contains an ordering relationship for the aggregation of Pareto risks. In particular, we show that for α ∈ (0, 1) the ordering ρ̄(P_{α,θ}) ≤ ρ̄(ΛP_{α,θ}) ≤ ρ̄(P_{α,Λθ}) holds (this is generally not true for α > 1; see the figures in Section 6). This result is not implied by any of the comparisons obtained in the previous sections, and it seems to be rather specialized to Pareto distributions, as seen from the proof. It is unclear at the moment whether the result can be generalized to other types of distributions without a finite mean.
Theorem 4.
Let ρ be a monotone risk measure on M. For α ∈ (0, 1], θ = (θ_1, ..., θ_n) ∈ (0, ∞)^n, and Λ ∈ Q_n, we have

ρ(P_{α,θ}) ≤ ρ(Λ P_{α,θ}) ≤ ρ(P_{α,Λθ}).

Proof.
The first inequality follows directly from Theorem 1. Next we focus on the second inequality. Recall that Λ = (λ_1, ..., λ_n)^⊤ ∈ Q_n, and let λ_j = (λ_{j,1}, ..., λ_{j,n}) for j = 1, ..., n. For any fixed j ∈ {1, ..., n}, denote the cdf of (Λ P_{α,θ})_j by F_j; then

F_j(x) = Σ_{i=1}^n λ_{j,i} (1 − (θ_i/x)^α)_+ , x ∈ R.

For fixed x > 0 and α ∈ (0, 1], let g(t) := 1 − (t/x)^α, t > 0. Note that g is a convex function on (0, ∞), since t ↦ (t/x)^α is concave for α ∈ (0, 1]. Hence, by Jensen's inequality,

F_j(x) ≥ Σ_{i=1}^n λ_{j,i} (1 − (θ_i/x)^α) ≥ 1 − ((Σ_{i=1}^n λ_{j,i} θ_i)/x)^α.

This implies

F_j(x) ≥ G_j(x), x > 0,

where G_j = (P_{α,Λθ})_j. Let U be the set of uniform random variables on [0, 1]. Since F_j ≥ G_j pointwise, we have F_j^{-1} ≤ G_j^{-1} for j = 1, ..., n, and since ρ is monotone,

ρ(Λ P_{α,θ}) = sup{ ρ(Σ_{i=1}^n F_i^{-1}(U_i)) : U_1, ..., U_n ∈ U } ≤ sup{ ρ(Σ_{i=1}^n G_i^{-1}(U_i)) : U_1, ..., U_n ∈ U } = ρ(P_{α,Λθ}).

This completes the proof.

Next, we combine the results of Theorems 3-4 and Proposition 5 with a special focus on VaR_p, p ∈ (0, 1).

Proposition 7.
For p ∈ (0, 1), θ = (θ_1, ..., θ_n) ∈ (0, ∞)^n, and Λ ∈ Q_n,

(i) if α ∈ (0, ∞), then VaR_p(P_{α,θ}) ≤ VaR_p(Λ P_{α,θ});
(ii) if α ∈ (0, ∞), then VaR_p(P_{α,θ}) ≤ VaR_p(P_{α,Λθ});
(iii) if α ∈ (0, 1], then VaR_p(P_{α,θ}) ≤ VaR_p(Λ P_{α,θ}) ≤ VaR_p(P_{α,Λθ}).

Proposition 7 is useful for the application in Section 7 on multiple hypothesis testing, where P^r follows a Pareto distribution for a p-value P and r < 0. Some further properties of VaR_p(P_{α,θ}) are put in Appendix A.4.

Numerical illustration
Define a 3 × 3 doubly stochastic matrix Λ_1 = a × I + (1 − a) × (1/3) 1_{3×3}, where I is the 3 × 3 identity matrix, 1_{3×3} is the 3 × 3 matrix with all entries equal to 1, and a ∈ (0, 1) is a fixed mixing weight. We use the sequence {Λ_k}_{k∈N}, Λ_k = Λ_1^k, to numerically illustrate the ordering relationships and inequalities obtained throughout the paper. Note that Λ_k becomes more "homogeneous" as k grows larger, and Λ_k → (1/3) 1_{3×3} as k → ∞. The general messages obtained from the numerical examples are as follows.

1. For general marginals, the value of VaR becomes larger after making a distribution mixture ((i) of Proposition 5); this is shown in all figures.

2. For marginal distributions with monotone densities, the value of VaR becomes larger after making a quantile mixture (Theorem 3); see Figures 1-3. The numerical examples in Figure 4 indicate that Theorem 3 may also hold for marginal distributions with non-monotone densities. Nevertheless, the order does not hold for arbitrary marginal distributions: a counterexample is provided in Figure 5, which involves discrete marginals.

3. For Pareto distributions with infinite mean, the value of VaR of the quantile mixture is larger than that of the distribution mixture ((iii) of Proposition 7); see Figure 1(b). This conclusion also holds for many other marginals; see all the other figures except Figure 1(a). The relationship does not hold for Pareto distributions with finite mean; see Figure 1(a).

In Figure 1, we consider Pareto distributions with finite mean (α = 3) and infinite mean (α = 1/2). The monotone effects of mixing are visualized as the curves in Figure 1 are all increasing in k. In Figure 1(b), it turns out that, in the infinite-mean case, the quantile mixture gives a larger value of VaR than the distribution mixture does. This coincides with the conclusion in (iii) of Proposition 7. Interestingly, we observe from Figure 1(a) that the value of VaR given by the distribution mixture is larger than the one given by the quantile mixture, which is contrary to the case with infinite mean (Figure 1(b)). It is an open question whether this conclusion is true for general doubly stochastic matrices Λ and all α > 1. We next consider Pareto marginals with different values of α in Figure 2.
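The sequence {Λ_k} is straightforward to generate numerically. A minimal sketch (the mixing weight a = 0.8 below is an illustrative assumption of ours, not necessarily the value used in the figures):

```python
import numpy as np

def mixing_matrix(k, a=0.8, n=3):
    """Return Lambda_k = (a*I + (1-a)*(1/n)*ones)^k, a doubly stochastic matrix.

    The weight a is illustrative; any a in (0, 1) gives the same qualitative
    behaviour: Lambda_k -> the matrix with all entries 1/n as k grows.
    """
    lam1 = a * np.eye(n) + (1 - a) * np.full((n, n), 1.0 / n)
    return np.linalg.matrix_power(lam1, k)

L5 = mixing_matrix(5)
# rows and columns of a doubly stochastic matrix sum to one
assert np.allclose(L5.sum(axis=0), 1.0) and np.allclose(L5.sum(axis=1), 1.0)
# "homogenization": large powers approach the matrix with all entries 1/3
assert np.allclose(mixing_matrix(200), 1.0 / 3, atol=1e-12)
```

The convergence follows from the spectrum of Λ_1: its eigenvalue on the orthogonal complement of the vector (1, 1, 1) is a, so off-homogeneous components decay like a^k.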
First observe that the curves of the quantile mixture and the distribution mixture in Figure 2 are both increasing in k, which is consistent with Theorem 3 and (i) of Proposition 5. Comparing the two curves, the value for the distribution mixture is in this case smaller than the one for the quantile mixture.

Figure 1: Worst-case VaR against k. Quantile mixture: VaR_p(Λ_k ⊗ P_{α,θ}) = VaR_p(P_{α,Λ_kθ}); distribution mixture: VaR_p(Λ_k P_{α,θ}); marginals X_i ∼ Pareto(α, θ_i). Panel (a): Pareto distribution with finite mean (α = 3); panel (b): Pareto distribution with infinite mean (α = 1/2).

Figure 2: Worst-case VaR against k. Quantile mixture: VaR_p(Λ_k ⊗ F); distribution mixture: VaR_p(Λ_k F); Pareto marginals X_i ∼ Pareto(α_i, θ_i) with different tail parameters α_i.

Figure 3: Worst-case VaR against k. Quantile mixture: VaR_p(Λ_k ⊗ F); distribution mixture: VaR_p(Λ_k F); marginals with monotone densities: X_1 ∼ Pareto, X_2 ∼ Gamma(1, ·), X_3 ∼ Weibull(1/2).

Explicit expressions for VaR_p(F) are unavailable for general marginal distributions. Fortunately, we can approximate the value of VaR_p(F) using the rearrangement algorithm (RA) of Embrechts et al. (2013) and obtain an upper bound on VaR_p(F) using (5) in Lemma 3.

For distributions with non-monotone densities, including Gamma and Weibull, the curves of both the distribution and the quantile mixtures in Figure 4 are increasing in k. The result on the distribution mixture is consistent with (i) of Proposition 5, and the result on the quantile mixture seems to suggest that the conclusion in Theorem 3 may be valid for more general distributions with non-monotone densities. This conjectured extension of Theorem 3 would hold if (2) held for more general distributions, which is a difficult question.

The above observation is no longer true for discrete distributions. We observe in Figure 5 that the curve of the quantile mixture is not increasing at some points. This shows that the claim in Theorem 3 cannot be extended to arbitrary, in particular discrete, distributions.

Figure 4: Worst-case VaR against k. Quantile mixture: VaR_p(Λ_k ⊗ F); distribution mixture: VaR_p(Λ_k F); marginals X_2 ∼ Gamma(5, ·) and X_3 ∼ Weibull with non-monotone densities. Panel (a): X_1 ∼ Pareto(3, ·); panel (b): X_1 ∼ LogNormal(0, ·).

Figure 5: Worst-case VaR of the quantile mixture VaR_p(Λ_k ⊗ F) against k, with a discrete marginal: X_1 ∼ Binomial(10, ·), X_2 ∼ Gamma(5, ·), X_3 ∼ Weibull with non-monotone density.

Application: merging p-values in hypothesis testing
In this section, we apply our results to p-merging methods following the setup of Vovk and Wang (2020). A random variable P is a p-variable if P(P ≤ ε) ≤ ε for all ε ∈ (0, 1). To combine n p-variables P_1, ..., P_n, one needs to choose an increasing Borel function F : [0, 1]^n → [0, ∞) as a merging function such that F(P_1, ..., P_n) is a p-variable. F is a precise merging function if for each ε ∈ (0, 1), P(F(P_1, ..., P_n) ≤ ε) = ε for some p-variables P_1, ..., P_n.

As explained in Vovk and Wang (2020), an advantage of using averaging methods to combine p-values, compared to classic methods based on order statistics, is that weights can be introduced to the p-values in an intuitive way. Without imposing any dependence assumption on the individual p-variables, an averaging method uses, for r ∈ [−∞, ∞] (r ∈ {−∞, 0, ∞} are interpreted as limits),

F : [0, 1]^n → [0, ∞), (p_1, ..., p_n) ↦ a_{r,w} (w_1 p_1^r + ··· + w_n p_n^r)^{1/r},

as the merging function, where a_{r,w} is a constant multiplier and w = (w_1, ..., w_n) ∈ ∆_n. The constant a_{r,w} is chosen so that F is a precise merging function, and is thus the most powerful choice of the constant multiplier. Let U be the set of uniform random variables on [0, 1]. Lemma 1 of Vovk and Wang (2020) gives

a_{r,w} = (− sup{ q(− Σ_{i=1}^n w_i P_i^r) : P_1, ..., P_n ∈ U })^{−1/r} for r > 0;
a_{0,w} = exp( sup{ q(Σ_{i=1}^n w_i log(1/P_i)) : P_1, ..., P_n ∈ U } ) for r = 0;
a_{r,w} = ( sup{ q(Σ_{i=1}^n w_i P_i^r) : P_1, ..., P_n ∈ U })^{−1/r} for r < 0,

where q : X ↦ inf{x ∈ R : P(X ≤ x) > 0} is the essential infimum. Clearly, computing a_{r,w} involves calculating VaR_p(F) for Pareto, exponential or Beta distributions and letting p ↓ 0. For brevity, we denote a_{r,w} by a_{r,n} when w = (1/n, ..., 1/n). Analytical results for a_{r,n} have been well studied in Vovk and Wang (2020), whereas results for a_{r,w} are limited, since there are no analytical formulas for VaR_p(F) in general for heterogeneous marginal distributions.
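In the absence of analytical formulas, worst-case quantities of this kind are computed numerically. Below is a minimal, simplified sketch of a rearrangement-type algorithm in the spirit of Embrechts et al. (2013); the discretization size, the iteration count and the Pareto example are our illustrative choices, not the exact procedure of the cited paper:

```python
import numpy as np

def worst_case_var(qfs, p, N=1000, iters=100):
    """Approximate the worst-case VaR_p of X_1 + ... + X_n over all
    dependence structures, given marginal quantile functions qfs.

    Discretizes the upper (1 - p)-tail of each marginal and repeatedly
    rearranges each column to be oppositely ordered to the sum of the
    other columns (a simplified rearrangement algorithm).
    """
    u = p + (np.arange(N) + 0.5) * (1 - p) / N      # tail quantile levels
    X = np.column_stack([qf(u) for qf in qfs])
    for _ in range(iters):
        for j in range(X.shape[1]):
            others = X.sum(axis=1) - X[:, j]
            # largest entries of column j against the smallest "others"
            X[np.argsort(others), j] = np.sort(X[:, j])[::-1]
    return X.sum(axis=1).min()

# three Pareto(alpha = 3, theta = 1) marginals at level p = 0.95
alpha, p = 3.0, 0.95
qf = lambda u: (1 - u) ** (-1 / alpha)
v = worst_case_var([qf] * 3, p)
lower = 3 * (1 - p) ** (-1 / alpha)                 # comonotonic value
upper = alpha / (alpha - 1) * lower                 # ES-based upper bound
assert lower - 1e-9 <= v <= upper + 1e-9
```

The output necessarily lies between the comonotonic value (a lower bound for the worst case) and the ES-based bound of Proposition 12(v), which is how the sketch is sanity-checked here.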
Although the rearrangement algorithm of Puccetti and Rüschendorf (2012) and Embrechts et al. (2013) can be used to calculate a_{r,w} numerically, the computational burden becomes quite heavy in high-dimensional situations, which are unfortunately very common in multiple hypothesis testing. It turns out that our Theorem 3 provides a convenient upper bound on a_{r,w}.

Proposition 8.
For r ∈ R and w ∈ ∆_n, we have a_{r,w} ≤ a_{r,n}.

Proof.
Note that for r < 0, P_i^r, i = 1, ..., n, has a decreasing density, and (1/n, ..., 1/n) ≺ (w_1, ..., w_n) in majorization order. By Theorem 3 and letting p ↓ 0, we obtain

sup{ q(Σ_{i=1}^n w_i P_i^r) : P_1, ..., P_n ∈ U } ≤ sup{ q(Σ_{i=1}^n (1/n) P_i^r) : P_1, ..., P_n ∈ U }.

Hence a_{r,w} ≤ a_{r,n} for r < 0. If r > 0, the argument is similar, using Corollary 4 instead.

The interpretation of Proposition 8 is that, when using a weighted p-merging method, one can safely rely on the same coefficient obtained from the corresponding symmetric p-merging method. This is particularly convenient when validity of the test is more important than the quality of an approximation; see Vovk and Wang (2020) for more discussion of such applications.
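For n = 2, the constants in Proposition 8 can be evaluated directly. Taking r = −1 as an illustration: the antithetic coupling P_2 = 1 − P_1 maximizes the essential infimum of w_1/P_1 + w_2/P_2 (by a rearrangement argument for two sequences), and under this coupling a short computation gives a_{−1,w} = (√w_1 + √w_2)². A sketch of the check (the weights below are illustrative):

```python
import numpy as np

def harmonic_merge_constant(w1, w2, m=200001):
    """Evaluate sup q(w1/P1 + w2/P2) over couplings of two uniform
    p-variables for r = -1, using the antithetic coupling P2 = 1 - P1
    (optimal here by a rearrangement argument).

    For r = -1 the multiplier is a_{-1,w} = (sup q)^{-1/r} = sup q.
    """
    u = (np.arange(m) + 0.5) / m                 # grid on (0, 1)
    return np.min(w1 / u + w2 / (1 - u))         # essential infimum

a_equal = harmonic_merge_constant(0.5, 0.5)      # symmetric weights
a_weighted = harmonic_merge_constant(0.8, 0.2)   # unequal weights
# closed forms (sqrt(w1) + sqrt(w2))^2: 2.0 and 1.8 respectively
assert abs(a_equal - 2.0) < 1e-3
assert abs(a_weighted - 1.8) < 1e-3
# Proposition 8: the weighted constant never exceeds the symmetric one
assert a_weighted <= a_equal
```

The comparison a_{−1,w} ≤ a_{−1,2} is exactly the content of Proposition 8 in this small case: unequal weights can only make the precise multiplier smaller, so the symmetric constant is always a safe choice.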
Location shifts

In this section we discuss the difference between distribution and quantile mixtures when location shifts are applied. Let V_x = {(x_1, ..., x_n) ∈ R^n : x_1 + ··· + x_n = x} for x ∈ R. For F ∈ M^n and x ∈ V_x, we have the invariance relation

D_n(T_x(F)) = T_x(D_n(F)). (4)

The aggregation set of a quantile mixture is invariant under location shifts of the marginal distributions, in sharp contrast to the case of a distribution mixture. For F ∈ M^n and x ∈ V_x, it holds that for Λ ∈ Q_n,

D_n(Λ ⊗ T_x(F)) = T_x(D_n(Λ ⊗ F)).

That means, D_n(Λ ⊗ T_x(F)) is the same for all x ∈ V_x. However, this does not hold for the distribution mixture; that is, generally, D_n(Λ T_x(F)) is not the same for all x ∈ V_x, and

D_n(Λ T_x(F)) ≠ T_x(D_n(Λ F)).

In particular, for x ≠ 0, F_1 = F_2 = F, and Λ with all entries 1/2, the shift vectors (x, −x) and (0, 0) both lie in V_0, while

D_2( (1/2)(T_x(F) + T_{−x}(F)), (1/2)(T_x(F) + T_{−x}(F)) ) ≠ D_2(F, F).

The above example shows that distribution and quantile mixtures treat location shifts differently. Inspired by this observation, we slightly generalize Theorem 1 by including location shifts. For F ∈ M^n, we define the set A_n(F) of averaging and location shifts of F as

A_n(F) = { Λ T_x(F) : Λ ∈ Q_n, x ∈ R^n, x_1 + ··· + x_n = 0 },

and denote by Ā_n(F) the closure of the convex hull of A_n(F) with respect to weak convergence. It is straightforward to check that

A_n(T_y(F)) = T_y(A_n(F)), y = (y, ..., y) ∈ R^n.

Proposition 9. For F ∈ M^n and G ∈ Ā_n(F), we have D_n(F) ⊂ D_n(G).

Proof.
First, by Theorem 1 and (4), D_n(F) ⊂ D_n(G) for each G ∈ A_n(F). Denote by cx(A_n(F)) the convex hull of A_n(F). By Lemma 1 (ii-b), for each G ∈ cx(A_n(F)), we have D_n(F) ⊂ D_n(G). Take G ∈ Ā_n(F), and write it as the limit of a sequence {G_k}_{k=1}^∞ ⊂ cx(A_n(F)). It follows that each F_0 ∈ D_n(F) is also in D_n(G_k) for every k. This implies that F_0 is also in D_n(G), by the compactness property in Theorem 2.1 (vii-b) of Bernard et al. (2014).

Connection to joint mixability

Joint mixability (Wang et al. (2013) and Wang and Wang (2016)) is a central concept in the study of risk aggregation with dependence uncertainty, and analytical results on it are quite limited. In this section, we study the implications of our results for conditions of joint mixability. We denote by δ_x the point mass at x ∈ R.

Definition 1 (Joint mixability). An n-tuple of distributions F ∈ M^n is jointly mixable (JM) if D_n(F) contains a point-mass distribution δ_x, where x ∈ R is called a center of F.

Example 1 implies a conclusion on the joint mixability of Bernoulli distributions.

Proposition 10.
For p_1, ..., p_n ∈ [0, 1], (B_{p_1}, ..., B_{p_n}) is jointly mixable if and only if Σ_{i=1}^n p_i is an integer.

Proof. The "only-if" part is trivial, since a sum of Bernoulli random variables takes values in the integers. To show the "if" part, let k = Σ_{i=1}^n p_i and let k ∈ {0, 1}^n be the vector whose first k entries are 1 and whose remaining entries are 0. It is clear that p ≺ k (see Section 1.A.3 of Marshall et al. (2011)). Hence, from Example 1,

{δ_k} = D_n(B_1, ..., B_1, B_0, ..., B_0) ⊂ D_n(B_{p_1}, ..., B_{p_n}),

where B_1 appears k times and B_0 appears n − k times. Therefore (B_{p_1}, ..., B_{p_n}) is jointly mixable.

The set A_n(F) can also be used to obtain joint mixability of some tuples of distributions. In particular, we shall see in the following proposition that Ā_n(δ_0, ..., δ_0) is the set of all jointly mixable tuples with center 0.

Proposition 11.
For G ∈ M_1^n, the following statements are equivalent.

(i) G is jointly mixable.
(ii) G ∈ Ā_n(δ_c, ..., δ_c) for some c ∈ R.
(iii) G ∈ Ā_n(F) for some F ∈ M_1^n which is jointly mixable.

Proof. (ii) ⇒ (iii) is trivial. (iii) ⇒ (i): Suppose that G ∈ Ā_n(F) and F is jointly mixable with center x ∈ R. By Proposition 9, we have

{δ_x} ⊂ D_n(F) ⊂ D_n(G).

This shows that G is jointly mixable. Next, we show (i) ⇒ (ii). Suppose that G is jointly mixable; without loss of generality we may assume that it has center 0. By definition, there exists a random vector X = (X_1, ..., X_n) such that X_i ∼ G_i and X_1 + ··· + X_n = 0. Denote by H the distribution measure of X. For A ∈ B(R) and i = 1, 2, ..., n,

G_i(A) = P(X_i ∈ A) = ∫_{R^n} P(X_i ∈ A | X = y) H(dy) = ∫_{R^n} δ_{y_i}(A) H(dy),

and as a consequence,

G(A) = (G_1(A), ..., G_n(A)) = ∫_{R^n} (δ_{y_1}(A), ..., δ_{y_n}(A)) H(dy).

Noting that H is supported in V_0 = {(y_1, ..., y_n) ∈ R^n : y_1 + ··· + y_n = 0}, we have

G(A) = ∫_{V_0} (δ_{y_1}(A), ..., δ_{y_n}(A)) H(dy) = ∫_{V_0} T_y(δ_0(A), ..., δ_0(A)) H(dy).

Hence, we conclude that G ∈ Ā_n(δ_0, ..., δ_0).

The set Ā_n(δ_c, ..., δ_c) is quite rich and cannot be analytically characterized. The simple example of uniform distributions may be helpful to understand Proposition 11. Suppose that F_i = U[0, a_i], a_i > 0, i = 1, ..., n, and Σ_{i=1}^n a_i ≥ 2 max_{i=1,...,n} a_i. By Theorem 3.1 of Wang and Wang (2016), we know that F is jointly mixable. Then, Proposition 11 implies that every tuple in the set Ā_n(F) is jointly mixable.

It remains an open question whether it is possible to characterize the set Ā_n(F) for uniform marginal distributions. This would lead to many classes of jointly mixable distributions, including those with monotone densities and symmetric densities; see Wang and Wang (2016).
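The uniform example can be checked with a few lines: a quantile mixture of U[0, a_i] marginals is again a tuple of uniform distributions, with scale vector Λa, so the condition Σ a_i ≥ 2 max a_i of Wang and Wang (2016) is preserved under Λ ⊗. A sketch (the vector a and the matrix Λ below are illustrative choices of ours):

```python
import numpy as np

# quantile mixture of U[0, a_i]: sum_j L[i, j] * (a_j * u) = (L @ a)_i * u,
# so the i-th mixed component is U[0, (L @ a)_i]
a = np.array([1.0, 2.0, 3.0])
L = np.array([[0.6, 0.2, 0.2],
              [0.2, 0.6, 0.2],
              [0.2, 0.2, 0.6]])        # a doubly stochastic matrix in Q_3
mixed = L @ a

assert np.allclose(mixed.sum(), a.sum())   # total scale is preserved
assert mixed.max() <= a.max() + 1e-12      # maximum length shrinks
# joint mixability condition for uniforms: sum(a) >= 2 * max(a);
# it holds for a and is preserved under the quantile mixture
assert a.sum() >= 2 * a.max()
assert mixed.sum() >= 2 * mixed.max()
```

This is the same bookkeeping as in Example 3 below the conjecture: a quantile mixture keeps the total scale fixed while shrinking the largest component, so the mixability condition can only become easier to satisfy.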
Concluding remarks

This paper studies ordering relationships for aggregation sets whose marginal distributions are connected by either a distribution mixture or a quantile mixture. For general marginal distributions, the aggregation set becomes larger after applying a distribution mixture to the marginal risks, whereas the aggregation sets are in general not comparable under a quantile mixture of the marginal risks. Nevertheless, we obtain several useful results, especially on the comparison of VaR aggregation, which has applications in and outside financial risk management.

Although the marginal distributions are assumed known in our main setting, this assumption is not essential for the interpretation of our results in practical situations. In case both marginal uncertainty and dependence uncertainty are present, our results can be directly applied to obtain ordering relationships, as we explain below. Suppose that Λ ∈ Q_n and F ⊂ M^n is a set of possible marginal models, representing uncertainty about the marginal distributions. In this case, the set of all possible distributions of the aggregate risk is ∪_{F∈F} D_n(F), and the worst-case value of a risk measure ρ is

sup{ ρ(G) : G ∈ D_n(F), F ∈ F } = sup_{F∈F} ρ(F).

By our results,

∪_{F∈F} D_n(F) ⊂ ∪_{F∈F} D_n(Λ F),  sup_{F∈F} ρ(F) ≤ sup_{F∈F} ρ(Λ F),

and, if F ⊂ M_D^n ∪ M_I^n,

sup_{F∈F} VaR_p(F) ≤ sup_{F∈F} VaR_p(Λ ⊗ F).

Thus, our results on set inclusion and risk measure inequalities remain valid in the presence of marginal uncertainty.

Many questions on quantile mixtures are still open, and we conclude the paper with four of them. The first question concerns whether D_n(F) ⊂ D_n(Λ ⊗ F) holds in cases other than the uniform distributions in Proposition 2. As we have seen from Example 2, for F ∈ M^n and Λ ∈ Q_n, D_n(F) and D_n(Λ ⊗ F) are generally not comparable. It remains open whether D_n(F) ⊂ D_n(Λ ⊗ F) under some conditions. For instance, Proposition 2 requires n ≥ 3 and relies on the characterization of D_n(F) from Mao et al. (2019). It remains unclear whether the same conclusion holds for n = 2 or for other choices of Λ.

The second question concerns decreasing densities (or increasing densities). A concrete conjecture is presented below, inspired by Theorem 3. It is unclear how to formulate natural classes of distributions other than M_D (or M_I) for which similar statements can be expected.

Conjecture 1.
For Λ ∈ Q_n and F ∈ M_D^n, we have D_n(F) ⊂ D_n(Λ ⊗ F). Weaker versions of this conjecture are:

(i) For F ∈ M_D and λ, γ ∈ R_+^n, if γ ≺ λ, then D_n(F_{λ_1}, ..., F_{λ_n}) ⊂ D_n(F_{γ_1}, ..., F_{γ_n}).
(ii) For F_1, ..., F_n ∈ M_D, D_n(F_1, ..., F_n) ⊂ D_n(F, ..., F), where F^{-1} = (1/n) Σ_{i=1}^n F_i^{-1}.
(iii) For F ∈ M_D and (λ_1, ..., λ_n) ∈ ∆_n, D_n(F_{nλ_1}, ..., F_{nλ_n}) ⊂ D_n(F, ..., F).

It is obvious that the main statement in Conjecture 1 implies (i), by noting that one can choose Λ such that γ = Λλ, and that it implies (ii) by choosing Λ = (1/n)_{n×n}. Both (i) and (ii) imply (iii). An example is provided below to illustrate the connection of Conjecture 1 to joint mixability.

Example 3.
We make a connection between Conjecture 1 and Theorem 3.2 of Wang and Wang (2016), which says that for F_i ∈ M_D with essential support [0, b_i], i = 1, ..., n, D_n(F_1, ..., F_n) contains a point mass if and only if the mean-length condition holds, that is,

Σ_{i=1}^n μ_i ≥ max_{i=1,...,n} b_i,

where μ_i is the mean of F_i, i = 1, ..., n. For Λ ∈ Q_n and F ∈ M_D^n, let (μ̂_1, ..., μ̂_n) be the mean vector of Λ ⊗ F. Note that

Σ_{i=1}^n μ̂_i = 1_n^⊤ Λ μ = 1_n^⊤ μ = Σ_{i=1}^n μ_i,

where 1_n = (1, ..., 1)^⊤ ∈ R^n. On the other hand, each component of Λ ⊗ F has a support length shorter than or equal to the maximum support length among the components of F. As a consequence, if the mean-length condition holds for F, then it also holds for Λ ⊗ F. Therefore, if D_n(F) contains a point mass, then so does D_n(Λ ⊗ F); conversely, if D_n(Λ ⊗ F) contains a point mass, D_n(F) does not necessarily contain one, since F may have a longer maximum support length. This, at least intuitively, suggests that D_n(F) ⊊ D_n(Λ ⊗ F) may hold, as in Conjecture 1.

The third question is about the order of VaR under a quantile mixture. Our numerical results in Figure 4 suggest that the VaR relation

VaR_p(F) ≤ VaR_p(Λ ⊗ F)

holds for more general choices of F than the ones in Theorem 3. We are not sure what general conditions on F guarantee this relation.

The last question concerns a cross comparison of distribution and quantile mixtures. As we see from Proposition 7,

VaR_p(Λ F) ≤ VaR_p(Λ ⊗ F)

holds for F being a vector of Pareto distributions with the same shape parameter and infinite mean. We wonder whether the same relationship holds for other distributions without a finite mean. Note that in the finite-mean case the relationship may be reversed, as illustrated in Figure 1; however, we do not have a proof of the reverse inequality (assuming finite mean) either. Generally, it is unclear to us whether and in which situations D_n(Λ F) and D_n(Λ ⊗ F) are comparable.
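One reason the Pareto cross-comparison above is tractable at all is Proposition 6(i): a quantile mixture of Pareto distributions with a common α stays within the Pareto family, with scale vector Λθ. A quick check at the quantile level (α, θ and Λ below are illustrative choices of ours):

```python
import numpy as np

alpha = 0.5                                   # an infinite-mean Pareto
theta = np.array([1.0, 2.0, 3.0])
L = np.array([[0.5, 0.3, 0.2],
              [0.3, 0.4, 0.3],
              [0.2, 0.3, 0.5]])               # doubly stochastic

u = np.linspace(0.01, 0.99, 99)
pareto_q = lambda t, u: t * (1 - u) ** (-1 / alpha)   # quantile of P_{alpha,t}

# quantile mixture: row-wise mix of the component quantile functions
mix_q = np.array([sum(L[i, j] * pareto_q(theta[j], u) for j in range(3))
                  for i in range(3)])
# Proposition 6(i): this equals the quantile of P_{alpha, (L @ theta)_i}
direct_q = np.array([pareto_q((L @ theta)[i], u) for i in range(3)])
assert np.allclose(mix_q, direct_q)
```

The identity is just linearity in the scale parameter, but it is what makes the quantile-mixture side of the last open question explicit while the distribution-mixture side remains a genuine mixture.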
Acknowledgements
Y. Liu is financially supported by the China Scholarship Council. R. Wang acknowledges financialsupport from the Natural Sciences and Engineering Research Council of Canada (NSERC, RGPIN-2018-03823, RGPAS-2018-522590) and from the Center of Actuarial Excellence Research Grant from the Societyof Actuaries.
A Some proofs and further technical results
A.1 A lemma used in the proof of Theorem 3
The following lemma is rephrased from Theorem 2 of Blanchet et al. (2020).

Lemma 3.
For p ∈ (0, 1) and any F = (F_1, ..., F_n) ∈ M_1^n,

VaR_p(F) ≤ inf_{β ∈ B_n} Σ_{i=1}^n [1/((1 − p)(1 − β̄))] ∫_{p+(1−p)(β̄−β_i)}^{1−(1−p)β_i} VaR_u(F_i) du, (5)

where β = (β_1, ..., β_n), β̄ = Σ_{i=1}^n β_i and B_n = {β ∈ [0, 1)^n : β̄ < 1}; the above inequality is an equality if F ∈ M_D^n ∪ M_I^n.

A.2 Proof of Proposition 4
Proof.
We first focus on (i). We will show (a) ⇔ (c); (c) ⇒ (a) is trivial by the definition of stochastic order. For (a) ⇒ (c), note that Λ F ≺_st F with Λ = (Λ_ij) implies

Σ_{j=1}^n Λ_ij F_j(x) ≥ F_i(x), x ∈ R, i = 1, ..., n. (6)

Adding all the inequalities in (6) yields

Σ_{i=1}^n Σ_{j=1}^n Λ_ij F_j(x) ≥ Σ_{i=1}^n F_i(x), x ∈ R.

Since Λ is a doubly stochastic matrix, we have

Σ_{i=1}^n Σ_{j=1}^n Λ_ij F_j(x) = Σ_{i=1}^n F_i(x), x ∈ R.

Hence all the inequalities in (6) are in fact equalities. This proves (c). We can analogously show that (b) ⇔ (c), which establishes the claims in (i). We omit the proof of (ii) since it is similar to that of (i).

We next focus on (iii). Trivially, (c) ⇒ (a) and (c) ⇒ (b). We will only show (a) ⇒ (c), since (b) ⇒ (c) is similar. Write G = (G_1, ..., G_n) = Λ ⊗ F, so that

G_i^{-1} = Σ_{j=1}^n Λ_ij F_j^{-1}.

By definition, Λ ⊗ F ≺_cx F means G_i ≺_cx F_i, i = 1, ..., n. It is well known (see e.g., Theorem 3.A.5 of Shaked and Shanthikumar (2007)) that for any two distributions F and G in M_1,

F ≺_cx G ⇔ ES_p(F) ≤ ES_p(G) for all p ∈ (0, 1). (7)

Moreover, by the comonotonic additivity of ES_p, we have

ES_p(G_i) = Σ_{j=1}^n Λ_ij ES_p(F_j), i = 1, ..., n.

Hence

ES_p(G_i) = Σ_{j=1}^n Λ_ij ES_p(F_j) ≤ ES_p(F_i), p ∈ (0, 1), i = 1, ..., n. (8)

Noting that Λ is a doubly stochastic matrix, similarly as in the proof of (i), adding all the inequalities in (8) leads to

Σ_{i=1}^n ES_p(G_i) = Σ_{i=1}^n Σ_{j=1}^n Λ_ij ES_p(F_j) = Σ_{i=1}^n ES_p(F_i), p ∈ (0, 1).

This implies that the inequalities in (8) are equalities, which means that Λ ⊗ F = F by (7). We complete the proof of (iii).

Finally, we consider (iv). (b) ⇒ (a) is trivial. We will show (a) ⇒ (b). By (7), Λ F ≺_cx F is equivalent to

ES_p(F_i) ≥ ES_p( Σ_{j=1}^n Λ_ij F_j ), i = 1, ..., n. (9)

Moreover, by the concavity of ES_p with respect to mixtures (e.g., Theorem 3 of Wang et al. (2020)), we have

ES_p( Σ_{j=1}^n Λ_ij F_j ) ≥ Σ_{j=1}^n Λ_ij ES_p(F_j).

Therefore,

ES_p(F_i) ≥ ES_p( Σ_{j=1}^n Λ_ij F_j ) ≥ Σ_{j=1}^n Λ_ij ES_p(F_j), i = 1, ..., n. (10)

Adding the inequalities in (10) and noting that Λ is doubly stochastic yields

Σ_{i=1}^n ES_p(F_i) ≥ Σ_{i=1}^n ES_p( Σ_{j=1}^n Λ_ij F_j ) ≥ Σ_{i=1}^n Σ_{j=1}^n Λ_ij ES_p(F_j) = Σ_{i=1}^n ES_p(F_i).

Hence

Σ_{i=1}^n ES_p(F_i) = Σ_{i=1}^n ES_p( Σ_{j=1}^n Λ_ij F_j ),

which implies that the inequalities in (9) are all equalities. We establish the claim by (7).

A.3 Proof of Proposition 6

Proof. (i) Note that (P_{α,θ})^{-1} = θ (P_{α,1})^{-1} for θ, α > 0. Hence we prove (i) by noting that

(Λ ⊗ P_{α,θ})^{-1} = Λ (P_{α,θ})^{-1} = (P_{α,Λθ})^{-1}.

(ii) Let U be the set of uniform random variables on [0, 1]. By the monotonicity of ρ, we have, for 0 < α_1 < α_2,

ρ(P_{α_1,θ}) = sup{ ρ(θ_1 U_1^{−1/α_1} + ··· + θ_n U_n^{−1/α_1}) : U_1, ..., U_n ∈ U } ≥ sup{ ρ(θ_1 U_1^{−1/α_2} + ··· + θ_n U_n^{−1/α_2}) : U_1, ..., U_n ∈ U } = ρ(P_{α_2,θ}),

since U^{−1/α_1} ≥ U^{−1/α_2} for U ∈ (0, 1). This implies that ρ(P_{α,θ}) is decreasing in α.

(iii) By the monotonicity of ρ, the claim of (iii) can be established similarly to the proof of (ii).

A.4 Some further properties of VaR_p(P_{α,θ})

Properties of ρ(P_{α,θ}) in Proposition 6 can be strengthened for ρ = VaR_p.

Proposition 12.
For p ∈ (0, 1), α > 0 and θ ∈ (0, ∞)^n,

(i) VaR_p(P_{α,θ}) is increasing and continuous in p;
(ii) VaR_p(P_{α,θ}) is decreasing and continuous in α;
(iii) VaR_p(P_{α,θ}) is increasing and continuous in each component of θ;
(iv) VaR_p(P_{α,θ}) is homogeneous in θ, that is, for λ > 0, VaR_p(P_{α,λθ}) = λ VaR_p(P_{α,θ});
(v) if α > 1, then

Σ_{i=1}^n θ_i (1 − p)^{−1/α} ≤ VaR_p(P_{α,θ}) ≤ (α/(α − 1)) Σ_{i=1}^n θ_i (1 − p)^{−1/α}. (11)

Proof. (i) As the quantile function of a Pareto distribution is continuous, by Lemmas 4.4 and 4.5 of Bernard et al. (2014), VaR_p(P_{α,θ}) is increasing and continuous in p on (0, 1).

(ii) Monotonicity in α follows from Proposition 6. For continuity, let U be the set of uniform random variables on (0, 1) and write

VaR_p(P_{α,θ}) = sup{ VaR_p(θ_1 U_1^{−1/α} + ··· + θ_n U_n^{−1/α}) : U_1, ..., U_n ∈ U },

and note that θ_1 u_1^{−1/α} + ··· + θ_n u_n^{−1/α} = (Σ_{i=1}^n θ_i) M_{α,θ}(u_1, ..., u_n)^{−1/α}, where

M_{α,θ}(u_1, ..., u_n) = ( (θ_1 u_1^{−1/α} + ··· + θ_n u_n^{−1/α}) / Σ_{i=1}^n θ_i )^{−α} ∈ (0, 1], u_i ∈ (0, 1), i = 1, ..., n,

is the weighted power mean of (u_1, ..., u_n) of order −1/α with weights θ_i / Σ_{j=1}^n θ_j. The classical power-mean inequalities (Hardy et al. (1934), Theorems 16 and 23) give two-sided bounds between M_{α_1,θ} and M_{α_2,θ} for 0 < α_1 < α_2, and hence two-sided bounds between VaR_p(P_{α_1,θ}) and VaR_p(P_{α_2,θ}) that collapse by letting α_1 ↑ α_2 and α_2 ↓ α_1. This yields the continuity of VaR_p(P_{α,θ}) in α > 0.

(iii) Fix θ = (θ_1, ..., θ_n) and θ_λ = (λθ_1, θ_2, ..., θ_n) for λ > 0. Monotonicity in each component follows directly from Proposition 6. Using the homogeneity of VaR_p(P_{α,θ}), which is proved in (iv), and the monotonicity with respect to θ, if 0 < λ < 1, then

λ VaR_p(P_{α,θ}) ≤ VaR_p(P_{α,θ_λ}) ≤ VaR_p(P_{α,θ}),

and otherwise

VaR_p(P_{α,θ}) ≤ VaR_p(P_{α,θ_λ}) ≤ λ VaR_p(P_{α,θ}).

By letting λ ↑ 1 and λ ↓ 1, we get the desired continuity.

(iv) For λ > 0,

VaR_p(P_{α,λθ}) = sup{ VaR_p(G) : G ∈ D_n(P_{α,λθ}) } = sup{ VaR_p(G(·/λ)) : G ∈ D_n(P_{α,θ}) } = λ sup{ VaR_p(G) : G ∈ D_n(P_{α,θ}) } = λ VaR_p(P_{α,θ}).

(v) For α > 1,

VaR_p(P_{α,θ}) ≤ ES_p(P_{α,θ}) = Σ_{i=1}^n ES_p(P_{α,θ_i}) = α Σ_{i=1}^n θ_i / ((α − 1)(1 − p)^{1/α}),

and

VaR_p(P_{α,θ}) ≥ Σ_{i=1}^n VaR_p(P_{α,θ_i}) = Σ_{i=1}^n θ_i / (1 − p)^{1/α}.

References
Artzner, P., Delbaen, F., Eber, J.-M. and Heath, D. (1999). Coherent measures of risk. Mathematical Finance, 9(3), 203–228.

Bernard, C., Jiang, X. and Wang, R. (2014). Risk aggregation with dependence uncertainty. Insurance: Mathematics and Economics, 54, 93–108.

Bernard, C., Rüschendorf, L. and Vanduffel, S. (2017). VaR bounds with variance constraint. Journal of Risk and Insurance, 84(3), 923–959.

Blanchet, J., Lam, H., Liu, Y. and Wang, R. (2020). Convolution bounds on quantile aggregation. arXiv preprint.

Cai, J., Liu, H. and Wang, R. (2018). Asymptotic equivalence of risk measures under dependence uncertainty. Mathematical Finance, 28(1), 29–49.

Delbaen, F., Bellini, F., Bignozzi, V. and Ziegel, J. (2016). Risk measures with convex level sets. Finance and Stochastics, 20(2), 433–453.

Eckstein, S., Kupper, M. and Pohl, M. (2020). Robust risk aggregation with neural networks. Mathematical Finance, published online at doi.org/10.1111/mafi.12280.

Embrechts, P., Klüppelberg, C. and Mikosch, T. (1997). Modelling Extremal Events for Insurance and Finance. Springer, Heidelberg.

Embrechts, P., Puccetti, G. and Rüschendorf, L. (2013). Model uncertainty and VaR aggregation. Journal of Banking and Finance, 37(8), 2750–2764.

Embrechts, P., Wang, B. and Wang, R. (2015). Aggregation-robustness and model uncertainty of regulatory risk measures. Finance and Stochastics, 19(4), 763–790.

Föllmer, H. and Schied, A. (2016). Stochastic Finance. An Introduction in Discrete Time. Fourth Edition. Walter de Gruyter, Berlin.

Hardy, G. H., Littlewood, J. E. and Pólya, G. (1934). Inequalities. Cambridge University Press.

Mao, T., Wang, B. and Wang, R. (2019). Sums of uniform random variables. Journal of Applied Probability, 56(3), 918–936.

Mao, T. and Wang, R. (2020). Risk aversion in regulatory capital calculation. SIAM Journal on Financial Mathematics, 11(1), 169–200.

Marshall, A. W., Olkin, I. and Arnold, B. (2011). Inequalities: Theory of Majorization and Its Applications. Second Edition. Springer.

McNeil, A. J., Frey, R. and Embrechts, P. (2015). Quantitative Risk Management: Concepts, Techniques and Tools. Revised Edition. Princeton University Press, Princeton, NJ.

Müller, A. and Stoyan, D. (2002). Comparison Methods for Stochastic Models and Risks. Wiley, England.

Puccetti, G. and Rüschendorf, L. (2012). Computation of sharp bounds on the distribution of a function of dependent risks. Journal of Computational and Applied Mathematics, 236(7), 1833–1840.

Puccetti, G. and Rüschendorf, L. (2013). Sharp bounds for sums of dependent risks. Journal of Applied Probability, 50(1), 42–53.

Shaked, M. and Shanthikumar, J. G. (2007). Stochastic Orders. Springer Series in Statistics.

Vovk, V. and Wang, R. (2020). Combining p-values via averaging. Biometrika, published online at doi.org/10.1093/biomet/asaa027.

Wang, B. and Wang, R. (2016). Joint mixability. Mathematics of Operations Research, 41(3), 808–826.

Wang, R., Peng, L. and Yang, J. (2013). Bounds for the sum of dependent risks and worst Value-at-Risk with monotone marginal densities. Finance and Stochastics, 17(2), 395–417.

Wang, R., Wei, Y. and Willmot, G. E. (2020). Characterization, robustness and aggregation of signed Choquet integrals. Mathematics of Operations Research, published online at doi.org/10.1287/moor.2019.1020.

Yaari, M. E. (1987). The dual theory of choice under risk. Econometrica, 55(1), 95–115.

Ziegel, J. (2016). Coherence and elicitability. Mathematical Finance, 26, 901–918.