Convolution Bounds on Quantile Aggregation
arXiv preprint [q-fin.RM]
Jose Blanchet ∗ Henry Lam † Yang Liu ‡ Ruodu Wang § July 21, 2020
Abstract
Quantile aggregation with dependence uncertainty has a long history in probability theory, with wide applications in finance, risk management, statistics, and operations research. Using a recent result on the inf-convolution of Range-Value-at-Risk, which includes Value-at-Risk and Expected Shortfall as special cases, we establish new analytical bounds which we call convolution bounds. These bounds are easy to compute, and we show that they are sharp in many relevant cases. We pay special attention to the problem of quantile aggregation, where the convolution bounds help us to identify approximations for the extremal dependence structure. The convolution bounds enjoy several advantages, including interpretability, tractability, and theoretical properties. To the best of our knowledge, there is no other theoretical result on quantile aggregation which is not covered by the convolution bounds, and thus the convolution bounds are genuinely the best ones available. The results can be applied to compute bounds on the distribution of the sum of random variables. Some applications to operations research are discussed.
Key-words: quantile aggregation, convolution, model uncertainty, dependence structure, duality
The problem of quantile aggregation with dependence uncertainty refers to finding the possible values of quantiles of an aggregate risk $S = X_1 + \cdots + X_n$ with given marginal distributions of $X_1,\ldots,X_n$ but unspecified dependence structure. More precisely, for given marginal distributions $\mu_1,\ldots,\mu_n$ on $\mathbb{R}$, the following quantities are of interest:
$$\sup\{q_t(X_1+\cdots+X_n) : X_i \sim \mu_i,\ i=1,\ldots,n\} \qquad (1)$$
and
$$\inf\{q_t(X_1+\cdots+X_n) : X_i \sim \mu_i,\ i=1,\ldots,n\}, \qquad (2)$$
where $q_t(X)$ stands for a (left or right) quantile of a random variable $X$ at probability level $t\in[0,1]$; an equivalent formulation is to find the extremal values of $P(S\le x)$ for a given $x\in\mathbb{R}$. This problem has a long history in probability theory; see Makarov (1981) and Rüschendorf (1982) for early results.

∗ Department of Management Science and Engineering, Stanford University, USA. Email: [email protected]
† Department of Industrial Engineering and Operations Research, Columbia University, USA. Email: [email protected]
‡ Department of Mathematical Sciences, Tsinghua University, China. Email: [email protected]
§ Department of Statistics and Actuarial Science, University of Waterloo, Canada. Email: [email protected]

A key feature of quantile aggregation is "free dependence": no dependence assumption is made at all. Naturally, such a setting may be very conservative in some situations; nevertheless, it appears widely in many areas, including operations research, statistics, risk management, and finance. We name only a few examples, which are by no means exhaustive. The supremum of the quantile of aggregate risk is extensively studied in the risk management literature, known as the conservative Value-at-Risk (VaR) for capital calculation; see Embrechts et al. (2013, 2015) and the references therein. The worst-case VaR under various settings of model uncertainty is also popular in robust portfolio optimization; see e.g., El Ghaoui et al. (2003), Zhu and Fukushima (2009) and Zymler et al. (2013).
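To make problem (1) concrete, consider a minimal example of our own (not taken from the paper): for two U(0,1) marginals, coupling the upper tails counter-monotonically makes the sum constant on an event of probability $1-t$, so that $q_t^+(S) = 1+t$; for two standard uniforms this value is in fact the worst case.

```python
import numpy as np

# Two U(0,1) marginals, t = 0.9: on the tail {X1 >= t}, set X2 = 1 + t - X1 so the
# sum is constantly 1 + t there; below t, couple comonotonically (the choice is free).
rng = np.random.default_rng(0)
t, N = 0.9, 10**6
x1 = rng.uniform(size=N)
x2 = np.where(x1 >= t, 1 + t - x1, x1)   # still U(0,1): U[t,1) maps onto (t,1], rest unchanged
s = x1 + x2
print(np.mean(s >= 1 + t - 1e-9))        # approx 1 - t = 0.1, so q_t^+(S) = 1 + t
```

The simulation only demonstrates attainability of $1+t$; sharpness follows from the Makarov/Rüschendorf-type results cited above.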
In multiple statistical hypothesis testing, quantile aggregation gives critical values for various combination methods of multiple p-values, as it is often impossible to conduct statistical inference on the dependence for a data set of p-values; see e.g., Ramdas et al. (2019) and Vovk and Wang (2020). In operations research, the calculation of the maximum possible lower end-point (corresponding to the quantile at level 0) and the minimum possible upper end-point (corresponding to the quantile at level 1) of $S$, as well as the corresponding optimizers, is known as the problem of assembly line crew scheduling; see e.g., Coffman and Yannakakis (1984). Quantile aggregation techniques have also been applied recently to computer networks and wireless communication by Besser and Jorswieck (2020). Despite their innocent appearance, the quantile aggregation problems (1) and (2) rarely admit analytical formulas. In the literature, some analytical bounds for the homogeneous setting (i.e., identical marginal distributions) are obtained by Embrechts and Puccetti (2006), Wang et al. (2013) and Puccetti and Rüschendorf (2013), and approximating algorithms are available, such as the rearrangement algorithm (RA) of Puccetti and Rüschendorf (2012) and Embrechts et al. (2013). Sharpness of these bounds is rarely obtained, with the exception of Wang et al. (2013) and Puccetti and Rüschendorf (2013) under some strong conditions. The RA only gives a lower bound on the quantile aggregation, and its convergence is not guaranteed. The discrete version of quantile aggregation is NP-complete; see Coffman and Yannakakis (1984). In this paper, we propose a class of bounds based on the inf-convolution of Range-Value-at-Risk (RVaR) introduced by Embrechts et al. (2018), which we call convolution bounds on RVaR aggregation.
Since RVaR includes the two regulatory risk measures, VaR and Expected Shortfall (ES, also known as CVaR), as special cases, the results on RVaR give rise to useful bounds on the quantile aggregation problems (1) and (2). Our main contributions are summarized below. In Sections 3-4, we establish the (upper) convolution bounds on RVaR aggregation in Theorem 1 and on quantile aggregation in Theorem 2. We proceed to show that these bounds are sharp in most relevant cases. In Section 5, we analyze the extremal dependence structure maximizing quantile aggregation (Theorem 3). Some analytical approximations are proposed which have good numerical performance. In Section 6, we study the dual formulation of the quantile aggregation problems (Theorem 4). The numerical advantages of the new bounds are carefully examined in Section 7. To better illustrate our main ideas, the lower convolution bounds and related discussions are postponed to Section 8 (in particular, Theorems 5 and 6). In Section 9, the new bounds are applied to provide an analytical approximation for the assembly line crew scheduling problem; extensions to scheduling problems with stochastic or continuous settings would be a promising direction. Finally, Section 10 concludes the paper. Appendices A-D include the counter-examples, all proofs, and other technical discussions. The convolution bounds in Theorems 2 and 6 provide by far the most convenient theoretical results on quantile aggregation, and they can be applied to any marginal distributions: discrete, continuous, or mixed. To the best of our knowledge, there is no other theoretical result on quantile aggregation which is not covered by our convolution bounds. Although sharpness of these bounds requires some conditions, the numerical performance suggests that they are generally very accurate even in cases where sharpness cannot be theoretically proved.
As we mentioned above, our results on quantile aggregation can be directly applied to compute bounds on the distribution of the sum of random variables.

Let $\mathcal{M}$ be the set of (Borel) probability measures on $\mathbb{R}$ and $\mathcal{M}_1$ be the set of probability measures on $\mathbb{R}$ with finite mean. For $\mu=(\mu_1,\ldots,\mu_n)\in\mathcal{M}^n$, let $\Gamma(\mu)$ be the set of probability measures on $\mathbb{R}^n$ that have one-dimensional marginals $\mu_1,\ldots,\mu_n$. For a probability measure $\mu$ on $\mathbb{R}^n$, define $\lambda_\mu\in\mathcal{M}$ by
$$\lambda_\mu(-\infty,x] = \mu\big(\{(x_1,\ldots,x_n)\in\mathbb{R}^n : x_1+\cdots+x_n \le x\}\big),\quad x\in\mathbb{R}.$$
In other words, $\lambda_\mu$ is the distribution measure of $\sum_{i=1}^n X_i$ where the random vector $(X_1,\ldots,X_n)$ follows $\mu$. Moreover, let $\Lambda(\mu)=\{\lambda_\mu : \mu\in\Gamma(\mu)\}$. Thus, $\Lambda(\mu)$ is the set of the aggregate distribution measures with specified marginals $\mu$. For $t\in(0,1]$, define
$$q^-_t(\mu) = \inf\{x\in\mathbb{R} : \mu(-\infty,x] \ge t\},\quad \mu\in\mathcal{M},$$
and for $t\in[0,1)$,
$$q^+_t(\mu) = \inf\{x\in\mathbb{R} : \mu(-\infty,x] > t\},\quad \mu\in\mathcal{M}.$$
The two extreme cases $q^+_0$ and $q^-_1$ correspond to the essential infimum and the essential supremum. Note that $q^\pm_t$ is defined on $\mathcal{M}$ instead of on the set of random variables as in the introduction. The most important objects in this paper are the average quantile functionals, which we define next. For $0\le\beta<\beta+\alpha\le 1$, let
$$\mathrm{R}_{\beta,\alpha}(\mu) = \frac{1}{\alpha}\int_{\beta}^{\beta+\alpha} q^+_{1-t}(\mu)\,\mathrm{d}t,\quad \mu\in\mathcal{M}. \qquad (3)$$
(We can use either $q^+$ or $q^-$ in the integral, as the two quantities are equal almost everywhere on $[0,1]$.) By definition, $\mathrm{R}_{\beta,\alpha}(\mu)$ is the average of the quantile of $\mu$ over $[1-\beta-\alpha,\,1-\beta]$. The functional $\mathrm{R}_{\beta,\alpha}$, introduced originally by Cont et al. (2010), is called an RVaR by Wang et al. (2015). The value $\mathrm{R}_{\beta,\alpha}(\mu)$ in (3) is always finite for $\beta>0$ and $\alpha+\beta<1$, and it may take the value $\infty$ or $-\infty$ in case $\beta=0$ or $\alpha+\beta=1$. For the special case in which $\beta=0$ and $\alpha=1$, $\mathrm{R}_{0,1}$ is precisely the mean, and it is only well defined on the set $\mathcal{M}_1$ of distributions with finite mean. The left and right quantiles can be obtained as limiting cases of $\mathrm{R}_{\beta,\alpha}$ for $\beta\in(0,1)$ via
$$\lim_{\alpha\downarrow 0}\mathrm{R}_{\beta,\alpha}(\mu) = q^-_{1-\beta}(\mu)\quad\text{and}\quad \lim_{\alpha\downarrow 0}\mathrm{R}_{\beta-\alpha,\alpha}(\mu) = q^+_{1-\beta}(\mu),\quad \mu\in\mathcal{M}. \qquad (4)$$
Two other useful special cases are ES and the left-tail ES (LES), defined, respectively, at level $\alpha\in(0,1)$ via
$$\mathrm{ES}_\alpha(\mu) = \mathrm{R}_{0,\alpha}(\mu) = \frac{1}{\alpha}\int_{1-\alpha}^{1} q^-_u(\mu)\,\mathrm{d}u,\quad \mu\in\mathcal{M},$$
and
$$\mathrm{LES}_\alpha(\mu) = \mathrm{R}_{1-\alpha,\alpha}(\mu) = \frac{1}{\alpha}\int_{0}^{\alpha} q^-_u(\mu)\,\mathrm{d}u,\quad \mu\in\mathcal{M}.$$
As explained by Embrechts et al. (2018), the RVaR functional $\mathrm{R}_{\beta,\alpha}$ bridges the gap between quantiles (VaR) and ES, the two most popular risk measures in banking and insurance. It is sometimes convenient to slightly abuse notation by writing $\mathrm{R}_{\beta,\alpha}(X)$ or $q_t(X)$ for $\mathrm{R}_{\beta,\alpha}(\mu)$ or $q_t(\mu)$ where $X\sim\mu$. All random variables appearing in the paper live in an atomless probability space $(\Omega,\mathcal{F},P)$. We use $\bigvee_{i=1}^n \alpha_i$ for the maximum of the real numbers $\alpha_1,\ldots,\alpha_n$.

Our starting point is that an upper bound on RVaR aggregation, which we shall refer to as a convolution bound, can be obtained from an inequality on RVaR from Embrechts et al. (2018). More precisely, Theorem 2 of Embrechts et al. (2018) gives the following inf-convolution formula:
$$\mathrm{R}_{\beta,\alpha}(X) = \inf\left\{\sum_{i=1}^n \mathrm{R}_{\beta_i,\alpha_i}(X_i) : X_1+\cdots+X_n = X\right\}, \qquad (5)$$
for any integrable random variable $X$ and $\alpha_1,\ldots,\alpha_n,\beta_1,\ldots,\beta_n\in[0,1]$ with $\beta+\alpha\le 1$, where $\beta=\sum_{i=1}^n\beta_i$ and $\alpha=\bigvee_{i=1}^n\alpha_i$. As a consequence of (5), we have the RVaR aggregation inequality
$$\mathrm{R}_{\beta,\alpha}\left(\sum_{i=1}^n X_i\right) \le \sum_{i=1}^n \mathrm{R}_{\beta_i,\alpha_i}(X_i) \qquad (6)$$
for all $X_1,\ldots,X_n$, provided the right-hand side of (6) is well defined (not "$\infty-\infty$"). The objective of Embrechts et al. (2018) is the risk sharing problem, where the aggregate risk $X$ and the preferences of the agents are known (thus $\alpha_1,\ldots,\alpha_n,\beta_1,\ldots,\beta_n$ are given), and one optimizes $\sum_{i=1}^n\mathrm{R}_{\beta_i,\alpha_i}(X_i)$ over possible allocations $X_1,\ldots,X_n$ satisfying $X_1+\cdots+X_n=X$. In this paper, we use the reverse direction of (6): we fix $\mu=(\mu_1,\ldots,\mu_n)\in\mathcal{M}^n$ and $t,s$ with $0\le t<t+s\le 1$, and aim to find the worst-case value of the aggregate risk $\mathrm{R}_{t,s}(\nu)$ over $\nu\in\Lambda(\mu)$ using (6). For any $0\le t<t+s\le 1$, $\beta_0\in[s,t+s]$ and $\nu\in\Lambda(\mu)$, noting that $\mathrm{R}_{t,s}\le\mathrm{R}_{t+s-\beta_0,\beta_0}$, (6) leads to
$$\mathrm{R}_{t,s}(\nu) \le \mathrm{R}_{\sum_{i=1}^n\beta_i,\,\beta_0}(\nu) \le \sum_{i=1}^n \mathrm{R}_{\beta_i,\beta_0}(\mu_i), \qquad (7)$$
where $\sum_{i=1}^n\beta_i = t+s-\beta_0$. Taking a supremum over all $\nu\in\Lambda(\mu)$ and an infimum over all feasible $(\beta_0,\beta_1,\ldots,\beta_n)$ in (7), we get, for any fixed $(t,s)$ with $0\le t<t+s\le 1$,
$$\sup_{\nu\in\Lambda(\mu)}\mathrm{R}_{t,s}(\nu) \le \inf_{\substack{\sum_{i=0}^n\beta_i=t+s\\ \beta_0\ge s}} \sum_{i=1}^n \mathrm{R}_{\beta_i,\beta_0}(\mu_i). \qquad (8)$$
The right-hand side of (8) depends only on the marginal distributions $\mu_1,\ldots,\mu_n$ and $(t,s)$, and thus we obtain a novel upper bound on the worst-case RVaR aggregation. We shall refer to the bound in (8) as a convolution bound, since it is obtained from the inf-convolution formula in (5). To simplify notation, for each $n\in\mathbb{N}$, let
$$\Delta_n = \left\{(\beta_0,\beta_1,\ldots,\beta_n)\in(0,1]\times[0,1]^n : \sum_{i=0}^n\beta_i = 1\right\},$$
which is the set of vectors in the standard $(n+1)$-simplex with positive first component. In all results, $\beta$ represents $(\beta_0,\beta_1,\ldots,\beta_n)$. We formally present the convolution bound in Theorem 1 below.
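Before stating the theorem, inequality (6) itself is easy to check by simulation. The following is our own Monte Carlo sketch (the comonotone pair and the parameter values are chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)
N = 10**6
# a dependent pair: both components are increasing functions of one uniform U
u1 = rng.uniform(size=N)
x1, x2 = u1, u1**2

def rvar_emp(x, beta, alpha):
    # empirical R_{beta,alpha}: average of the upper quantiles q_{1-v}, v in [beta, beta+alpha]
    xs = np.sort(x)[::-1]                    # descending order statistics
    lo, hi = int(beta * len(x)), int((beta + alpha) * len(x))
    return xs[lo:hi].mean()

beta1, beta2, alpha = 0.03, 0.02, 0.05       # beta = beta1 + beta2, alpha = max(alpha_1, alpha_2)
lhs = rvar_emp(x1 + x2, beta1 + beta2, alpha)
rhs = rvar_emp(x1, beta1, alpha) + rvar_emp(x2, beta2, alpha)
print(lhs, rhs)  # inequality (6): lhs <= rhs
```

Up to sampling error, `lhs <= rhs` holds for any coupling of the two marginals; (8) is obtained by pushing this inequality to its worst case.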
More importantly, we show that this bound is indeed sharp under several sets of conditions, and hence the convolution bounds are useful in calculating worst-case values in risk aggregation problems. The practically relevant case of quantiles ($s\downarrow 0$) is discussed later (see Theorem 2).

Theorem 1. Let $\mu=(\mu_1,\ldots,\mu_n)\in\mathcal{M}^n$. For any $t,s$ with $0\le t<t+s\le 1$,
$$\sup_{\nu\in\Lambda(\mu)}\mathrm{R}_{t,s}(\nu) \le \inf_{\substack{\beta\in(t+s)\Delta_n\\ \beta_0\ge s}} \sum_{i=1}^n \mathrm{R}_{\beta_i,\beta_0}(\mu_i). \qquad (9)$$
Moreover, (9) holds as an equality in the following cases:
(i) $t=0$;
(ii) $n\le 2$;
(iii) each of $\mu_1,\ldots,\mu_n$ admits a decreasing density beyond its $(1-t-s)$-quantile;
(iv) $\sum_{i=1}^n \mu_i\big(q^+_{1-t-s}(\mu_i),\,q^-_1(\mu_i)\big] \le t+s$.

(Footnote: The inequality in (6) is essentially Theorem 1 of Embrechts et al. (2018) with a condition on integrability. We slightly generalize this result to probability measures without finite means, which will be useful for the generality of the results offered in this paper; see Lemma A.1 in the appendix. Also note that our parameterization is slightly different from Embrechts et al. (2018).)

Case (i) corresponds to the aggregation of ES, which is well known in the literature (e.g., Chapter 8 of McNeil et al. (2015)). Case (ii) is based on counter-monotonicity in the tail region, which appears in Makarov (1981) for the case of quantiles and is generalized to the average quantile problem by Theorem 1. Case (iii) in Theorem 1 is the most useful, as decreasing densities are common in many areas of application, including but not limited to finance and insurance. Our proof is quite technical, and it relies on advanced results on robust risk aggregation established in Wang and Wang (2016) and Jakobsons et al. (2016). Case (iv) corresponds to an assumption which allows for a mutually exclusive (see Definition A.1 in Appendix D.1) random vector following the marginal distributions $\mu_1,\ldots,\mu_n$.
Such a situation is not common, but it may happen in the context of credit portfolio analysis, where each $\mu_i$ represents the distribution of loss from a defaultable security, which has a small probability of being positive. The proof for case (iv) is based on explicit properties of a mutually exclusive random vector. Moreover, we will show in Figure 1 (right panel) in Section 7 that the bound (9) is not sharp for marginals with increasing densities, even for homogeneous marginals. Results that are symmetric to the upper convolution bounds are collected in Section 8. For instance, a lower bound on $\inf_{\nu\in\Lambda(\mu)}\mathrm{R}_{t,s}(\nu)$, which is symmetric to Theorem 1, is given in Theorem 5.

Condition (iii) in Theorem 1 involves conditional distributions above a certain quantile. For $\mu\in\mathcal{M}$ and $t\in[0,1)$, let $\mu^{t+}$ be the probability measure given by
$$\mu^{t+}(-\infty,x] = \max\left\{\frac{\mu(-\infty,x]-t}{1-t},\,0\right\},\quad x\in\mathbb{R}.$$
The probability measure $\mu^{t+}$ is called the $t$-tail distribution of $\mu$ by Rockafellar and Uryasev (2002). In other words, $\mu^{t+}$ is the distribution measure of the random variable $q^-_U(\mu)$ where $U$ is a uniform random variable on $[t,1]$; that is, $\mu^{t+}$ is the distribution measure of $\mu$ restricted beyond its $t$-quantile. For example, the statement in (iii) that $\mu$ admits a decreasing density beyond its $(1-t-s)$-quantile is equivalent to the statement that $\mu^{(1-t-s)+}$ admits a decreasing density. Moreover, by direct computation, for fixed $\mu\in\mathcal{M}$ and $t\in[0,1)$,
$$\mathrm{R}_{\beta,\alpha}(\mu^{t+}) = \mathrm{R}_{(1-t)\beta,\,(1-t)\alpha}(\mu)\ \text{ for all } 0\le\beta<\beta+\alpha\le 1,\quad\text{and}\quad q^-_u(\mu^{t+}) = q^-_{t+(1-t)u}(\mu)\ \text{ for all } u\in(0,1]. \qquad (10)$$
Using (10), we obtain Proposition 1 below based on Theorem 4.1 of Liu and Wang (2020). This result is useful in the proof of Theorem 1, and it may be of independent interest. For $\mu=(\mu_1,\ldots,\mu_n)\in\mathcal{M}^n$ and $t\in[0,1)$, write $\mu^{t+}=(\mu^{t+}_1,\ldots,\mu^{t+}_n)$.

Proposition 1. For $\mu=(\mu_1,\ldots,\mu_n)\in\mathcal{M}^n$, $t\in[0,1)$ and $s\in(0,1-t]$, we have
$$\sup_{\nu\in\Lambda(\mu)}\mathrm{R}_{t,s}(\nu) = \sup_{\nu\in\Lambda(\mu^{(1-t-s)+})}\mathrm{LES}_{\frac{s}{t+s}}(\nu)\quad\text{and}\quad \sup_{\nu\in\Lambda(\mu)} q^+_t(\nu) = \sup_{\nu\in\Lambda(\mu^{t+})} q^+_0(\nu).$$

Proposition 1 suggests that for the worst-case problems of RVaR aggregation, it suffices to consider the one starting from quantile level 0, i.e., the LES aggregation. In particular, for the worst-case problems of quantile aggregation, it suffices to consider the one at quantile level 0, i.e., the problems $\sup_{\nu\in\Lambda(\mu^{t+})} q^+_0(\nu)$ for generic choices of $\mu$. This will be the general approach taken in the proofs of our main results.

In Theorem 2 below we summarize bounds on $\sup_{\nu\in\Lambda(\mu)} q^+_t(\nu)$. Most cases can be obtained by sending $s$ to 0 and replacing $t$ with $1-t$ in Theorem 1, but a notable difference is that the convolution bounds are sharp for both decreasing and increasing densities and for two types of mutual exclusivity (see Definitions A.1-A.2). This is in sharp contrast to the RVaR convolution bounds, which are only sharp for decreasing densities or upper mutual exclusivity (see Figure 1). Results on lower bounds on $q^-_t(\nu)$ are put in Section 8. In particular, Theorem 6 is symmetric to Theorem 2.

Theorem 2. For $\mu\in\mathcal{M}^n$ and $t\in[0,1)$, we have
$$\sup_{\nu\in\Lambda(\mu)} q^+_t(\nu) \le \inf_{\beta\in(1-t)\Delta_n} \sum_{i=1}^n \mathrm{R}_{\beta_i,\beta_0}(\mu_i). \qquad (11)$$
Moreover, (11) holds as an equality in the following cases:
(i) $n\le 2$;
(ii) each of $\mu_1,\ldots,\mu_n$ admits a decreasing density beyond its $t$-quantile;
(iii) each of $\mu_1,\ldots,\mu_n$ admits an increasing density beyond its $t$-quantile;
(iv) $\sum_{i=1}^n \mu_i\big(q^+_t(\mu_i),\,q^-_1(\mu_i)\big] \le 1-t$;
(v) $\sum_{i=1}^n \mu_i\big[q^+_t(\mu_i),\,q^-_1(\mu_i)\big) \le 1-t$.

Remark. If $\mu_1,\ldots,\mu_n$ have positive densities on their supports, then $\sup_{\nu\in\Lambda(\mu)} q^-_t(\nu) = \sup_{\nu\in\Lambda(\mu)} q^+_t(\nu)$ for all $t\in(0,1)$; hence the choice of $q^-_t(\nu)$ or $q^+_t(\nu)$ in Theorem 2 is not essential to our discussions.
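To see (11) in action, here is a small grid-search evaluation of our own: for two U(0,1) marginals, every feasible $\beta$ gives $\mathrm{R}_{\beta_1,\beta_0}(\mu_1)+\mathrm{R}_{\beta_2,\beta_0}(\mu_2)=1+t$, so the bound equals $1+t$ (a constant density is decreasing in the weak sense, so the bound is sharp here).

```python
import numpy as np

def rvar(qf, beta_i, beta0, m=2000):
    # R_{beta_i, beta0}(mu): average of q_{1-v}(mu) for v in [beta_i, beta_i + beta0]
    v = beta_i + beta0 * (np.arange(m) + 0.5) / m   # midpoint rule
    return float(np.mean(qf(1 - v)))

def bound_11(qfs, t, grid=80):
    # grid search of the right-hand side of (11) over beta in (1-t)*Delta_2 (n = 2)
    best = np.inf
    for b1 in np.linspace(0.0, 1 - t, grid):
        for b2 in np.linspace(0.0, 1 - t - b1, grid):
            b0 = (1 - t) - b1 - b2
            if b0 > 1e-9:
                best = min(best, rvar(qfs[0], b1, b0) + rvar(qfs[1], b2, b0))
    return best

t = 0.9
qf = lambda p: p                    # U(0,1) quantile function
val = bound_11([qf, qf], t)
print(val)  # 1 + t = 1.9
```

The midpoint rule is exact for a linear quantile function, so the grid search returns $1+t$ up to floating-point error; for general marginals, the grid and quadrature resolutions control the accuracy.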
In the literature, some sharp bounds on quantile aggregation for decreasing densities are obtained by Wang et al. (2013) and Puccetti and Rüschendorf (2013) in the homogeneous case ($\mu_1=\cdots=\mu_n$) and by Jakobsons et al. (2016) in the heterogeneous case. For the heterogeneous case, the method of Jakobsons et al. (2016) involves solving a system of $(n+1)$-dimensional implicit ODEs (equations (E1) and (E2) of Jakobsons et al. (2016)), which requires a highly complicated calculation. In contrast, our result in Theorem 2 gives sharp bounds based on the minimum or maximum of an $(n+1)$-dimensional function.

In the homogeneous case $\mu_1=\cdots=\mu_n$, as an immediate consequence of Theorem 2, we obtain the following reduced bounds, in which one replaces $\inf_{\beta\in(1-t)\Delta_n}\sum_{i=1}^n\mathrm{R}_{\beta_i,\beta_0}(\mu_i)$ by a one-dimensional optimization problem.

Proposition 2 (Reduced convolution bounds). For $\mu\in\mathcal{M}$ and $t\in[0,1)$, we have
$$\sup_{\nu\in\Lambda_n(\mu)} q^+_t(\nu) \le \inf_{\alpha\in(0,(1-t)/n)} n\,\mathrm{R}_{\alpha,\,1-t-n\alpha}(\mu) = \inf_{\alpha\in(0,(1-t)/n)} \frac{n}{1-t-n\alpha}\int_{t+(n-1)\alpha}^{1-\alpha} q^-_u(\mu)\,\mathrm{d}u. \qquad (12)$$
Moreover, (12) holds as an equality if $\mu$ admits a decreasing density beyond its $t$-quantile.

In case $\mu$ admits a decreasing density, Proposition 8.32 of McNeil et al. (2015) (reformulated from Wang et al. (2013, Theorem 3.4)) gives
$$\sup_{\nu\in\Lambda_n(\mu)} q^+_t(\nu) = \frac{n}{1-t-n\alpha}\int_{t+(n-1)\alpha}^{1-\alpha} q^-_u(\mu)\,\mathrm{d}u$$
for some $\alpha\in[0,(1-t)/n)$. Together with (11), this yields the sharpness of (12). We first comment on the sharpness of the bound in Theorem 2.
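Before turning to sharpness, here is a numerical illustration of the one-dimensional problem in (12), a sketch of our own (assuming SciPy is available; the Pareto(2) marginal is chosen so that the quantile integral has a closed form):

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Pareto(2) on [1, inf): q_u = (1-u)**(-1/2), and the integral of q_u over [a, b]
# equals 2*(sqrt(1-a) - sqrt(1-b)); the density is decreasing, so (12) is sharp.
def q_integral(a, b):
    return 2.0 * (np.sqrt(1 - a) - np.sqrt(1 - b))

def reduced_bound(n, t):
    # right-hand side of (12): inf over alpha in (0, (1-t)/n)
    f = lambda a: n / (1 - t - n * a) * q_integral(t + (n - 1) * a, 1 - a)
    res = minimize_scalar(f, bounds=(1e-9, (1 - t) / n - 1e-9), method="bounded")
    return res.fun

n, t = 3, 0.95
bound = reduced_bound(n, t)
q_t = (1 - t) ** -0.5                  # the t-quantile of mu
n_es = n * q_integral(t, 1) / (1 - t)  # n times ES at level 1-t, the alpha -> 0 value
print(bound)   # strictly between n*q_t (comonotonic VaR) and n*ES
```

The worst-case VaR must lie above the comonotonic value $n\,q_t(\mu)$ and, by (12) evaluated at $\alpha\downarrow 0$, below $n\,\mathrm{ES}_{1-t}(\mu)$; the optimizer sits strictly inside this range.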
It is well known that, for fixed $n$, the discrete approximation of $\sup_{\nu\in\Lambda(\mu)} q^+_0(\nu)$ (i.e., approximating the original problem by an $m\times n$ matrix rearrangement problem; see Section 9) is an NP-complete problem; see e.g., Coffman and Yannakakis (1984). On the other hand, the corresponding discrete approximation of $\inf_{\beta\in\Delta_n}\sum_{i=1}^n\mathrm{R}_{\beta_i,\beta_0}(\mu_i)$ involves no more than $m^n$ choices of $\beta$, and is thus in class P. As such, we do not expect that the analytical formula in Theorem 2 always gives sharp bounds, similarly to Theorem 1. A counter-example of non-sharpness of the bounds in Theorem 2 is presented in Example A.1 in Appendix A. However, in most cases, the bounds in Theorem 2 work quite well, as illustrated by the numerical examples later. In some special cases, the reduced bounds in Proposition 2 are equivalent to those in Theorem 2. We shall show that this does not hold in general (e.g., for some distributions with increasing density) later in Figure 2 (right panel) and Example A.2.

In the following proposition, we note that $\sup_{\nu\in\Lambda(\mu)} q^+_t(\nu)$ is always attainable as a maximum, which is implied by Lemma 4.2 of Bernard et al. (2014).

Proposition 3. For $\mu\in\mathcal{M}^n$ and $t\in[0,1)$, there exists $\nu^+\in\Lambda(\mu)$ such that $\sup_{\nu\in\Lambda(\mu)} q^+_t(\nu) = q^+_t(\nu^+)$.

We next turn to the right-hand side of (11). Because of the continuity of $\mathrm{R}_{\alpha,\beta}$ in $(\alpha,\beta)$, the infimum $\inf_{\beta\in(1-t)\Delta_n}\sum_{i=1}^n\mathrm{R}_{\beta_i,\beta_0}(\mu_i)$ for any $t\in[0,1)$ is attainable in the closure $\overline{\Delta}_n$:
$$\overline{\Delta}_n = \left\{(\beta_0,\beta_1,\ldots,\beta_n)\in[0,1]^{n+1} : \sum_{i=0}^n\beta_i = 1\right\};$$
see Appendix B for details.

The next proposition suggests that, when calculating the supremum of $q^+_0$ for the aggregation of non-negative risks, one can safely truncate the marginal distributions at a high threshold. This result is convenient when applying several results in the literature formulated for distributions with finite mean or compact support, including Theorem 1 of Embrechts et al. (2018).
For a distribution measure $\mu\in\mathcal{M}$ and a constant $m\in\mathbb{R}$, let $\mu^{[m]}$ be the distribution of $X\wedge m$, where $X\sim\mu$ and $x\wedge y$ stands for the minimum of two numbers $x$ and $y$. Further denote $\mu^{[m]}=(\mu^{[m]}_1,\ldots,\mu^{[m]}_n)$ for $\mu=(\mu_1,\ldots,\mu_n)\in\mathcal{M}^n$.

Proposition 4. For any distributions $\mu_1,\ldots,\mu_n$ on $[0,\infty]$, $t\in[0,1)$, and $m>\sum_{i=1}^n q^+_{1-(1-t)/n}(\mu_i)$, we have
$$\sup_{\nu\in\Lambda(\mu)} q^+_t(\nu) = \sup_{\nu\in\Lambda(\mu^{[m]})} q^+_t(\nu). \qquad (13)$$

Now we restate the specific cases of quantile aggregation $q^+_0$ and $q^-_1$, where an analogous result to Theorem 2 is used; see Section 8.

Proposition 5 (Convolution bounds at levels 0 and 1). For $\mu\in\mathcal{M}^n$, we have
$$\sup_{\nu\in\Lambda(\mu)} q^+_0(\nu) \le \inf_{\beta\in\Delta_n}\sum_{i=1}^n \mathrm{R}_{\beta_i,\beta_0}(\mu_i), \qquad (14)$$
and
$$\inf_{\nu\in\Lambda(\mu)} q^-_1(\nu) \ge \sup_{\beta\in\Delta_n}\sum_{i=1}^n \mathrm{R}_{1-\beta_i-\beta_0,\,\beta_0}(\mu_i). \qquad (15)$$
The two bounds are both sharp if $n\le 2$, or if each of $\mu_1,\ldots,\mu_n$ admits a decreasing (respectively, increasing) density on its support.

If $\mu_1,\ldots,\mu_n$ have finite means, the inequalities in (14) and (15) can be combined into a chain of inequalities.

Proposition 6. For $\mu=(\mu_1,\ldots,\mu_n)\in\mathcal{M}_1^n$, we have
$$\inf_{\nu\in\Lambda(\mu)} q^-_1(\nu) \ge \sup_{\beta\in\Delta_n}\sum_{i=1}^n \mathrm{R}_{1-\beta_i-\beta_0,\,\beta_0}(\mu_i) \ge \sum_{i=1}^n \mathrm{R}_{0,1}(\mu_i) \ge \inf_{\beta\in\Delta_n}\sum_{i=1}^n \mathrm{R}_{\beta_i,\beta_0}(\mu_i) \ge \sup_{\nu\in\Lambda(\mu)} q^+_0(\nu). \qquad (16)$$

The tuple of distributions $\mu\in\mathcal{M}^n$ is said to be jointly mixable (JM, Wang et al. (2013)) if $\delta_C\in\Lambda(\mu)$ for some $C\in\mathbb{R}$; see Appendix C. Proposition 6 implies that (14) and (15) become sharp if $\mu\in\mathcal{M}_1^n$ is JM. If $\mu_1,\ldots,\mu_n$ do not have finite means, the relationships in (16) may not hold in general, as illustrated by Example A.3 in Appendix A.

A significant advantage of the convolution bounds on the quantile aggregation problem is that we are able to visualize the extremal dependence structure corresponding to the convolution bounds.
In view of Proposition 1, for the problems of quantile aggregation, it suffices to consider the one at quantile level 0. Hence, we consider the worst-case problem $\sup_{\nu\in\Lambda(\mu)} q^+_0(\nu)$ in what follows, while the best-case problem $\inf_{\nu\in\Lambda(\mu)} q^-_1(\nu)$ can be discussed in a similar manner. To ease notation, for $\mu=(\mu_1,\ldots,\mu_n)\in\mathcal{M}^n$ and $\beta=(\beta_0,\beta_1,\ldots,\beta_n)\in\Delta_n$, we denote
$$\mathrm{R}^+_\beta(\mu) = \sum_{i=1}^n \mathrm{R}_{\beta_i,\beta_0}(\mu_i). \qquad (17)$$
Now suppose that the bound (11) is sharp, and
$$\max_{\nu\in\Lambda(\mu)} q^+_0(\nu) = \inf_{\beta'\in\Delta_n}\mathrm{R}^+_{\beta'}(\mu) = \mathrm{R}^+_\beta(\mu)\quad\text{for some }\beta\in\Delta_n.$$
The main question here is how the knowledge of $\beta$ in the above equality helps us to identify or approximate the corresponding dependence structure attaining $\max_{\nu\in\Lambda(\mu)} q^+_0(\nu)$. In the following, for $0\le\alpha<\beta\le 1$ and $\mu\in\mathcal{M}$, we let $\mu^{[\alpha,\beta]}$ be the probability measure given by
$$\mu^{[\alpha,\beta]}(-\infty,x] = \frac{(\min\{\mu(-\infty,x],\,\beta\}-\alpha)_+}{\beta-\alpha},\quad x\in\mathbb{R}.$$
Equivalently, $\mu^{[\alpha,\beta]}$ is the distribution measure of the random variable $q^-_V(\mu)$ where $V\sim\mathrm{U}[\alpha,\beta]$, a uniform random variable on $[\alpha,\beta]$. In particular, $\mu^{[\alpha,1]}=\mu^{\alpha+}$ is the $\alpha$-tail distribution measure of $\mu$ in Section 3. We say that a random vector $(X^*_1,\ldots,X^*_n)$ attains the maximum of $q^+_0$ for $\mu=(\mu_1,\ldots,\mu_n)\in\mathcal{M}^n$ if $X^*_1\sim\mu_1,\ldots,X^*_n\sim\mu_n$ and $q^+_0(X^*_1+\cdots+X^*_n)=\max_{\nu\in\Lambda(\mu)} q^+_0(\nu)$. The existence of the maximizer $\nu^+\in\Lambda(\mu)$ is guaranteed by Proposition 3.

Next, we introduce a special form of random vectors. Fix $\beta=(\beta_0,\beta_1,\ldots,\beta_n)\in\Delta_n$ and $\mu=(\mu_1,\ldots,\mu_n)\in\mathcal{M}^n$. Construct the random vector $(X^*_1,\ldots,X^*_n)$ by
$$X^*_i = Z_i\,\mathbb{1}_{A_i} + W_i\,\mathbb{1}_{A\setminus A_i} + Y_i\,\mathbb{1}_{A^c},\quad i=1,\ldots,n,$$
where $(A_1,\ldots,A_n,A^c)$ is a partition of $\Omega$ with $A=\cup_{i=1}^n A_i$, and for $i=1,\ldots,n$,
$$P(A_i)=\beta_i,\quad Z_i\sim\mu_i^{[1-\beta_i,\,1]},\quad W_i\sim\mu_i^{[0,\,1-\beta_0-\beta_i]},\quad Y_i\sim\mu_i^{[1-\beta_0-\beta_i,\,1-\beta_i]},\quad\text{and}\quad \sum_{i=1}^n Y_i=\mathrm{R}^+_\beta(\mu)\ \text{almost surely}. \qquad (18)$$
The existence of $(X^*_1,\ldots,X^*_n)$ satisfying (18) requires some conditions, which will be clear from Theorem 3 below. A more explicit construction of (18) is given by
$$X^*_i = q^-_{1-\frac{\beta_i}{1-\beta_0}U}(\mu_i)\,\mathbb{1}_{\{U\in[0,1-\beta_0),\,K=i\}} + q^-_{\frac{1-\beta_0-\beta_i}{1-\beta_0}U}(\mu_i)\,\mathbb{1}_{\{U\in[0,1-\beta_0),\,K\neq i\}} + Y_i\,\mathbb{1}_{\{U\in[1-\beta_0,1]\}} \qquad (19)$$
for each $i=1,\ldots,n$, where $U,K,(Y_1,\ldots,Y_n)$ are independent, $U\sim\mathrm{U}[0,1]$, $(Y_1,\ldots,Y_n)$ is as in (18), and $P(K=i)=\frac{\beta_i}{1-\beta_0}$ for $i=1,\ldots,n$.

Theorem 3. Suppose that $\mu=(\mu_1,\ldots,\mu_n)\in\mathcal{M}^n$ and $\max_{\nu\in\Lambda(\mu)} q^+_0(\nu)=\mathrm{R}^+_\beta(\mu)$ for some $\beta\in\Delta_n$. There exists a random vector $(X^*_1,\ldots,X^*_n)$ of the form (18) attaining the maximum of $q^+_0$ for $\mu$. Moreover, if $\beta_0=1$, then $\mu$ is jointly mixable; if $\beta_0\neq 1$, $\beta_1,\ldots,\beta_n>0$, and the minimum of each of the functions $h_i:(0,1-\beta_0]\to\mathbb{R}$,
$$h_i(u) = q^-_{1-\frac{\beta_i}{1-\beta_0}u}(\mu_i) + \sum_{j\neq i} q^-_{\frac{1-\beta_0-\beta_j}{1-\beta_0}u}(\mu_j),\quad i=1,\ldots,n, \qquad (20)$$
is attained at $u=1-\beta_0$, then $(X^*_1,\ldots,X^*_n)$ in (19) attains the maximum of $q^+_0$ for $\mu$.

As explained in Section 3, Theorem 3 can be applied to arbitrary quantile levels $t$ by considering the conditional distributions $\mu_1^{(1-t)+},\ldots,\mu_n^{(1-t)+}$. Theorem 3 gives useful information on the worst-case dependence structure attaining $\max_{\nu\in\Lambda(\mu)} q^+_0(\nu)$ based on our knowledge of the minimizer $\beta$. In this dependence structure, each $\mu_i$, $i=1,\ldots,n$, is divided into three parts: the "left-tail" part $[q^+_0(\mu_i),\,q^-_{1-\beta_0-\beta_i}(\mu_i))$, the "body" part $[q^-_{1-\beta_0-\beta_i}(\mu_i),\,q^-_{1-\beta_i}(\mu_i)]$, and the "right-tail" part $(q^-_{1-\beta_i}(\mu_i),\,q^-_1(\mu_i)]$. Basically, the interpretation can be summarized as "joint mixability" (see Appendix C) and "(approximate) mutual exclusivity". On one hand, the body part of each $\mu_i$ is located on the same set $A^c$ with probability $\beta_0$. These parts add up to the constant $\mathrm{R}^+_\beta(\mu)$, which means that the corresponding aggregate measure $\nu^+$ has probability $\beta_0$ of staying at this minimal level.
On the other hand, the right-tail part of each $\mu_i$ is located on the set $A_i$ and is coupled with the left-tail parts of all other $\mu_j$, $j\neq i$, where the sets $A_1,\ldots,A_n$ are disjoint. In the homogeneous case ($\mu_1=\cdots=\mu_n$), the condition for optimality of the dependence structure (19) holds for a distribution with a decreasing density if $\beta_0\neq 1$ and $\beta_1=\cdots=\beta_n=\frac{1-\beta_0}{n}$. In this case, $h_1=\cdots=h_n$ on $(0,1-\beta_0]$. According to Theorem 3.2 and Proposition 3.4 of Bernard et al. (2014), $h_1$ is decreasing on $(0,1-\beta_0]$. Theorem 3 then shows that the corresponding measure $\nu^+$ attains the worst-case quantile aggregation. In the heterogeneous case, we give some numerical examples to show the performance of (19) in Section 7.

Suboptimal approximations

The extremal structure in (19) involves the implicit variables $Y_1,\ldots,Y_n$, which add up to a constant. Such random variables are known to exist under some conditions of joint mixability, but they are not easy to construct explicitly or to simulate, except for some very simple cases such as uniform marginal distributions. Below, we give a suboptimal dependence structure as an approximation of (19) without involving $(Y_1,\ldots,Y_n)$:
$$X^*_i = \begin{cases} q^-_{1-\frac{\beta_i}{1-\beta_0}U}(\mu_i)\,\mathbb{1}_{\{K=i\}} + q^-_{\frac{1-\beta_0-\beta_i}{1-\beta_0}U}(\mu_i)\,\mathbb{1}_{\{K\neq i\}}, & \text{if }\beta_0\neq 1,\\[2pt] q^-_{1-\frac{U}{n}}(\mu_i)\,\mathbb{1}_{\{K=i\}} + q^-_{\frac{n-1}{n}U}(\mu_i)\,\mathbb{1}_{\{K\neq i\}}, & \text{if }\beta_0=1, \end{cases}\quad i=1,\ldots,n, \qquad (21)$$
where $U,K$ are given as in (19), and we further set $P(K=i)=\frac{1}{n}$, $i=1,\ldots,n$, in case $\beta_0=1$ (i.e., we set $\beta_i/(1-\beta_0)=1/n$). For $(X^*_1,\ldots,X^*_n)$ in (21), it is easy to see that $X^*_i\sim\mu_i$ for each $i=1,\ldots,n$, and using $h_i$ in (20), the essential infimum of $\sum_{i=1}^n X^*_i$ is given by
$$\min_{i\le n}\min_x h_i(x) = \min_{i\le n}\min_x\left\{ q^-_{1-\frac{\beta_i}{1-\beta_0}x}(\mu_i) + \sum_{j\neq i} q^-_{\frac{1-\beta_0-\beta_j}{1-\beta_0}x}(\mu_j)\right\}. \qquad (22)$$
Since $X^*_1,\ldots,X^*_n$ are obtained by explicit construction, the infimum (22) serves as a lower bound for $\sup_{\nu\in\Lambda(\mu)} q^+_0(\nu)$.
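As a sanity check of (22), consider our own small example (not code from the paper): two U(0,1) marginals. Taking $X_2=1-X_1$ makes the sum constantly 1, so the pair is jointly mixable and $\max_\nu q_0^+(\nu)=1$; the explicit lower bound (22) in its $\beta_0=1$ form and the convolution upper bound evaluated at $\beta=(1,0,0)$ should then both equal 1.

```python
import numpy as np

# Two U(0,1) marginals: q_u(mu_i) = u; lower and upper ends should coincide at 1.
qfs = [lambda u: u, lambda u: u]
n = len(qfs)

def upper_end():
    # R_beta^+(mu) at beta = (beta_0, beta_1, beta_2) = (1, 0, 0): sum of the means
    v = np.linspace(0.0005, 0.9995, 1000)
    return sum(float(np.mean(qf(1 - v))) for qf in qfs)

def lower_end():
    # the explicit lower bound (22), using the beta_0 = 1 branch of (21):
    # h_i(x) = q_{1 - x/n}(mu_i) + sum_{j != i} q_{(n-1)x/n}(mu_j), minimized over i and x
    xs = np.linspace(1e-6, 1.0, 1000)
    best = np.inf
    for i in range(n):
        for x in xs:
            h = qfs[i](1 - x / n) + sum(qfs[j]((n - 1) * x / n) for j in range(n) if j != i)
            best = min(best, h)
    return best

lo, hi = lower_end(), upper_end()
print(lo, hi)  # both approximately 1
```

Here the two ends coincide because of joint mixability; in general one only obtains an interval containing the true worst-case value.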
If $\beta_0\neq 1$, the first-order condition in the optimality of $\beta$ gives $h_i(1-\beta_0)=\mathrm{R}^+_\beta(\mu)$ for $i=1,\ldots,n$ with $\beta_i>0$; see (A.23) in Appendix D.

This construction can be further improved as follows. For any $\beta'\in\Delta_n$, define
$$H(\beta') = \min_{i\le n}\min_x h_i(x;\beta') = \min_{i\le n}\min_x\left\{ q^-_{1-\frac{\beta'_i}{1-\beta'_0}x}(\mu_i) + \sum_{j\neq i} q^-_{\frac{1-\beta'_0-\beta'_j}{1-\beta'_0}x}(\mu_j)\right\}.$$
We solve another $n$-dimensional optimization problem:
$$\sup_{\beta'\in\Delta_n} H(\beta'). \qquad (23)$$
Denote a maximum point by $\gamma=(\gamma_0,\gamma_1,\ldots,\gamma_n)\in\Delta_n$. We then have the suboptimal dependence structure
$$X^*_i = \begin{cases} q^-_{1-\frac{\gamma_i}{1-\gamma_0}U}(\mu_i)\,\mathbb{1}_{\{K=i\}} + q^-_{\frac{1-\gamma_0-\gamma_i}{1-\gamma_0}U}(\mu_i)\,\mathbb{1}_{\{K\neq i\}}, & \text{if }\gamma_0\neq 1,\\[2pt] q^-_{1-\frac{U}{n}}(\mu_i)\,\mathbb{1}_{\{K=i\}} + q^-_{\frac{n-1}{n}U}(\mu_i)\,\mathbb{1}_{\{K\neq i\}}, & \text{if }\gamma_0=1, \end{cases}\quad i=1,\ldots,n, \qquad (24)$$
where $U,K$ are independent, $U\sim\mathrm{U}[0,1]$, and $P(K=i)=\frac{\gamma_i}{1-\gamma_0}$, $i=1,\ldots,n$, if $\gamma_0\neq 1$ and $P(K=i)=\frac{1}{n}$, $i=1,\ldots,n$, if $\gamma_0=1$. For $(X^*_1,\ldots,X^*_n)$ in (24), it is easy to see that $X^*_i\sim\mu_i$ for each $i=1,\ldots,n$, and the essential infimum of $\sum_{i=1}^n X^*_i$ is $H(\gamma)$. It turns out that (21) gives a good approximation for the maximum value of $q^+_0$ in many cases, and (24) does even better. The numerical performance will be illustrated in Section 7. Note that
$$H(\beta) \le H(\gamma) \le \sup_{\nu\in\Lambda(\mu)} q^+_0(\nu) \le \mathrm{R}^+_\beta(\mu). \qquad (25)$$
As a result, we obtain the two-sided approximation intervals $[H(\beta),\mathrm{R}^+_\beta(\mu)]$ and $[H(\gamma),\mathrm{R}^+_\beta(\mu)]$ for the true value of $\sup_{\nu\in\Lambda(\mu)} q^+_0(\nu)$. If only $\beta$ is provided, the former interval can be adopted to approximate the worst-case quantile aggregation; if it is convenient to conduct the additional optimization (23), the latter is more accurate.

In this section, we investigate the dual formulation of the quantile aggregation problem. In Theorem 2, the convolution bound (11) is obtained by an $n$-dimensional optimization problem.
The main result in this section is that the convolution bound (11) is equal to a dual bound (26), with a convenient correspondence between the minimizers of the two problems. The following proposition gives a dual bound on quantile aggregation, which is essentially Theorem 4.17 of Rüschendorf (2013), expressed in terms of probabilities instead of quantiles.

Proposition 7. For $t\in[0,1)$, it holds that
$$\text{(dual bound)}\quad \sup_{\nu\in\Lambda(\mu)} q^+_t(\nu) \le D^{-1}_n(1-t), \qquad (26)$$
where $D^{-1}_n(\alpha) = \inf\{x\in\mathbb{R} : D_n(x)<\alpha\}$, $\alpha\in(0,1]$, and the function $D_n:\mathbb{R}\to\mathbb{R}$ is defined by
$$D_n(x) = \inf_{\mathbf{r}\in\Delta_n(x)}\left\{\sum_{i=1}^n \frac{1}{x-r}\int_{r_i}^{x-r+r_i} \mu_i\big((y,\infty)\big)\,\mathrm{d}y\right\},\quad x\in\mathbb{R}, \qquad (27)$$
where $\mathbf{r}=(r_1,\ldots,r_n)$, $r=\sum_{i=1}^n r_i$ and $\Delta_n(x)=\{(r_1,\ldots,r_n)\in\mathbb{R}^n : \sum_{i=1}^n r_i<x\}$.

Below we always write $\mathbf{r}=(r_1,\ldots,r_n)$ and $r=\sum_{i=1}^n r_i$. We find that the dual bound (26) is equal to our convolution bound (11) if the marginal distribution and quantile functions are continuous.

Theorem 4. For fixed $t\in[0,1)$, let $x=D^{-1}_n(1-t)$. Suppose that each of $\mu_1,\ldots,\mu_n$ has continuous distribution and quantile functions. Then the convolution bound (11) and the dual bound (26) share the same value $x$. Moreover, the correspondence between the minimizers $\beta$ of (11) and $\mathbf{r}$ in the closure of $\Delta_n(x)$ in (27) is given by
$$\mu_i(-\infty,r_i] = 1-\beta_0-\beta_i,\qquad \mu_i(-\infty,x-r+r_i] = 1-\beta_i,\qquad i=1,\ldots,n. \qquad (28)$$

Although the convolution bound and the dual bound are generally equal by Theorem 4, we note that the convolution bound is applicable to RVaR aggregation problems, whereas the dual bound, being based on probabilities, is specific to quantile aggregation. On the computational side, as the set $(1-t)\Delta_n$ is bounded and the set $\Delta_n(x)$ is unbounded, optimization of the convolution bound (11) is often easier than that of the dual bound (26).
Moreover, evaluating (26) additionally requires computing an inverse function of D_n.

In the homogeneous case µ_1 = · · · = µ_n = µ, Embrechts and Puccetti (2006) derived a (reduced) dual bound for the worst-case quantile aggregation based on a one-dimensional optimization problem:

(reduced dual bound)    D^-(1 − t) = inf{x ∈ R : D(x) < 1 − t},    (29)

where

D(x) = inf_{a<x/n} (n/(x − na)) ∫_a^{x−(n−1)a} µ((y, ∞)) dy,    x ∈ R.

This dual bound is the special case of (26) obtained by setting r_1 = · · · = r_n in (27). Thus, the reduced dual bound (29) is larger than or equal to the dual bound (26), and hence also to our convolution bound (11) by Theorem 4. Similarly to Theorem 4, one can show that the reduced dual bound (29) coincides with the reduced convolution bound (12) if the marginal distribution and quantile functions are continuous. In Figure 2 (right panel) of Section 7, we give examples in which (11) is strictly smaller than (29).

In this section, the convolution bounds in Theorems 1-2 are computed and compared with existing bounds in numerical examples, including the dual bound of Embrechts and Puccetti (2006) and the rearrangement algorithm (RA) of Puccetti and Rüschendorf (2012) and Embrechts et al. (2013). We also give numerical examples to show the performance of the candidate and suboptimal dependence structures (19) and (21) in the heterogeneous case.

For any t, s with 0 ≤ t < t + s ≤ 1, we numerically compute the RVaR aggregation value R_{t,s}(ν), ν ∈ Λ_n(µ), with different methods in the homogeneous case, where the marginal distributions are identical and denoted by µ. The convolution bound is given by (9) and the true value is approximated by the RA. We fix t + s = 0.9 and vary s ∈ (0, 0.9) to simulate values of sup_{ν∈Λ_n(µ)} R_{t,s}(ν). In Figure 1 (left panel), we verify the statement of Theorem 1 that the convolution bound (9) is sharp for marginals with decreasing densities. In Figure 1 (right panel), we see that the convolution bound (9) is not sharp for marginals with increasing densities.
Although the bound is not sharp for increasing densities, the difference is small and it performs quite well numerically. Moreover, in Figure 1, the convolution bound (9) is sharp if t = 0 (Theorem 1) and as s ↓ 0.

Consistent with the literature, we roughly interpret the upper value produced by the RA as a good approximation of the true value of the worst-case RVaR, although no convergence result is established; see also Table 1.

Figure 1: Bounds for sup_{ν∈Λ_n(µ)} R^+_{0.9−s,s}(ν) as functions of s (RA and convolution bound, with t + s = 0.9 fixed). Left panel: µ = Pareto(1, 1/2) with a decreasing density x^{-3/2}/2, x ∈ [1, ∞). Right panel: µ has an increasing density (101 − x)^{-1/2}/18, x ∈ [1, 100].

For t ∈ [0, 1), we numerically compute the worst-case quantile aggregation value sup q^+_t(ν), ν ∈ Λ_n(µ), together with analytical bounds in the homogeneous case, where the marginal distributions are identical and denoted by µ. Recall that the convolution bound is given by (11), the (reduced) dual bound derived in Embrechts and Puccetti (2006) is given by (29), and the reduced convolution bound is given by (12). The standard bound is derived from the lower Fréchet-Hoeffding bound (see Remark A.29 of Föllmer and Schied (2016)). We also give the quantile aggregation value under a comonotonic scenario for comparison.

Figure 2 (left panel) illustrates that the convolution bound (11), the reduced convolution bound (12) and the dual bound (29) share the same value of quantile aggregation for a Pareto distribution. The standard bound performs worst as an upper bound for sup_{ν∈Λ_n(µ)} q^+_t(ν). The comonotonic scenario serves as a lower bound. Results for other distributions, such as the Lognormal and Gamma distributions, are similar and omitted.

In Figure 2 (right panel), we plot analytical bounds on the maximum possible quantile aggregation value sup_{ν∈Λ_n(µ)} q^+_t(ν), where the convolution bound (11) achieves a strictly smaller value than the dual bound (29).
This shows that our bound (11) is an analytically better bound for quantile aggregation. Figure 2 (right panel) further shows that (11) is better than the reduced convolution bound (12); see also Example A.2.

In Table 1, we check the performance of the bound (11) against the RA numerically in more detail. The RA returns an interval [s_N, s̄_N] whose end-points are close to each other when N is sufficiently large, making it a good approximation for sup_{ν∈Λ(µ)} q^+_0(ν). Specifically, the left end-point is always a (numerical) lower bound, while the right end-point is close to, but not always greater than, sup_{ν∈Λ(µ)} q^+_0(ν).

Figure 2: Bounds for sup_{ν∈Λ_n(µ)} q^+_t(ν) as functions of t. Left panel: µ = Pareto(1, 1/2) with a decreasing density x^{-3/2}/2, x ∈ [1, ∞); shown are the comonotonic value, the standard bound, the dual bound, the reduced convolution bound and the convolution bound. Right panel: µ has an increasing density (101 − x)^{-1/2}/18, x ∈ [1, 100]; shown are the RA value, the convolution bound, the dual bound and the reduced convolution bound.

Table 1: The RA interval and the convolution bound on sup_{ν∈Λ(µ)} q^+_0(ν), with computation times, for heterogeneous models with Pareto marginals (X_i ~ Pareto(1, i + 2), i = 1, . . . , 20), Lognormal marginals (X_i ~ LogN(5 − i, i)) and Gamma marginals (X_i ~ Γ(i + 1, i)).

Concerning accuracy, Figure 2 (right panel) and Table 1 both indicate that the convolution bound and the RA produce similar values in most cases. We discuss three aspects. First, the true value of sup_{ν∈Λ(µ)} q^+_0(ν) is only available in cases with monotone densities, where, according to Theorem 2, it equals the convolution bound; this is the case in Figure 2 and for the first model in Table 1. Second, if the true value of sup_{ν∈Λ(µ)} q^+_0(ν) is unknown, then we can use the (upper) convolution bound together with the lower bound provided by the RA to approximately target the true value.
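For context, the core of the RA can be sketched in a few lines. The version below is our own minimal illustration (it omits the quantile discretization, randomization and termination refinements of Puccetti and Rüschendorf (2012)): each column of a matrix is repeatedly reordered so that it is oppositely ordered to the sum of the other columns, and the extreme row sums of the stable configuration are reported.

```python
def rearrangement(columns, max_rounds=100):
    """Oppositely reorder each column against the sum of the others until stable.

    Returns (min_row_sum, max_row_sum) of the resulting matrix."""
    cols = [list(c) for c in columns]
    n, rows = len(cols), len(cols[0])
    for _ in range(max_rounds):
        changed = False
        for k in range(n):
            partial = [sum(cols[j][i] for j in range(n) if j != k) for i in range(rows)]
            order = sorted(range(rows), key=lambda i: partial[i])
            values = sorted(cols[k], reverse=True)
            new_col = [0] * rows
            for rank, i in enumerate(order):
                new_col[i] = values[rank]  # largest value where the partial sum is smallest
            if new_col != cols[k]:
                cols[k] = new_col
                changed = True
        if not changed:
            break
    sums = [sum(cols[j][i] for j in range(n)) for i in range(rows)]
    return min(sums), max(sums)
```

On three identical columns (1, 2, 3) — the triatomic uniform distribution of Example 1 below — this iteration stalls, under our tie-breaking, at row sums 5, 6 and 7, so its lower value 5 misses the sharp value 6 delivered by the convolution bound.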
As shown in the second model in Table 1, the difference between the two bounds is quite small, so the true value is approximately known. Third, we show that in some cases the RA does not perform well while the convolution bound provides a sharp result; see Example 1.

Example 1. Let µ be the triatomic uniform distribution on {1, 2, 3}. It is easy to see that µ is 3-CM and hence sup_{ν∈Λ(µ,µ,µ)} q^+_0(ν) = 6. As a result, (11) provides a sharp upper bound, inf_{β∈∆_3} R^+_β(µ, µ, µ) = 6 with the optimal β = (1, 0, 0, 0), whereas the RA fails to attain sup_{ν∈Λ(µ,µ,µ)} q^+_0(ν).

Concerning computation time, we find that the convolution bound (11) is computed as fast as, or faster than, the RA. In conclusion, (11) is not only a good analytical upper bound, but is also quick to compute numerically for the maximum possible lower end-point.

Recall that in Section 5 we propose a candidate dependence structure (19) for the worst-case quantile aggregation. We also state a suboptimal structure (21) which does not involve Y_1, . . . , Y_n. A better suboptimum (24) is obtained by solving another optimization problem, and a two-sided approximation interval [H(γ), R^+_β(µ)] is established. We now give some numerical examples comparing the corresponding lower end-points in the heterogeneous case with n = 3. As shown in Theorem 3, the possible values of the aggregation variable in (19) are those of the functions h_1, h_2, h_3 on [0, 1 − β_0], while the corresponding values in (21) are those of h_1, h_2, h_3 on [0, 1]. In Figure 3, the essential infimum of Σ_{i=1}^n X*_i from (19), which is the minimal value of h_1, h_2, h_3 on [0, 1 − β_0], is just slightly lower than the corresponding q^+_0(ν^+). We further show the numerical values in Table 2, including the convolution bound, the RA results, and the values from the suboptimal structures (21) and (24). Recall that (21) is based on β, while (24) requires solving for γ in another optimization problem.
The suboptimal methods give explicit random vectors, and hence they are useful in visualizing the worst case of quantile aggregation. In Table 2, both (21) and (24) produce numbers close to the convolution bound in many cases, while (24) is always better but requires more computation.

The computation is performed in MATLAB R2017b on an Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz.

Figure 3: Performance of the extremal dependence structures (the functions h_1, h_2, h_3, the convolution bound, and the suboptima (21) and (24)) under the four settings of Table 2.

                      (a)                 (b)                 (c)                 (d)
µ_1                   Pareto(1, 3)        Pareto(1, 1/3)      Pareto(1, 3)        Pareto(1, 3)
µ_2                   LogN(0, 1)          LogN(0, 1)          LogN(-1, 1)         LogN(0, 1)
µ_3                   Γ(1, ·)             Γ(1, ·)             Γ(1, ·)             Γ(3, ·)
[H(γ), R^+_β(µ)]      [4.1185, 4.2857]    [8.055, 8.5936]     [3.1254, 3.2545]    [7.3653, 7.634]
Bound (11)            4.2857              8.5936              3.2545              7.634
Candidate (19)        4.2855              8.4995              3.2545              7.5415
Suboptimum (21)       4.0739              7.7835              3.0587              7.2889
Suboptimum (24)       4.1185              8.055               3.1254              7.3653

Table 2: Numerical values of the lower end-points in Figure 3.

Lower convolution bounds

In this section, we briefly collect results on lower convolution bounds for inf_{ν∈Λ(µ)} R_{t,s}(ν) and inf_{ν∈Λ(µ)} q^-_t(ν), together with some related results. The proofs are symmetric to those of the upper convolution bounds and are omitted.

Theorem 5 (Lower convolution bound on RVaR aggregation). Let µ = (µ_1, . . . , µ_n) ∈ M^n. For any t, s with 0 ≤ t < t + s ≤ 1,

inf_{ν∈Λ(µ)} R_{t,s}(ν) ≥ sup_{β∈(1−t)∆_n, β_0 ≥ s} Σ_{i=1}^n R^-_{1−β_i−β_0, β_0}(µ_i).    (30)

Moreover, (30) holds as an equality in the following cases:
(i) t + s = 1;
(ii) n ≤ 2;
(iii) each of µ_1, . . . , µ_n admits an increasing density below its (1 − t)-quantile;
(iv) Σ_{i=1}^n µ_i([q^+_0(µ_i), q^-_{1−t}(µ_i))) ≤ 1 − t.

Let µ^{t−} be the probability measure given by

µ^{t−}((−∞, x]) = min{ µ((−∞, x])/t, 1 },    x ∈ R.
That is, µ^{t−} is the distribution of the random variable q^-_V(µ), where V is a uniform random variable on [0, t]. In the case of Theorem 5 (iii), the condition equivalently means that each of µ^{(1−t)−}_1, . . . , µ^{(1−t)−}_n admits an increasing density. We write µ^{t−} = (µ^{t−}_1, . . . , µ^{t−}_n). Proposition 8 (symmetric to Proposition 1) shows the relevant results.

Proposition 8. For µ = (µ_1, . . . , µ_n) ∈ M^n and 0 ≤ t < t + s ≤ 1, we have

inf_{ν∈Λ(µ)} R_{t,s}(ν) = inf_{ν∈Λ(µ^{(1−t)−})} ES_{s/(1−t)}(ν)    and    inf_{ν∈Λ(µ)} q^-_t(ν) = inf_{ν∈Λ(µ^{t−})} q^-_1(ν).

Similarly to the worst-case values, for the best-case values of RVaR aggregation it suffices to consider the case ending at quantile level 1, i.e., ES aggregation. In particular, for the best-case problems of quantile aggregation, it suffices to consider quantile level 1, i.e., the problems inf_{ν∈Λ(µ^{t−})} q^-_1(ν) for generic choices of µ.

Theorem 6 (Lower convolution bound on quantile aggregation). For µ ∈ M^n and t ∈ (0, 1], we have

inf_{ν∈Λ(µ)} q^-_t(ν) ≥ sup_{β∈t∆_n} Σ_{i=1}^n R^-_{1−β_i−β_0, β_0}(µ_i).    (31)

Moreover, (31) holds as an equality in the following cases:
(i) n ≤ 2;
(ii) each of µ_1, . . . , µ_n admits an increasing density below its t-quantile;
(iii) each of µ_1, . . . , µ_n admits a decreasing density below its t-quantile;
(iv) Σ_{i=1}^n µ_i([q^+_0(µ_i), q^-_t(µ_i))) ≤ t;
(v) Σ_{i=1}^n µ_i((q^+_0(µ_i), q^-_t(µ_i)]) ≤ t.

Proposition 9 (symmetric to Proposition 2) concerns a reduced lower convolution bound.

Proposition 9. For µ ∈ M and t ∈ (0, 1], we have

inf_{ν∈Λ_n(µ)} q^-_t(ν) ≥ sup_{α∈(0,t/n)} n R^-_{1−t+(n−1)α, t−nα}(µ) = sup_{α∈(0,t/n)} (n/(t − nα)) ∫_α^{t−(n−1)α} q^-_s(µ) ds.    (32)

Moreover, (32) holds as an equality if µ admits an increasing density below its t-quantile.
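The right-hand side of (32) is a one-dimensional optimization and is simple to evaluate. The snippet below is our own illustration with uniform marginals on [0, 1], whose left quantile function is q^-_s(µ) = s; in that case the supremum works out to nt/2, consistent with a dependence structure that makes X_1 + · · · + X_n constant below the aggregate t-quantile.

```python
def q_uniform01(s):
    # left quantile function of the uniform distribution on [0, 1]
    return s

def reduced_lower_bound(t, n, q, grid=1000, quad=200):
    # sup over alpha in (0, t/n) of
    #   n / (t - n*alpha) * integral of q(s) ds over [alpha, t - (n-1)*alpha],
    # via grid search in alpha and a midpoint rule for the inner integral
    best = float("-inf")
    for j in range(grid):
        alpha = (t / n) * (j + 0.5) / grid
        lo, hi = alpha, t - (n - 1) * alpha
        h = (hi - lo) / quad
        integ = sum(q(lo + (k + 0.5) * h) for k in range(quad)) * h
        best = max(best, n * integ / (t - n * alpha))
    return best
```

For example, with n = 3 and t = 0.6 the grid search returns approximately 0.9 = 3 · 0.6/2; with n = 2 the objective is constant in α and equals t exactly.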
Proposition 10 (symmetric to Proposition 3) shows that inf_{ν∈Λ(µ)} q^-_t(ν) is always attainable; that is, the infimum can be replaced by a minimum.

Proposition 10. For µ ∈ M^n and t ∈ (0, 1], there exists ν^- ∈ Λ(µ) such that inf_{ν∈Λ(µ)} q^-_t(ν) = q^-_t(ν^-).

We apply the convolution bound (15) in Proposition 5 to obtain an analytical method for an application in operations research. A typical problem is so-called assembly line crew scheduling, which is NP-complete; see Hsu (1984). We use an m × n matrix to formulate the problem. There are m assembly lines (rows) and n operations (columns). Each operation has m crews to be assigned to the lines. The number in position (i, j) represents the processing time of the i-th crew in the j-th operation. The objective is to assign the crews of each operation to the lines so as to minimize the makespan, that is, the maximum total processing time over all assembly lines. In matrix form, the task is to find an arrangement of the elements within each column that minimizes the maximum row sum. In this discrete setting, the minimal makespan is inf_{ν∈Λ(µ)} q^-_1(ν), to which the convolution bound (15) in Proposition 5 can be applied.

Suppose that a matrix is given by the left-hand side of (33), with a maximum row sum of 87 + 60 + 83 = 230. Let X_i be a discrete uniform random variable taking the values in the i-th column of the matrix and let µ_i be the corresponding distribution. For example, X_1 takes each value of the first column {44, 66, 67, 71, 87} with probability 1/5. For the discrete distributions µ_1, µ_2, µ_3, the minimal makespan is exactly inf_{ν∈Λ(µ)} q^-_1(ν). According to Proposition 5, sup_{β∈∆_n} Σ_{i=1}^n R^-_{1−β_i−β_0, β_0}(µ_i) serves as a lower bound for the minimal makespan, and the maximizer β provides a hint towards the optimal scheduling rule. In this example, the result is sharp: the bound is sup_{β∈∆_3} Σ_{i=1}^3 R^-_{1−β_i−β_0, β_0}(µ_i) = 160, attained at a maximizer of the form β = (0, ·, ·, ·).
44 10 24          87 10 43
66 32 37          71 60 24
67 48 41    ⟹    67 48 41        (33)
71 57 43          44 32 83
87 60 83          66 57 37

There are several algorithms in the literature for the assembly line crew scheduling problem. Coffman and Sethi (1976) and Hsu (1984) adopted greedy-type (largest-first) methods. Coffman and Yannakakis (1984) developed an improved row-sum algorithm to approximate the problem. Embrechts et al. (2013) proposed the similar rearrangement algorithm in the context of risk management. These numerical algorithms provide an upper bound for the minimal makespan because they always return a feasible scheduling rule. Our analytical method always provides a lower bound sup_{β∈∆_n} Σ_{i=1}^n R^-_{1−β_i−β_0, β_0}(µ_i) according to (15). Therefore, if the output of a numerical algorithm and our convolution bound (15) are the same or similar, we can conclude that the scheduling rule of the numerical algorithm is optimal.

Furthermore, the convolution bound method can assist in finding a solution to assembly line crew scheduling when the numerical algorithms above fail to provide an optimal one. Consider a toy instance with the matrix of Example 1, in which each of the three columns takes the values 1, 2, 3. In (34), the RA returns a schedule with makespan 1 + 3 + 3 = 7, whereas in (35) the convolution bound method provides a better one, sup_{β∈∆_3} Σ_{i=1}^3 R^-_{1−β_i−β_0, β_0}(µ_i) = 1 + 3 + 2 = 6, with the maximizer β = (1, 0, 0, 0). Using the extremal dependence structure in Section 5, we can read off the optimal scheduling rule from the maximizer β.

RA:                 1 3 3
                    2 2 2        (34)
                    3 1 1

Convolution bound:  1 3 2
                    2 1 3        (35)
                    3 2 1

Moreover, the k-partitioning problem is regarded as a similar problem, as it can also be solved by minimizing the maximal row sum of a matrix; see Boudt et al. (2018). From a different perspective, our convolution bound provides analytical assistance for this type of problem.
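The claims around (33) are mechanical to verify. The snippet below (names are ours) checks that the rearranged matrix uses exactly the same crew times in each operation (column) as the original, and that its makespan equals 160, the value of the convolution bound, which certifies optimality of the schedule.

```python
# the two matrices from display (33): original schedule and rearranged schedule
original = [[44, 10, 24],
            [66, 32, 37],
            [67, 48, 41],
            [71, 57, 43],
            [87, 60, 83]]
rearranged = [[87, 10, 43],
              [71, 60, 24],
              [67, 48, 41],
              [44, 32, 83],
              [66, 57, 37]]

def makespan(matrix):
    # maximum total processing time over the assembly lines (rows)
    return max(sum(row) for row in matrix)

def same_columns(a, b):
    # each operation (column) must keep the same multiset of crew times
    cols = lambda m: [sorted(row[j] for row in m) for j in range(len(m[0]))]
    return cols(a) == cols(b)
```

Here makespan(original) is 230 and makespan(rearranged) is 160; since 160 also equals the analytical lower bound, the rearranged schedule is optimal.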
Since the convolution bound is a lower bound for the minimal makespan, whenever it coincides with the RA result we can guarantee that the corresponding scheduling rule is optimal. Although the example above only uses discrete distributions, the approach is promising for more complex (e.g., stochastic or continuous) scheduling problems.

Using the inf-convolution result for RVaR of Embrechts et al. (2018), we establish new analytical bounds for the problem of quantile aggregation and show that these bounds are sharp in most cases for which analytical formulas exist in the literature. We can interpret the corresponding worst-case dependence structure and give an explicit construction for this complicated optimization problem. The convolution bounds cover all existing theoretical results on quantile aggregation. Moreover, the proposed bounds have advantages in tractability, interpretability, and computation. Given the wide applications of quantile aggregation, the results obtained in this paper should be helpful in finance, risk management, statistics and operations research, for instance in estimating extremal quantile/risk aggregation, robust statistical hypothesis testing, visualizing and realizing the corresponding dependence structures, and solving scheduling problems.

The level of theoretical difficulty in quantile aggregation leaves ample room for future challenges. For instance, the sharpness of the convolution bounds under general conditions (other than those in Theorems 1, 2, 5 and 6) is an open question. For the interested reader, we connect our results to the theory of joint mixability in Appendix C, where many questions are known to be open.

References

Bernard, C., Jiang, X. and Wang, R. (2014). Risk aggregation with dependence uncertainty. Insurance: Mathematics and Economics, 54, 93–108.

Besser, K. and Jorswieck, E. A. (2020). Reliability bounds for dependent fading wireless channels. IEEE Transactions on Wireless Communications, published online.

Boudt, K., Jakobsons, E. and Vanduffel, S. (2018).
Block rearranging elements within matrix columns to minimize the variability of the row sums. 4OR, 16, 31–50.

Coffman, E. G. and Sethi, R. (1976). Algorithms minimizing mean flow time: schedule-length properties. Acta Informatica, 6(1), 1–14.

Coffman, E. G. and Yannakakis, M. (1984). Permuting elements within columns of a matrix in order to minimize maximum row sum. Mathematics of Operations Research, 9(3), 384–390.

Cont, R., Deguest, R. and Scandolo, G. (2010). Robustness and sensitivity analysis of risk measurement procedures. Quantitative Finance, 10(6), 593–606.

El Ghaoui, L., Oks, M. and Oustry, F. (2003). Worst-case value-at-risk and robust portfolio optimization: A conic programming approach. Operations Research, 51(4), 543–556.

Embrechts, P., Liu, H. and Wang, R. (2018). Quantile-based risk sharing. Operations Research, 66(4), 936–949.

Embrechts, P. and Puccetti, G. (2006). Bounds for functions of dependent risks. Finance and Stochastics, 10, 341–352.

Embrechts, P., Puccetti, G. and Rüschendorf, L. (2013). Model uncertainty and VaR aggregation. Journal of Banking and Finance, 37(8), 2750–2764.

Embrechts, P., Puccetti, G., Rüschendorf, L., Wang, R. and Beleraj, A. (2014). An academic response to Basel 3.5. Risks, 2(1), 25–48.

Embrechts, P., Wang, B. and Wang, R. (2015). Aggregation-robustness and model uncertainty of regulatory risk measures. Finance and Stochastics, 19(4), 763–790.

Föllmer, H. and Schied, A. (2016). Stochastic Finance. An Introduction in Discrete Time. Fourth Edition. Walter de Gruyter, Berlin.

Hsu, W.-L. (1984). Approximation algorithms for the assembly line crew scheduling problem. Mathematics of Operations Research, 9(3), 376–383.

Jakobsons, E., Han, X. and Wang, R. (2016). General convex order on risk aggregation. Scandinavian Actuarial Journal, 2016(8), 713–740.

Kusuoka, S. (2001). On law invariant coherent risk measures. Advances in Mathematical Economics, 3, 83–95.

Liu, F. and Wang, R. (2020). A theory for measures of tail risk.
Mathematics of Operations Research, forthcoming.

Makarov, G. D. (1981). Estimates for the distribution function of the sum of two random variables with given marginal distributions. Theory of Probability and its Applications, 26, 803–806.

McNeil, A. J., Frey, R. and Embrechts, P. (2015). Quantitative Risk Management: Concepts, Techniques and Tools. Revised Edition. Princeton, NJ: Princeton University Press.

Puccetti, G., Rigo, P., Wang, B. and Wang, R. (2019). Centers of probability measures without the mean. Journal of Theoretical Probability, 32(3), 1482–1501.

Puccetti, G. and Rüschendorf, L. (2012). Computation of sharp bounds on the distribution of a function of dependent risks. Journal of Computational and Applied Mathematics, 236(7), 1833–1840.

Puccetti, G. and Rüschendorf, L. (2013). Sharp bounds for sums of dependent risks. Journal of Applied Probability, 50(1), 42–53.

Ramdas, A. K., Barber, R. F., Wainwright, M. J. and Jordan, M. I. (2019). A unified treatment of multiple testing with prior knowledge using the p-filter. Annals of Statistics, 47(5), 2790–2821.

Rockafellar, R. T. and Uryasev, S. (2002). Conditional value-at-risk for general loss distributions. Journal of Banking and Finance, 26(7), 1443–1471.

Rüschendorf, L. (1982). Random variables with maximum sums. Advances in Applied Probability, 14(3), 623–632.

Rüschendorf, L. (2013). Mathematical Risk Analysis. Springer Series in Operations Research and Financial Engineering, Springer-Verlag Berlin Heidelberg.

Vovk, V. and Wang, R. (2020). Combining p-values via averaging. Biometrika, published online.

Wang, B. and Wang, R. (2011). The complete mixability and convex minimization problems with monotone marginal densities. Journal of Multivariate Analysis, 102(10), 1344–1360.

Wang, B. and Wang, R. (2016). Joint mixability. Mathematics of Operations Research, 41(3), 808–826.

Wang, R., Bignozzi, V. and Tsanakas, A. (2015). How superadditive can a risk measure be?
SIAM Journal on Financial Mathematics, 6, 776–803.

Wang, R., Peng, L. and Yang, J. (2013). Bounds for the sum of dependent risks and worst Value-at-Risk with monotone marginal densities. Finance and Stochastics, 17(2), 395–417.

Wang, R. and Zitikis, R. (2020). An axiomatic foundation for the Expected Shortfall. Management Science, forthcoming.

Zhu, S. and Fukushima, M. (2009). Worst-case conditional value-at-risk with application to robust portfolio management. Operations Research, 57(5), 1155–1168.

Zymler, S., Kuhn, D. and Rustem, B. (2013). Worst-case value-at-risk of nonlinear portfolios. Management Science, 59(1), 172–188.

Technical Appendices

A Counter-examples

Example A.1 (Non-sharpness of the convolution bound in Theorem 2). Without loss of generality, we consider the case t = 0. Let µ be the bi-atomic uniform distribution on {−1, 1}. It is easy to see that sup_{ν∈Λ_3(µ)} q^+_0(ν) = −1, since every ν ∈ Λ_3(µ) is supported in {−3, −1, 1, 3} and has mean 0. On the other hand, for (β_0, β_1, β_2, β_3) ∈ ∆_3 with β_1 ≥ β_2 ≥ β_3 (without loss of generality), by the symmetry of µ, and using that R_{β,α} is decreasing in α and that R_{β−δ,α+δ} ≥ R_{β,α} for 0 ≤ δ ≤ β, we have

R_{β_1,β_0}(µ) = −R_{1−β_1−β_0, β_0}(µ) ≥ −R_{β_2+β_3, β_0}(µ) ≥ −R_{β_2, β_0+β_3}(µ),
R_{β_2,β_0}(µ) ≥ R_{β_2, β_0+β_3}(µ),    and    R_{β_3,β_0}(µ) ≥ R_{β_3, 1−2β_3}(µ) = 0.

Combining the above three inequalities, we have Σ_{i=1}^3 R_{β_i,β_0}(µ) ≥ 0. Hence,

sup_{ν∈Λ_3(µ)} q^+_0(ν) = −1 < inf_{β∈∆_3} Σ_{i=1}^3 R_{β_i,β_0}(µ),

showing that (14) is not an equality.

Example A.2 (The reduced formula (12) does not hold for an increasing density). Without loss of generality, we consider the case t = 0. Suppose that µ ∈ M has an increasing density on its support. Then the cdf of µ is convex, and hence the left quantile q^-_u(µ) is a concave function of u ∈ (0, 1). By the concavity of u ↦ q^-_u(µ), we have

(1/(1−nα)) ∫_{(n−1)α}^{1−α} q^-_u(µ) du ≥ (1/(1−2α)) ∫_α^{1−α} q^-_u(µ) du ≥ ∫_0^1 q^-_u(µ) du.

Therefore, inf_{α∈(0,1/n)} n R_{α,1−nα}(µ) = n R_{0,1}(µ).
Note that if (12) holds as an equality, then sup_{ν∈Λ_n(µ)} q^+_0(ν) = nR_{0,1}(µ), which, by Proposition A.3 below, implies that µ is n-CM. There are distributions µ with an increasing density that are not n-CM; an equivalent condition is obtained by Wang and Wang (2011), see Appendix C for further explanation. Therefore, (12) does not hold as an equality for some distributions with an increasing density. A specific example is shown in Figure 2 (right panel).

Example A.3 ((16) does not hold without a finite mean). By Theorem 4.2 of Puccetti et al. (2019), for standard Cauchy probability measures µ_1, . . . , µ_n, there exist ν_1, ν_2 ∈ Λ(µ) such that

q^+_0(ν_1) = q^-_1(ν_1) = sup_{β∈∆_n} Σ_{i=1}^n R^-_{1−β_i−β_0, β_0}(µ_i) = −(n/π) log(n − 1)

and

q^+_0(ν_2) = q^-_1(ν_2) = inf_{β∈∆_n} Σ_{i=1}^n R_{β_i, β_0}(µ_i) = (n/π) log(n − 1).

Hence, we have

inf_{ν∈Λ(µ)} q^-_1(ν) = sup_{β∈∆_n} Σ_{i=1}^n R^-_{1−β_i−β_0, β_0}(µ_i) = −(n/π) log(n − 1) < (n/π) log(n − 1) = inf_{β∈∆_n} Σ_{i=1}^n R_{β_i, β_0}(µ_i) = sup_{ν∈Λ(µ)} q^+_0(ν).

B Well-posedness

Similarly to (17), for µ = (µ_1, . . . , µ_n) ∈ M^n and β = (β_0, β_1, . . . , β_n) ∈ ∆̄_n, we denote by

R^-_β(µ) = Σ_{i=1}^n R^-_{1−β_i−β_0, β_0}(µ_i).    (A.1)

We discuss the attainability of the infimum in inf_{β∈∆̄_n} R^+_β(µ) and the supremum in sup_{β∈∆̄_n} R^-_β(µ). Note that R^+_β(µ) and R^-_β(µ) are well defined for β ∈ ∆_n. We now discuss the cases where some β_i takes a boundary value of 0 or 1, i.e., whether R^+_β(µ) and R^-_β(µ) are well defined on the closure ∆̄_n.

1. For β ∈ ∆_n ⊂ ∆̄_n, there is no undefined form "∞ − ∞" in R^+_β(µ) or R^-_β(µ), which are hence always well defined.

2. For β ∈ ∆̄_n with β_i = 0 for some i ∈ {1, . .
. , n} and β_0 ∈ (0, 1), we define R^-_β and R^+_β similarly:

R^+_β(µ) = Σ_{j≠i} R_{β_j, β_0}(µ_j) + R_{0, β_0}(µ_i),    R^-_β(µ) = Σ_{j≠i} R^-_{1−β_j−β_0, β_0}(µ_j) + R^-_{1−β_0, β_0}(µ_i),

except in the "∞ − ∞" cases where the integral of q^-_t(µ_i) in a neighbourhood of 0 is negatively infinite and that of q^-_t(µ_j) in a neighbourhood of 1 is infinite for some i, j ∈ {1, . . . , n}, i.e., R_{0,ε}(µ_j) = ∞ and R^-_{1−ε,ε}(µ_i) = −∞ for some ε ∈ (0, 1). In particular, R^+_β and R^-_β are always well defined if µ ∈ M_1^n.

3. For β ∈ ∆̄_n with β_0 = 0, we define

R^+_β(µ) = Σ_{i=1}^n q^-_{1−β_i}(µ_i),    R^-_β(µ) = Σ_{i=1}^n q^+_{β_i}(µ_i),

except in the "∞ − ∞" cases where q^-_1(µ_i) = ∞ and q^-_0(µ_j) = −∞ for some i, j ∈ {1, . . . , n} with i ≠ j. They are always well defined if each of µ_1, . . . , µ_n is bounded from one side (above or below).

Because of the continuity of R_{β,α} in (β, α) ∈ [0, 1]², the infimum in inf_{β∈(1−t)∆_n} R^+_β(µ) for t ∈ [0, 1) and the supremum in sup_{β∈t∆_n} R^-_β(µ) for t ∈ (0, 1] are attained in the well-defined part of ∆̄_n.

C Connection to joint mixability

Joint mixability is closely related to quantile aggregation. The tuple of distributions µ ∈ M^n is said to be jointly mixable (JM, Wang et al. (2013)) if δ_C ∈ Λ(µ) for some C ∈ R. Such a C is called a center of µ. Similarly, a probability measure µ on R is n-completely mixable (n-CM, Wang and Wang (2011)) if the n-tuple (µ, . . . , µ) is JM. Obviously, if µ ∈ M_1^n is JM, then its center is unique and equal to the sum of the means of its components. If µ ∈ M^n is JM but is not in M_1^n, then its center may not be unique (Puccetti et al. (2019)). The determination of joint mixability for a given µ ∈ M^n is well known to be a challenging problem, and analytical results are limited.
The main results of this appendix are a sufficient condition for the sharpness of the convolution bounds and some conditions for the determination of JM.

We first observe that JM is a sufficient condition for the bounds in Proposition 5 to be sharp for probability measures with finite means.

Proposition A.1. If µ ∈ M_1^n is JM, then the bounds in Proposition 5 are sharp, and their values are equal to the unique center of µ.

Proof. Since µ = (µ_1, . . . , µ_n) is JM, we know δ_C ∈ Λ(µ), where C = Σ_{i=1}^n R_{0,1}(µ_i). Hence, by Proposition 5,

inf_{β∈∆_n} R^+_β(µ) ≥ sup_{ν∈Λ(µ)} q^+_0(ν) ≥ q^+_0(δ_C) = C ≥ inf_{β∈∆_n} R^+_β(µ).

The case of (15) is similar.

Proposition A.1 supports Proposition 5 by giving further conditions under which the bounds in Proposition 5 are sharp, and these conditions can be checked through existing results on joint mixability in Wang and Wang (2016). However, unlike Theorem 6, Proposition A.1 itself does not offer new ways to calculate quantile aggregation, since the convolution bounds in (14) and (15) are trivially equal to the center if we know that (µ_1, . . . , µ_n) ∈ M_1^n is JM.

Next, we look in the converse direction: implications of Theorems 2-6 for conditions for JM. Proposition 5 directly implies the following necessary condition for JM, which is also noted in Proposition 3.3 of Puccetti et al. (2019) with a similar argument. If µ is JM with center C, then C = q^+_0(ν) = q^-_1(ν) for some ν ∈ Λ(µ). Hence,

inf_{ν∈Λ(µ)} q^-_1(ν) ≤ C ≤ sup_{ν∈Λ(µ)} q^+_0(ν).

Using Proposition 5, we arrive at (where R^-_β(µ) is defined in (A.1))

sup_{β∈∆_n} R^-_β(µ) ≤ C ≤ inf_{β∈∆_n} R^+_β(µ).

If the means of µ_1, . . . , µ_n are finite, then by Proposition 6 we have sup_{β∈∆_n} R^-_β(µ) ≥ inf_{β∈∆_n} R^+_β(µ). Therefore, a necessary condition for µ ∈ M_1^n to be JM is

sup_{β∈∆_n} R^-_β(µ) = inf_{β∈∆_n} R^+_β(µ).

We summarize these simple findings in the following proposition. We use the convention that the closed interval [a, b] is empty if a > b.
Proposition A.2. Any center C of µ = (µ_1, . . . , µ_n) ∈ M^n satisfies

C ∈ [ sup_{β∈∆̄_n} R^-_β(µ), inf_{β∈∆̄_n} R^+_β(µ) ].    (A.2)

In particular, if µ is JM, then

sup_{β∈∆̄_n} R^-_β(µ) ≤ inf_{β∈∆̄_n} R^+_β(µ),    (A.3)

and if further µ ∈ M_1^n, then

sup_{β∈∆̄_n} R^-_β(µ) = Σ_{i=1}^n R_{0,1}(µ_i) = inf_{β∈∆̄_n} R^+_β(µ).    (A.4)

The set ∆̄_n in Proposition A.2 may be replaced by ∆_n if (µ_1, . . . , µ_n) ∈ M_1^n. We next verify that, for many classes of distributions known in the literature, (A.3)-(A.4) are in fact sufficient for JM, and all centers are identified by Proposition A.2. We first present a convenient result which is useful for the determination of JM for distributions with finite means.

Proposition A.3. For µ = (µ_1, . . . , µ_n) ∈ M_1^n, the following statements are equivalent:
(i) µ is JM;
(ii) sup_{ν∈Λ(µ)} q^+_0(ν) = Σ_{i=1}^n R_{0,1}(µ_i);
(iii) inf_{ν∈Λ(µ)} q^-_1(ν) = Σ_{i=1}^n R_{0,1}(µ_i);
(iv) sup_{ν∈Λ(µ)} q^+_0(ν) = inf_{ν∈Λ(µ)} q^-_1(ν).

Proof. Let C = Σ_{i=1}^n R_{0,1}(µ_i). By Proposition 6,

sup_{ν∈Λ(µ)} q^+_0(ν) ≤ C ≤ inf_{ν∈Λ(µ)} q^-_1(ν).    (A.5)

As a consequence, (iv) ⇒ (ii)-(iii). If µ is JM, then there exists ν ∈ Λ(µ) such that q^+_0(ν) = Σ_{i=1}^n R_{0,1}(µ_i) = q^-_1(ν). This, together with (A.5), shows the implication (i) ⇒ (ii)-(iv). If sup_{ν∈Λ(µ)} q^+_0(ν) = C, then, noting that R_{0,1}(ν) = C for all ν ∈ Λ(µ), we have δ_C ∈ Λ(µ), since Λ(µ) is closed under weak convergence (Theorem 2.1 of Bernard et al. (2014)). This shows (ii) ⇒ (i). Similarly, (iii) ⇒ (i).

Next, in view of Theorem 2, we show in Proposition A.4 that whether some distributions are JM can be checked through the convolution bounds if they have monotone densities.

Proposition A.4. µ = (µ_1, . . . , µ_n) ∈ M_1^n is JM if and only if (A.4) holds, in each of the following cases:
(i) each of µ_1, . . . , µ_n admits a decreasing density on its support;
(ii) each of µ_1, . . . , µ_n admits an increasing density on its support;
(iii) µ_1, . .
. , µ_n are from the same location-scale family with unimodal and symmetric densities on their supports.

Proof. The necessity of (A.4) is stated in Proposition A.2, and hence we only show its sufficiency.

(i) By Theorem 2 and (A.4), we know

sup_{ν∈Λ(µ)} q^+_0(ν) = inf_{β∈∆̄_n} R^+_β(µ) = Σ_{i=1}^n R_{0,1}(µ_i).

By Proposition A.3 (ii) ⇒ (i), µ is JM.

(ii) This case is symmetric to (i).

(iii) Without loss of generality, we may assume that µ_1, . . . , µ_n all have mean zero and scale parameters a_1 ≥ · · · ≥ a_n > 0, respectively. By Corollary 3.6 of Wang and Wang (2016), (µ_1, . . . , µ_n) is JM if and only if 2 max_{i≤n} a_i ≤ Σ_{i=1}^n a_i. Take β = (1 − ε, ε, 0, . . . , 0) ∈ ∆̄_n for some ε ∈ (0, 1). Since µ_1, . . . , µ_n are from the same location-scale family with symmetric densities, we have

−R^-_{ε,1−ε}(µ_i)/a_i = R^-_{0,1−ε}(µ_i)/a_i = R^-_{0,1−ε}(µ_1)/a_1 > 0,    i = 1, . . . , n.

By (A.4), we have

0 = Σ_{i=1}^n R_{0,1}(µ_i) ≥ Σ_{i=1}^n R^-_{1−β_i−β_0, β_0}(µ_i) = R^-_{0,1−ε}(µ_1) + Σ_{i=2}^n R^-_{ε,1−ε}(µ_i) = (R^-_{0,1−ε}(µ_1)/a_1) ( a_1 − Σ_{i=2}^n a_i ).

Therefore, a_1 − Σ_{i=2}^n a_i ≤ 0, which implies 2 max_{i≤n} a_i ≤ Σ_{i=1}^n a_i.

Remark. By Theorem 3.2 of Wang and Wang (2016), for µ_1, . . . , µ_n ∈ M_1 with decreasing densities, (µ_1, . . . , µ_n) is JM if and only if

max_{i≤n} ( q^-_1(µ_i) − q^+_0(µ_i) ) ≤ Σ_{i=1}^n ( R_{0,1}(µ_i) − q^+_0(µ_i) ).    (A.6)

We already know that (A.4) is necessary for (µ_1, . . . , µ_n) to be JM. One can directly check that (A.6) is implied by (A.4), thus showing the equivalence of (A.4) and (A.6) in this case.

Remark. A similar situation to Proposition A.4 arises for distributions without a mean: if each of µ_1, . . . , µ_n is a standard Cauchy distribution, then the set of all centers of (µ_1, . . . , µ_n) is precisely the interval in (A.2). This statement is based on Example 4.1 and Theorem 4.2 of Puccetti et al. (2019).
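Both mixability criteria above reduce to elementary arithmetic. The helpers below are our own illustration (names are ours): the first checks the scale condition of Proposition A.4 (iii), and the second checks (A.6) given the minimum q^+_0(µ_i), the mean R_{0,1}(µ_i) and the maximum q^-_1(µ_i) of each marginal.

```python
def jm_location_scale(scales):
    # Proposition A.4 (iii): a tuple from one location-scale family with unimodal,
    # symmetric densities is JM iff twice the largest scale is at most the total scale
    return 2 * max(scales) <= sum(scales)

def jm_decreasing_density(mins, means, maxs):
    # condition (A.6) for marginals with decreasing densities: the largest support
    # length must not exceed the total slack, i.e. the sum of (mean - minimum)
    lengths = [b - a for a, b in zip(mins, maxs)]
    slack = sum(m - a for a, m in zip(mins, means))
    return max(lengths) <= slack
```

For instance, three equal scales are jointly mixable (2 · 1 ≤ 3) while scales (3, 1, 1) are not (6 > 5); and three uniform distributions on [0, 1] pass (A.6), matching the fact that the uniform distribution is 3-CM.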
D Proofs of main results

D.1 Proofs in Section 3

We first present a lemma slightly generalizing the RVaR inequalities in Theorem 1 of Embrechts et al. (2018).

Lemma A.1. Let $\alpha_1,\dots,\alpha_n,\beta_1,\dots,\beta_n\in[0,1]$. Denote by $b=\sum_{i=1}^n\beta_i$ and $a=\bigvee_{i=1}^n\alpha_i$. If $b+a\le1$, then for all $\mu=(\mu_1,\dots,\mu_n)\in\mathcal M^n$ and $\nu\in\Lambda(\mu)$,
$$R_{b,a}(\nu)\le\sum_{i=1}^n R_{\beta_i,\alpha_i}(\mu_i),\tag{A.7}$$
provided the right-hand side of (A.7) is well defined (no "$\infty-\infty$").

Proof of Lemma A.1. Theorem 1 of Embrechts et al. (2018), in which $\mathrm{RVaR}_{\beta,\alpha}(\mu)$ corresponds to $R_{\beta,\alpha}(\mu)$ in our notation, gives (A.7) for $\alpha_i,\beta_i>0$ with $b+a<1$ and $\mu_1,\dots,\mu_n\in\mathcal M_1$. For $\mu_1,\dots,\mu_n$ that do not necessarily have finite means, we always assume that the right-hand side of (A.7) is well defined (no "$\infty-\infty$"). If there exists some $i$ such that $R_{\beta_i,\alpha_i}(\mu_i)=\infty$, then (A.7) holds trivially. Now we assume $R_{\beta_i,\alpha_i}(\mu_i)<\infty$, $i=1,\dots,n$. There are four cases.

1. Suppose $b+a<1$ and $b>0$. In this case, $R_{b,a}$ and $R_{\beta_i,\alpha_i}$ are continuous with respect to weak convergence on $\mathcal M$ (see e.g. Cont et al. (2010)). For $\mu\in\Gamma(\mu_1,\dots,\mu_n)$ such that $\nu=\lambda_\mu$ (the distribution of the coordinate sum under $\mu$), we can find a sequence $\mu^{(k)}$, $k\in\mathbb N$, such that all one-dimensional margins of $\mu^{(k)}$ are in $\mathcal M_1$ and $\mu^{(k)}\to\mu$ weakly as $k\to\infty$. As a consequence, all one-dimensional margins of $\mu^{(k)}$, as well as its projection $\lambda_{\mu^{(k)}}$, converge weakly. Since (A.7) holds for probability measures in $\mathcal M_1$, using the continuity of $R_{b,a}$ and $R_{\beta_i,\alpha_i}$, we know that (A.7) holds in this case.

2. Suppose $b+a=1$ and $b>0$. If $R_{1-a,a}(\nu)=-\infty$, then (A.7) holds trivially. If $R_{1-a,a}(\nu)>-\infty$, then $\lim_{\varepsilon\downarrow0}R_{1-a,a-\varepsilon}(\nu)=R_{1-a,a}(\nu)$, since $R_{1-a,a-\varepsilon}(\nu)$ is monotone for $\varepsilon\in(0,a)$. In the first case, we have shown, for $\varepsilon\in(0,\bigwedge_{i=1}^n\alpha_i)$,
$$R_{1-a,a-\varepsilon}(\nu)\le\sum_{i=1}^n R_{\beta_i,\alpha_i-\varepsilon}(\mu_i).$$
Taking a limit as $\varepsilon\downarrow0$ gives (A.7).

3. Suppose $b+a<1$ and $b=0$. This implies $\beta_1=\dots=\beta_n=0$. Because $R_{0,\alpha_i}(\mu_i)<\infty$, $i=1,\dots,n$, we have $\lim_{\varepsilon\downarrow0}R_{\varepsilon,\alpha_i}(\mu_i)=R_{0,\alpha_i}(\mu_i)$, since $R_{\varepsilon,\alpha_i}(\mu_i)$ is monotone for $\varepsilon\in(0,1-\alpha_i)$, $i=1,\dots,n$. In the first case, we have shown, for $\varepsilon\in(0,(1-a)/n)$,
$$R_{n\varepsilon,a}(\nu)\le\sum_{i=1}^n R_{\varepsilon,\alpha_i}(\mu_i).$$
Taking a limit as $\varepsilon\downarrow0$ gives (A.7).

4. Suppose $b+a=1$ and $b=0$. This implies $\beta_1=\dots=\beta_n=0$. Because $R_{0,\alpha_i}(\mu_i)<\infty$, $i=1,\dots,n$, we know that $\sum_{i=1}^n R_{0,1}(\mu_i)$ is well defined. By the linearity of $R_{0,1}$, we have
$$R_{0,1}(\nu)=\sum_{i=1}^n R_{0,1}(\mu_i)\le\sum_{i=1}^n R_{0,\alpha_i}(\mu_i),$$
which establishes (A.7).

Proof of Theorem 1. The inequality (9) is shown in the text above Theorem 1. We proceed to prove the sharpness in the following cases.

(i) If $t=0$, then $R_{t,s}=\mathrm{ES}_s$ and $\{\beta\in(t+s)\Delta_n:\beta_0\ge s\}=\{(s,0,\dots,0)\}$. It is well known (e.g., Kusuoka (2001)) that $\mathrm{ES}_s$ is subadditive and comonotonic-additive, which gives
$$\sup_{\nu\in\Lambda(\mu)}R_{0,s}(\nu)=\sup_{\nu\in\Lambda(\mu)}\mathrm{ES}_s(\nu)=\sum_{i=1}^n\mathrm{ES}_s(\mu_i)=\inf_{\beta\in(t+s)\Delta_n,\ \beta_0\ge s}\sum_{i=1}^n R_{\beta_i,\beta_0}(\mu_i).$$

(ii) If $n=1$, the sharpness of (9) holds trivially. If $n=2$, we take $\nu\in\Lambda(\mu)$ obtained from random variables $X\sim\mu_1$ and $Y\sim\mu_2$ such that $X$ and $Y$ share the same $(1-t-s)$-tail event $A$ (i.e., an event of probability $t+s$ on which both $X$ and $Y$ take their largest values; see Wang and Zitikis (2020)) and $X$ and $Y$ are counter-monotonic conditional on $A$. For this specific construction,
$$\sup_{\nu\in\Lambda(\mu)}R_{t,s}(\nu)\ge\frac1s\int_{1-t-s}^{1-t}\big(q^-_{2-t-s-u}(\mu_1)+q^-_u(\mu_2)\big)\,\mathrm du=\frac1s\int_{1-s}^{1}q^-_u(\mu_1)\,\mathrm du+\frac1s\int_{1-t-s}^{1-t}q^-_u(\mu_2)\,\mathrm du$$
$$\ge\inf_{\beta_1\in(0,t)}\bigg\{\frac1s\int_{1-\beta_1-s}^{1-\beta_1}q^-_u(\mu_1)\,\mathrm du+\frac1s\int_{1-t-s+\beta_1}^{1-t+\beta_1}q^-_u(\mu_2)\,\mathrm du\bigg\}=\inf_{\beta_1\in(0,t)}\{R_{\beta_1,s}(\mu_1)+R_{t-\beta_1,s}(\mu_2)\}\ge\inf_{\beta\in(t+s)\Delta_n,\ \beta_0\ge s}\sum_{i=1}^n R_{\beta_i,\beta_0}(\mu_i).$$

Step 1: Using Proposition 1 (which will be shown later), we have
$$\sup_{\nu\in\Lambda(\mu)}R_{t,s}(\nu)=\sup_{\nu\in\Lambda(\mu^{(1-t-s)+})}\mathrm{LES}_{\frac s{t+s}}(\nu).\tag{A.8}$$
Hence, it suffices to consider the problem on the right-hand side of (A.8).

Step 2: Since each of $\mu_1,\dots,\mu_n$ admits a decreasing density beyond its $(1-t-s)$-quantile, each of $\mu^{(1-t-s)+}_1,\dots,\mu^{(1-t-s)+}_n$ admits a decreasing density on its support. We can define an aggregate random variable $T_{s_n}$ by (see Equation (3.4) of Jakobsons et al. (2016))
$$T_{s_n}=h(U)\mathbf 1_{\{U\in(0,s_n)\}}+d(s_n)\mathbf 1_{\{U\in[s_n,1]\}},$$
which is explained below.

(a) We can write $T_{s_n}=\sum_{i=1}^n X_i$ where $X_i\sim\mu^{(1-t-s)+}_i$, $i=1,\dots,n$. Let $\nu$ be the distribution measure of $T_{s_n}$. Lemma 3.4 (c) of Jakobsons et al. (2016) gives $\nu\in\Lambda(\mu^{(1-t-s)+})$.

(b) $U$ is a uniform random variable on $[0,1]$; $h,d:(0,1)\to\mathbb R$ are functions and $s_n\in[0,1]$ is a constant. They are given by
$$h(x)=\sum_{i=1}^n y_i(x)-(n-1)y(x),\quad x\in(0,1),$$
$$d(x)=\frac1{1-x}\sum_{i=1}^n\mathbb E\big[X_i\mathbf 1_{\{y_i(x)-y(x)<X_i\le y_i(x)\}}\big],\quad x\in(0,1),$$
$$s_n=\inf\{x\in(0,1):h(x)\le d(x)\},$$
where $y,y_1,\dots,y_n$ are functions on $(0,1)$ satisfying (see Equations (E1)-(E2) of Jakobsons et al. (2016))
$$\text{(E1)}:\ \sum_{i=1}^n\mathbb P(X_i>y_i(x))=x,\qquad\text{(E2)}:\ \mathbb P(y_i(x)-y(x)<X_i\le y_i(x))=1-x,\quad i=1,\dots,n.$$

(c) According to Lemma 3.2 of Jakobsons et al. (2016), $h$ is a decreasing function on $(0,s_n)$. Hence, for all $u\in(0,s_n)$, we have $h(u)\ge d(s_n)$, and further $d(s_n)=q_0^+(\nu)$.

Step 3: Denote by $a=\min\{\frac t{t+s},s_n\}$. We proceed to show
$$\mathrm{LES}_{\frac s{t+s}}(\nu)=d(a).\tag{A.9}$$
We verify this by direct computation. If $t/(t+s)\ge s_n$, then
$$\mathrm{LES}_{\frac s{t+s}}(\nu)=\frac{t+s}s\,\mathbb E\Big[T_{s_n}\mathbf 1_{\{U\in[\frac t{t+s},1]\}}\Big]=d(s_n);$$
if $t/(t+s)<s_n$, then
$$\mathrm{LES}_{\frac s{t+s}}(\nu)=\frac{t+s}s\,\mathbb E\Big[T_{s_n}\mathbf 1_{\{U\in[\frac t{t+s},1]\}}\Big]=\frac{t+s}s\Big(\mathbb E[T_{s_n}]-\mathbb E\big[h(U)\mathbf 1_{\{U\in(0,\frac t{t+s})\}}\big]\Big)$$
$$=\frac{t+s}s\sum_{i=1}^n\Big(\mathbb E[X_i]-\mathbb E\big[X_i\big(\mathbf 1_{\{X_i>y_i(\frac t{t+s})\}}+\mathbf 1_{\{X_i\le y_i(\frac t{t+s})-y(\frac t{t+s})\}}\big)\big]\Big)=d\Big(\frac t{t+s}\Big).$$
In both cases, (A.9) holds.

Definition A.1 (Lower mutual exclusivity). We say that a random vector $(X_1,\dots,X_n)$, where $X_i\sim\mu_i$, $i=1,\dots,n$, is (lower) mutually exclusive if $\mathbb P(X_i>q_0^+(\mu_i),\,X_j>q_0^+(\mu_j))=0$ for all $i\ne j$.

Lemma A.2. If random variables $X_1,\dots,X_n$ are mutually exclusive and bounded from below, then for $\alpha\in(0,1)$,
$$R_{1-\alpha,\alpha}\Big(\sum_{i=1}^n X_i\Big)=\sum_{i=1}^n R_{\beta_i,\alpha}(X_i)\tag{A.11}$$
for some $\beta_1,\dots,\beta_n\in[0,1]$ with $\sum_{i=1}^n\beta_i=1-\alpha$.

Proof. Without loss of generality, we assume $q_0^+(\mu_i)=0$ for each $i$.
If $q_\alpha^+(\sum_{i=1}^n X_i)=0$, then $\sum_{i=1}^n\mathbb P(X_i>0)=\mathbb P(\sum_{i=1}^n X_i>0)\le1-\alpha$. Hence, we can choose $\beta_1,\dots,\beta_n$ with $\beta_i\ge\mathbb P(X_i>0)$ for each $i$ and $\sum_{i=1}^n\beta_i=1-\alpha$, and both sides of (A.11) are $0$. Below we assume $q_\alpha^+(\sum_{i=1}^n X_i)>0$. First suppose that the conditional distribution $\mu_i$ of $X_i$ is continuous on $\{X_i>0\}$ for each $i=1,\dots,n$, and so is the conditional distribution of $\sum_{i=1}^n X_i$ on $\{\sum_{i=1}^n X_i>0\}$.

Let $y=q_\alpha^+(\sum_{i=1}^n X_i)$ and $A=\{\sum_{i=1}^n X_i\le y\}$. We have $\mathbb P(A)=\alpha$. For each $i=1,\dots,n$, let $\alpha_i=\mathbb P(A\cap\{X_i>0\})=\mathbb P(0<X_i\le y)$ and $t_i=\mathbb P(X_i>0)$. Then
$$R_{1-\alpha,\alpha}\Big(\sum_{i=1}^n X_i\Big)=\mathbb E\Big[\sum_{i=1}^n X_i\,\Big|\,A\Big]=\sum_{i=1}^n\mathbb E[X_i\mid A]=\frac1\alpha\sum_{i=1}^n\mathbb E\big[X_i\mathbf 1_{A\cap\{X_i>0\}}\big]$$
$$=\frac1\alpha\sum_{i=1}^n\bigg(\int_{1-t_i+\alpha_i-\alpha}^{1-t_i}q_u^+(\mu_i)\,\mathrm du+\int_{1-t_i}^{1-t_i+\alpha_i}q_u^+(\mu_i)\,\mathrm du\bigg)=\sum_{i=1}^n R_{t_i-\alpha_i,\alpha}(X_i),$$
where the first integral vanishes since $q_u^+(\mu_i)=0$ for $u\le1-t_i$. We can check, by lower mutual exclusivity and the continuity assumption, that
$$\sum_{i=1}^n(t_i-\alpha_i)=\sum_{i=1}^n\big(\mathbb P(X_i>0)-\mathbb P(0<X_i\le y)\big)=\sum_{i=1}^n\mathbb P(X_i>y)=\mathbb P\Big(\sum_{i=1}^n X_i>y\Big)=1-\alpha.$$
By (9), we have
$$R_{1-\alpha,\alpha}\Big(\sum_{i=1}^n X_i\Big)\le\sum_{i=1}^n R_{t_i-\alpha_i,\alpha}(X_i).$$
Therefore, (A.11) holds by choosing $\beta_i=t_i-\alpha_i$, $i=1,\dots,n$. In case the conditional distributions of $X_1,\dots,X_n$ on the positive half-line are not continuous, we can approximate $X_1,\dots,X_n$ (in distribution) by conditionally continuous random variables while fixing $\mathbb P(X_i>0)$ for each $i$. The compactness of the set $(1-\alpha)\Delta_{n-1}$, on which $(\beta_1,\dots,\beta_n)$ takes values, and the continuity of $R_{\beta,\alpha}$ with respect to weak convergence (e.g., Cont et al. (2010)) yield the desired result.

Proof of Proposition 1. The first equality is a direct consequence of Theorem 4.1 and Example 6.3 of Liu and Wang (2020), and the second equality follows from Remark 4.1 of Liu and Wang (2020).

D.2 Proofs in Section 4

Proof of Theorem 2. The convolution bound (11) is obtained by taking a limit of (9) in Theorem 1 using (4).
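Before turning to sharpness, a quick numerical sanity check of (11) may help. For two $\mathrm U(0,1)$ margins, $R_{\beta,\beta_0}$ has the closed form $1-\beta-\beta_0/2$, so $\sum_i R_{\beta_i,\beta_0}(\mu_i)=1+t$ for every feasible $\beta\in(1-t)\Delta_2$, and the convolution bound collapses to $1+t$, the well-known worst-case quantile of the sum of two standard uniforms. A minimal sketch (function names are ours; the grid search is illustrative only):

```python
import numpy as np

def rvar_uniform(beta, beta0):
    # R_{beta, beta0} of U(0,1): average of q_u = u over u in [1-beta0-beta, 1-beta]
    return 1.0 - beta - beta0 / 2.0

def convolution_bound_two_uniforms(t, grid=2001):
    """Right-hand side of (11) for n = 2 U(0,1) margins: grid search over
    (beta1, beta2) with beta0 = 1 - t - beta1 - beta2 > 0 (illustrative)."""
    best = np.inf
    bs = np.linspace(0.0, 1.0 - t, grid)
    for b1 in bs:
        b2 = bs[bs <= 1.0 - t - b1]
        b0 = 1.0 - t - b1 - b2
        ok = b0 > 1e-9
        vals = rvar_uniform(b1, b0[ok]) + rvar_uniform(b2[ok], b0[ok])
        if vals.size:
            best = min(best, vals.min())
    return best

# the objective is constant in beta here, so the bound is exactly 1 + t
print(round(convolution_bound_two_uniforms(0.5), 6))  # 1.5
```

That the objective does not depend on $\beta$ at all is special to uniform margins; for general $\mu_i$ the infimum over $\beta$ is nontrivial.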
Similarly, based on Theorem 1 and the fact that $R_{\beta_i,\beta_0}$ is continuous in $\beta$, this limit argument also gives the sharpness in (i), (ii) and (iv). Next we proceed to show the sharpness in (iii) and (v).

(iii) First, we note that
$$\sup_{\nu\in\Lambda(\mu)}q_t^+(\nu)=-\inf_{\tilde\nu\in\Lambda(\tilde\mu)}q^-_{1-t}(\tilde\nu),\tag{A.12}$$
where $\tilde\mu_i$ is the distribution measure of the random variable $-X_i$ with $X_i\sim\mu_i$, $i=1,\dots,n$, and $\tilde\mu=(\tilde\mu_1,\dots,\tilde\mu_n)$. The fact that each of $\mu_1,\dots,\mu_n$ admits an increasing density beyond its $t$-quantile implies that each of $\tilde\mu_1,\dots,\tilde\mu_n$ admits a decreasing density below its $(1-t)$-quantile. Note that a distribution that has a decreasing density below its $(1-t)$-quantile is supported on either a finite interval $[a,b]$ or a half line $[a,\infty)$ for some $a,b\in\mathbb R$. Hence, without loss of generality, we can assume $q_0^+(\tilde\mu_i)=0$, $i=1,\dots,n$.

For the sharpness of (11), we need to show
$$\sup_{\nu\in\Lambda(\mu)}q_t^+(\nu)\ge\inf_{\beta\in(1-t)\Delta_n}\sum_{i=1}^n R_{\beta_i,\beta_0}(\mu_i).$$
By (A.12) and the definition of $R_{\beta,\alpha}$, it suffices to show
$$\inf_{\tilde\nu\in\Lambda(\tilde\mu)}q^-_{1-t}(\tilde\nu)\le\sup_{\beta\in(1-t)\Delta_n}\sum_{i=1}^n R_{1-\beta_i-\beta_0,\beta_0}(\tilde\mu_i).\tag{A.13}$$
Fix $j\in\{1,\dots,n\}$ and $\beta_j\in(0,1-t)$. By taking $\beta_i=0$ for $i\in\{1,\dots,n\}\setminus\{j\}$ and $\beta_0=1-t-\beta_j$, we get
$$\sup_{\beta\in(1-t)\Delta_n}\sum_{i=1}^n R_{1-\beta_i-\beta_0,\beta_0}(\tilde\mu_i)\ge R_{t,1-t-\beta_j}(\tilde\mu_j)+\sum_{i\ne j}R_{t+\beta_j,1-t-\beta_j}(\tilde\mu_i)\ge R_{t,1-t-\beta_j}(\tilde\mu_j).$$
Taking a supremum over $\beta_j\in(0,1-t)$ and $j\in\{1,\dots,n\}$ yields
$$\sup_{\beta\in(1-t)\Delta_n}\sum_{i=1}^n R_{1-\beta_i-\beta_0,\beta_0}(\tilde\mu_i)\ge\bigvee_{j=1}^n\sup_{\beta_j\in(0,1-t)}R_{t,1-t-\beta_j}(\tilde\mu_j)=\bigvee_{j=1}^n q^-_{1-t}(\tilde\mu_j).\tag{A.14}$$
If $\bigvee_{j=1}^n q^-_{1-t}(\tilde\mu_j)=\infty$, then the right-hand side of (A.13) is $\infty$, and (A.13) holds automatically. If $\bigvee_{j=1}^n q^-_{1-t}(\tilde\mu_j)<\infty$, we can apply Corollary 4.7 of Jakobsons et al. (2016), using the condition that each of $\tilde\mu_1,\dots,\tilde\mu_n$ admits a decreasing density below its $(1-t)$-quantile.
This gives
$$\inf_{\tilde\nu\in\Lambda(\tilde\mu)}q^+_{1-t}(\tilde\nu)=\max\bigg\{\bigvee_{i=1}^n q^-_{1-t}(\tilde\mu_i),\ \sum_{i=1}^n R_{t,1-t}(\tilde\mu_i)\bigg\}.\tag{A.15}$$
Also note that, in this case, $R_{t,1-t}(\tilde\mu_i)<\infty$, $i=1,\dots,n$, and hence
$$\sup_{\beta\in(1-t)\Delta_n}\sum_{i=1}^n R_{1-\beta_i-\beta_0,\beta_0}(\tilde\mu_i)\ge\sum_{i=1}^n R_{t,1-t}(\tilde\mu_i).\tag{A.16}$$
Combining (A.14)-(A.16), we get (A.13).

(v) It suffices to prove the case $t=0$. The assumption $\sum_{i=1}^n\mu_i\big([q_0^+(\mu_i),q_1^-(\mu_i))\big)\le1$ allows us to construct an upper mutually exclusive random vector $(X_1,\dots,X_n)$, where $X_i\sim\mu_i$, $i=1,\dots,n$. Hence, we have
$$q_0^+\Big(\sum_{i=1}^n X_i\Big)=\min_i\Big\{q_0^+(\mu_i)+\sum_{j\ne i}q_1^-(\mu_j)\Big\}\ge\inf_{\beta\in\Delta_n}\sum_{i=1}^n R_{\beta_i,\beta_0}(\mu_i).$$
Hence, the desired result follows, as the bound (11) is attained by such a vector.

Definition A.2 (Upper mutual exclusivity). We say that a random vector $(X_1,\dots,X_n)$, where $X_i\sim\mu_i$, $i=1,\dots,n$, is upper mutually exclusive if $\mathbb P(X_i<q_1^-(\mu_i),\,X_j<q_1^-(\mu_j))=0$ for all $i\ne j$.

Proof of Proposition 2. Letting $\beta_1=\dots=\beta_n=\alpha$ in Theorems 2 and 6, we immediately get (12). We show that (12) holds as an equality in the case of a decreasing density. Note that the second equality in (12) is simply the definition. By Proposition 1 of Embrechts et al. (2014), $\sup_{\nu\in\Lambda_n(\mu)}q_t^+(\nu)$ is equal to $n$ times the conditional mean of $\mu$ on an interval $[t+(n-1)\alpha,1-\alpha]$ for some $\alpha\in[0,\frac{1-t}n]$. Therefore,
$$\sup_{\nu\in\Lambda_n(\mu)}q_t^+(\nu)\ge\inf_{\alpha\in(0,\frac{1-t}n)}nR_{\alpha,1-t-n\alpha}(\mu).$$
The "$\le$" direction is given by (12). Hence, (12) holds as an equality in this case.

Proof of Proposition 3. The statement for $q_t^+$ with $t\in(0,1)$ follows since $\Lambda(\mu)$ is closed under weak convergence (Theorem 2.1 of Bernard et al. (2014)); the case $t=0$ follows from the same argument by noting the upper semicontinuity of $q_0^+$.

Proof of Proposition 4. To show the "$\ge$" direction of (13), we note that for any $\nu_Y\in\Lambda(\mu^{[m]})$ such that $Y_1\sim\mu_1^{[m]},\dots,Y_n\sim\mu_n^{[m]}$ and $\sum_{i=1}^n Y_i\sim\nu_Y$, by letting $X_i=q^-_{U_{Y_i}}(\mu_i)$, $i=1,\dots,n$, where $U_{Y_i}$ is a uniform transform of $Y_i$, we get $X_i\sim\mu_i$ and $X_i\ge Y_i$ for each $i=1,\dots,n$. Denote by $\nu_X$ the distribution measure of $\sum_{i=1}^n X_i$.
It follows that $\nu_X\in\Lambda(\mu)$ and $q_t^+(\nu_X)\ge q_t^+(\nu_Y)$, which gives
$$\sup_{\nu\in\Lambda(\mu)}q_t^+(\nu)\ge\sup_{\nu\in\Lambda(\mu^{[m]})}q_t^+(\nu).$$
To show the "$\le$" direction of (13), for any $\nu_X\in\Lambda(\mu)$ such that $X_1\sim\mu_1,\dots,X_n\sim\mu_n$ and $\sum_{i=1}^n X_i\sim\nu_X$, let $Y_i=X_i\wedge m$, $i=1,\dots,n$. Write $S_X=\sum_{i=1}^n X_i$ and $S_Y=\sum_{i=1}^n Y_i$, and denote by $\nu_Y$ the distribution measure of $S_Y$. We have $\nu_Y\in\Lambda(\mu^{[m]})$. By Corollary 1 of Embrechts et al. (2018), we have, for $\varepsilon>0$,
$$q^-_{t+\varepsilon}(\nu_X)\le\sum_{i=1}^n q^-_{1-(1-t-\varepsilon)/n}(\mu_i).$$
Taking a limit as $\varepsilon\downarrow0$, we obtain
$$q_t^+(\nu_X)\le\sum_{i=1}^n q^+_{1-(1-t)/n}(\mu_i)\le m.\tag{A.17}$$
It is clear that $S_X\wedge m\le S_Y$ because the real function $x\mapsto x\wedge m$ is subadditive. Denote by $\tilde\nu$ the distribution of $S_X\wedge m$. Hence, (A.17) implies
$$q_t^+(\nu_X)=q_t^+(\tilde\nu)\le q_t^+(\nu_Y).\tag{A.18}$$
Taking a supremum of (A.18) over all possible choices of $\nu_X\in\Lambda(\mu)$, we get
$$\sup_{\nu\in\Lambda(\mu)}q_t^+(\nu)\le\sup_{\nu\in\Lambda(\mu^{[m]})}q_t^+(\nu).$$

Proof of Proposition 5. It is a direct corollary obtained by letting $t\downarrow0$ and $t\uparrow1$.

Proof of Proposition 6. Note that
$$\sup_{\beta\in\Delta_n}\sum_{i=1}^n R_{1-\beta_i-\beta_0,\beta_0}(\mu_i)\ge\lim_{\varepsilon\downarrow0}\sum_{i=1}^n R_{(n-1)\varepsilon,1-n\varepsilon}(\mu_i)=\sum_{i=1}^n R_{0,1}(\mu_i)=\lim_{\varepsilon\downarrow0}\sum_{i=1}^n R_{\varepsilon,1-n\varepsilon}(\mu_i)\ge\inf_{\beta\in\Delta_n}\sum_{i=1}^n R_{\beta_i,\beta_0}(\mu_i).$$
Hence, the second and the third inequalities in (16) hold. The first and the last inequalities are due to Proposition 5.

D.3 Proofs in Section 5

Proof of Theorem 3. By assumption and Proposition 3, there exists $\nu^+\in\Lambda(\mu)$ such that
$$q_0^+(\nu^+)=\sup_{\nu\in\Lambda(\mu)}q_0^+(\nu)=R^+_\beta(\mu).$$
Take $X_i^*\sim\mu_i$, $i=1,\dots,n$, such that $\sum_{i=1}^n X_i^*\sim\nu^+$ and $\sum_{i=1}^n X_i^*\ge q_0^+(\nu^+)$ almost surely. We divide the proof into several steps: we first prove the properties of $X_i^*$ in (18) in Steps 1-3, and then the feasibility of (19) and its optimality given the above sufficient condition in Steps 4-5.
In Steps 1-3, we will show that the probability space $\Omega$ is divided into $\Omega=A_1\cup\dots\cup A_n\cup A^c$, where $A_i$ is defined by $A_i=\{X_i^*>q^-_{1-\beta_i}(\mu_i)\}$ (the "right-tail" parts of $X_1^*,\dots,X_n^*$) and $A=\bigcup_{i=1}^n A_i$, with the following properties:

(a) on the set $A^c$, $X_i^*\sim\mu_i^{[1-\beta_0-\beta_i,1-\beta_i]}$ for all $i=1,\dots,n$ and $\sum_{i=1}^n X_i^*=q_0^+(\nu^+)$ almost surely;

(b) for any fixed $i=1,\dots,n$, on the set $A_i$, $X_i^*\sim\mu_i^{(1-\beta_i,1]}$ and $X_j^*\sim\mu_j^{[0,1-\beta_0-\beta_j)}$ for all $j\ne i$.

Step 1: We show that the set $\{\sum_{i=1}^n X_i^*=q_0^+(\nu^+)\}$ has probability no less than $\beta_0$. By (7), we have
$$q_0^+(\nu^+)\le R_{1-\beta_0,\beta_0}(\nu^+)\le R^+_\beta(\mu)=q_0^+(\nu^+),\tag{A.19}$$
and hence all inequalities in (A.19) are equalities. The fact that $q_0^+(\nu^+)=R_{1-\beta_0,\beta_0}(\nu^+)$ implies that $q_t^-(\nu^+)=q_0^+(\nu^+)$ for all $t\in(0,\beta_0]$, and hence the set $\{\sum_{i=1}^n X_i^*=q_0^+(\nu^+)\}$ has probability no less than $\beta_0$.

Step 2: We proceed to show that the events (the "body" parts of $X_1^*,\dots,X_n^*$)
$$\{q^-_{1-\beta_0-\beta_i}(\mu_i)\le X_i^*\le q^-_{1-\beta_i}(\mu_i)\},\quad i=1,\dots,n,\tag{A.20}$$
are identical, with common event denoted by $B$, and that $\sum_{i=1}^n X_i^*=q_0^+(\nu^+)$ almost surely on $B$.

As $A_i=\{X_i^*>q^-_{1-\beta_i}(\mu_i)\}$, $i=1,\dots,n$, and $A=\bigcup_{i=1}^n A_i$, we have $\mathbb P(A)\le\mathbb P(A_1)+\dots+\mathbb P(A_n)=\sum_{i=1}^n\beta_i=1-\beta_0$. Denote by $\kappa_i\in\mathcal M$ the distribution measure of
$$T_i=X_i^*\mathbf 1_{A_i^c}+m\mathbf 1_{A_i},\quad i=1,\dots,n,$$
where $m$ is a real number satisfying $m<\min_i q^-_{1-\beta_0-\beta_i}(\mu_i)$. Denote by $\tau$ the distribution measure of the sum $\sum_{i=1}^n T_i$. It is verified that $\kappa_i$ has a finite mean and $R_{\beta_i,\beta_0}(\mu_i)=\mathrm{ES}_{\beta_0}(\kappa_i)$, $i=1,\dots,n$.

We first prove that
$$q_t^-(\tau)\ge q^-_{t-(1-\beta_0)}(\nu^+),\quad t\in(1-\beta_0,1].\tag{A.21}$$
Fix $t\in(1-\beta_0,1]$. Since $\sum_{i=1}^n T_i\mathbf 1_{A^c}=\sum_{i=1}^n X_i^*\mathbf 1_{A^c}$, for any $x\in\mathbb R$,
$$\tau(x,\infty)=\mathbb P\Big(\sum_{i=1}^n T_i>x\Big)\ge\mathbb P\Big(\sum_{i=1}^n X_i^*>x,\,A^c\Big)\ge\mathbb P\Big(\sum_{i=1}^n X_i^*>x\Big)-\mathbb P(A)\ge\mathbb P\Big(\sum_{i=1}^n X_i^*>x\Big)-(1-\beta_0)=\nu^+(x,\infty)-(1-\beta_0).$$
For any $x<q^-_{t-(1-\beta_0)}(\nu^+)$, we have $\nu^+(-\infty,x]<t-(1-\beta_0)$, so that $\tau(-\infty,x]\le\nu^+(-\infty,x]+1-\beta_0<t$, and then $x\le q_t^-(\tau)$. Hence $q_t^-(\tau)\ge q^-_{t-(1-\beta_0)}(\nu^+)$, proving (A.21). Thus, it follows from the sharpness of (A.19) that
$$\sum_{i=1}^n R_{\beta_i,\beta_0}(\mu_i)=\sum_{i=1}^n\mathrm{ES}_{\beta_0}(\kappa_i)\ge\mathrm{ES}_{\beta_0}(\tau)=\frac1{\beta_0}\int_{1-\beta_0}^1 q_t^-(\tau)\,\mathrm dt\ge\frac1{\beta_0}\int_{1-\beta_0}^1 q^-_{t-(1-\beta_0)}(\nu^+)\,\mathrm dt=\frac1{\beta_0}\int_0^{\beta_0}q_t^-(\nu^+)\,\mathrm dt=R_{1-\beta_0,\beta_0}(\nu^+)=\sum_{i=1}^n R_{\beta_i,\beta_0}(\mu_i),\tag{A.22}$$
where the first inequality is the well-known subadditivity of $\mathrm{ES}_{\beta_0}$. Thus, all inequalities in (A.22) are equalities. The fact that the first inequality in (A.22) is an equality implies that $T_1,\dots,T_n$ share the same tail event of probability $\beta_0$, according to Theorem 5 of Wang and Zitikis (2020); that is, the sets
$$\{T_i>q^-_{1-\beta_0}(\kappa_i)\}=\{q^-_{1-\beta_0-\beta_i}(\mu_i)\le X_i^*\le q^-_{1-\beta_i}(\mu_i)\},\quad i=1,\dots,n,$$
(also in (A.20)) are identical and have probability $\beta_0$. We denote this set by $B$. Furthermore, the set $B$ is disjoint from each $A_i$, $i=1,\dots,n$.

We write $Y_i=X_i^*|_B$. Hence $Y_i\sim\mu_i^{[1-\beta_0-\beta_i,1-\beta_i]}$ and $\sum_{i=1}^n Y_i=\sum_{i=1}^n X_i^*$ on the set $B$, and $q_0^+(\sum_{i=1}^n Y_i)=\mathbb E[\sum_{i=1}^n Y_i]=R^+_\beta(\mu)$. Thus, $\sum_{i=1}^n Y_i=R^+_\beta(\mu)$ almost surely.

Step 3: We proceed to show that the events $A_i$, $i=1,\dots,n$, are mutually disjoint. We can calculate
$$\frac{\partial}{\partial\beta_i'}R^+_{\beta'}(\mu)=\frac1{\beta_0'}\Big(R^+_{\beta'}(\mu)-q^-_{1-\beta_i'}(\mu_i)-\sum_{j\ne i}q^-_{1-\beta_0'-\beta_j'}(\mu_j)\Big),\quad\beta'\in\Delta_n.$$
The first-order condition from the optimality of $\beta$ reads as
$$R^+_\beta(\mu)-q^-_{1-\beta_i}(\mu_i)-\sum_{j\ne i}q^-_{1-\beta_0-\beta_j}(\mu_j)=0,\quad\text{if }\beta_0>0\text{ and }i\in\{1,\dots,n\}\text{ satisfies }\beta_i\ne0;$$
$$R^+_\beta(\mu)-q^-_1(\mu_i)-\sum_{j\ne i}q^-_{1-\beta_0-\beta_j}(\mu_j)\ge0,\quad\text{if }\beta_0>0\text{ and }i\in\{1,\dots,n\}\text{ satisfies }\beta_i=0;$$
$$R^+_\beta(\mu)-\sum_{j=1}^n q^-_{1-\beta_j}(\mu_j)=0,\quad\text{if }\beta_0=0.\tag{A.23}$$
Denote the sets (the "left-tail" parts of $X_1^*,\dots,X_n^*$) by
$$C_i=\{X_i^*<q^-_{1-\beta_0-\beta_i}(\mu_i)\},\quad i=1,\dots,n.$$
We have a partition $\Omega=A_i\cup B\cup C_i$ and $\mathbb P(C_i)=1-\beta_0-\beta_i$, $i=1,\dots,n$. The condition (A.23) shows that $\mathbb P\big(\bigcap_{j=1}^n C_j\big)=0$, because for any $\omega\in\bigcap_{j=1}^n C_j$ and any fixed $i\in\{1,\dots,n\}$,
$$\sum_{j=1}^n X_j^*(\omega)<q^-_{1-\beta_i}(\mu_i)+\sum_{j\ne i}q^-_{1-\beta_0-\beta_j}(\mu_j)\le R^+_\beta(\mu)=q_0^+\Big(\sum_{j=1}^n X_j^*\Big).$$
Arguing by contradiction, suppose that there exist $1\le k<l\le n$ such that $\mathbb P(A_k\cap A_l)>0$. For any fixed $i\in\{1,\dots,n\}\setminus\{k,l\}$, we have
$$\mathbb P(C_i)=\mathbb P\Big(\bigcap_{j=1}^n C_j\Big)+\mathbb P\Big(\bigcup_{j\ne i}(C_i\cap A_j)\Big)$$
$$=\mathbb P\Big(\bigcap_{j=1}^n C_j\Big)+\mathbb P\Big(\bigcup_{j\ne i,k,l}(C_i\cap A_j)\cup(C_i\cap A_k\cap A_l^c)\cup(C_i\cap A_k^c\cap A_l)\cup(C_i\cap A_k\cap A_l)\Big)$$
$$\le\mathbb P\Big(\bigcap_{j=1}^n C_j\Big)+\sum_{j\ne i,k,l}\mathbb P(A_j)+\mathbb P(A_k\cap A_l^c)+\mathbb P(A_k^c\cap A_l)+\mathbb P(A_k\cap A_l)$$
$$=\mathbb P\Big(\bigcap_{j=1}^n C_j\Big)+\sum_{j\ne i,k,l}\mathbb P(A_j)+\mathbb P(A_k)+\mathbb P(A_l)-\mathbb P(A_k\cap A_l)$$
$$=\mathbb P\Big(\bigcap_{j=1}^n C_j\Big)+1-\beta_0-\beta_i-\mathbb P(A_k\cap A_l)=\mathbb P\Big(\bigcap_{j=1}^n C_j\Big)+\mathbb P(C_i)-\mathbb P(A_k\cap A_l).$$
Hence $\mathbb P\big(\bigcap_{i=1}^n C_i\big)\ge\mathbb P(A_k\cap A_l)>0$, which leads to a contradiction. Thus, $A_1,\dots,A_n$ are mutually disjoint and $\mathbb P(A)=\mathbb P\big(\bigcup_{i=1}^n A_i\big)=\sum_{i=1}^n\mathbb P(A_i)=1-\beta_0$. As the set $B$ is disjoint from $A$ and $\mathbb P(B)=\beta_0$, we know $B=A^c$ and we have the partition $\Omega=A_1\cup\dots\cup A_n\cup A^c$. This proves the first statement in the theorem on the properties of $X_i^*$ in (18).

Step 4: If $\beta_0=1$, then $\mu$ is jointly mixable. If $\beta_0<1$, we check that the corresponding $X_i^*$ given by (19) has distribution $\mu_i$, $i=1,\dots,n$. For each $i=1,\dots,n$, if $x<q^-_{1-\beta_0-\beta_i}(\mu_i)$, we have
$$\mathbb P(X_i^*\le x)=\mathbb P\Big(U<1-\beta_0,\,K\ne i,\,q^-_{\frac{1-\beta_0-\beta_i}{1-\beta_0}U}(\mu_i)\le x\Big)=\mathbb P(K\ne i)\,\mathbb P\Big(q^-_{\frac{1-\beta_0-\beta_i}{1-\beta_0}U}(\mu_i)\le x\Big)=\frac{1-\beta_0-\beta_i}{1-\beta_0}\cdot\frac{1-\beta_0}{1-\beta_0-\beta_i}\,\mu_i(-\infty,x]=\mu_i(-\infty,x].$$
One can similarly check that $\mathbb P(X_i^*>x)=\mu_i(x,\infty)$ if $x>q^-_{1-\beta_i}(\mu_i)$, and $\mathbb P(X_i^*\le x)=\mu_i(-\infty,x]$ if $q^-_{1-\beta_0-\beta_i}(\mu_i)\le x\le q^-_{1-\beta_i}(\mu_i)$. Hence $X_i^*\sim\mu_i$, $i=1,\dots,n$.
Step 5: We finally show that if $\beta_0<1$, $\beta_1,\dots,\beta_n>0$, and the minimum of $h_1,\dots,h_n$ is attained at $x=1-\beta_0$, then $(X_1^*,\dots,X_n^*)$ in (19) attains the maximum of $q_0^+$ for $\mu$. According to the first-order condition (A.23), we have
$$h_1(1-\beta_0)=\dots=h_n(1-\beta_0)=R^+_\beta(\mu).$$
For all $i=1,\dots,n$, we have $h_i(x)\ge R^+_\beta(\mu)$ for all $x\in(0,1-\beta_0]$; that is, $\sum_{i=1}^n X_i^*\ge R^+_\beta(\mu)$ almost surely on $\{U\in[0,1-\beta_0)\}$. Since $\sum_{i=1}^n X_i^*=\sum_{i=1}^n Y_i=R^+_\beta(\mu)$ on $\{U\in[1-\beta_0,1]\}$, we have $q_0^+(\sum_{i=1}^n X_i^*)=\max_{\nu\in\Lambda(\mu)}q_0^+(\nu)=R^+_\beta(\mu)$.

D.4 Proofs in Section 6

Proof of Proposition 7. Theorem 4.17 of Rüschendorf (2013) gives
$$\inf_{\nu\in\Lambda(\mu)}\nu(-\infty,s]\ge1-D_n(s).\tag{A.24}$$
A standard argument inverting (A.24) gives (26).

Proof of Theorem 4. 1. For fixed $t\in[0,1)$, denote by $x=R^+_\beta(\mu)$ the value of the right-hand side of (11). We proceed to show $D_n^{-1}(1-t)\le x$, and thus the dual bound (26) is not greater than the convolution bound.

Case 1: If the infimum in (11) is attained at $\beta=(\beta_0,\beta_1,\dots,\beta_n)\in(1-t)\Delta_n$ with $\beta_0,\dots,\beta_n>0$, then the first-order condition reads as the first equation in (A.23). Because $\beta_0>0$ and $1-\beta_0-\beta_i<1-\beta_i$, we have $q^-_{1-\beta_0-\beta_i}(\mu_i)<q^-_{1-\beta_i}(\mu_i)$. Define $r_i=q^-_{1-\beta_0-\beta_i}(\mu_i)$ for $i=1,\dots,n$ and $r=\sum_{i=1}^n r_i$. One can check from the first-order condition that $r_i=q^-_{1-\beta_i}(\mu_i)+r-x$, and hence $r<x$ and $r\in\Delta_n(x)$. We have
$$x=\frac1{\beta_0}\sum_{j=1}^n\int_{1-\beta_0-\beta_j}^{1-\beta_j}q_u^-(\mu_j)\,\mathrm du=\frac1{\beta_0}\sum_{j=1}^n\int_{q^-_{1-\beta_0-\beta_j}(\mu_j)}^{q^-_{1-\beta_j}(\mu_j)}y\,\mu_j(\mathrm dy)=\frac1{\beta_0}\sum_{j=1}^n\int_{r_j}^{x-r+r_j}y\,\mu_j(\mathrm dy)$$
$$=\frac1{\beta_0}\sum_{j=1}^n\bigg((x-r+r_j)(1-\beta_j)-r_j(1-\beta_0-\beta_j)-\int_{r_j}^{x-r+r_j}\mu_j(-\infty,y]\,\mathrm dy\bigg)$$
$$=\frac1{\beta_0}\sum_{j=1}^n\bigg((x-r)(1-\beta_j)+r_j\beta_0-(x-r)+\int_{r_j}^{x-r+r_j}\mu_j(y,\infty)\,\mathrm dy\bigg)=\frac1{\beta_0}\bigg(x\beta_0-(1-t)(x-r)+\sum_{j=1}^n\int_{r_j}^{x-r+r_j}\mu_j(y,\infty)\,\mathrm dy\bigg).$$
It follows that
$$1-t=\sum_{j=1}^n\frac1{x-r}\int_{r_j}^{x-r+r_j}\mu_j(y,\infty)\,\mathrm dy\ge D_n(x).$$
Case 2: Suppose that the infimum in (11) is attained at $\beta$ with $\beta_i=0$ for some $i\in\{1,\dots,n\}$ and $\beta_0=1-t-\sum_{i=1}^n\beta_i>0$. Define $r_i=q^-_{1-\beta_0-\beta_i}(\mu_i)$, $i=1,\dots,n$, and $r=\sum_{i=1}^n r_i$; note that $r_i<q^-_1(\mu_i)$ for each $i$. For $i\in\{1,\dots,n\}$ satisfying $\beta_i\ne0$, the first-order condition reads as the first equation in (A.23) and gives $r_i=q^-_{1-\beta_i}(\mu_i)+r-x$. For $i$ satisfying $\beta_i=0$, the first-order condition reads as the second equation in (A.23) and gives $q^-_1(\mu_i)\le x-r+r_i$ and $r\le x-(q^-_1(\mu_i)-r_i)<x$, which implies $\mu_i(-\infty,x-r+r_i]=1$ and $r\in\Delta_n(x)$. Similarly to Case 1,
$$x=\frac1{\beta_0}\bigg(x\beta_0-(1-t)(x-r)+\sum_{j=1}^n\int_{r_j}^{x-r+r_j}\mu_j(y,\infty)\,\mathrm dy\bigg).$$
Therefore,
$$1-t=\frac1{x-r}\sum_{j=1}^n\int_{r_j}^{x-r+r_j}\mu_j(y,\infty)\,\mathrm dy\ge D_n(x).$$

Case 3: If the infimum in (11) is attained at some $\beta$ with $\beta_0=1-t-\sum_{i=1}^n\beta_i=0$, then from (11) we have the third equation in (A.23). Define $r_i=q^-_{1-\beta_0-\beta_i}(\mu_i)=q^-_{1-\beta_i}(\mu_i)$, $i=1,\dots,n$. Then $r=\sum_{i=1}^n r_i=x$ and
$$1-t=\sum_{i=1}^n\beta_i=\sum_{i=1}^n\mu_i(r_i,\infty)=\lim_{r'\in\Delta_n(x),\,r'\to r}\frac1{x-r'}\sum_{i=1}^n\int_{r_i'}^{x-r'+r_i'}\mu_i(y,\infty)\,\mathrm dy\ge D_n(x).$$
In all three cases, $1-t\ge D_n(x)$. Since $D_n$ is decreasing, $D_n^{-1}(1-t)\le x$, and thus the dual bound is not greater than the convolution bound.

2. For fixed $t\in[0,1)$, we show that $D_n^{-1}(1-t)$ is not smaller than the convolution bound. We first claim that, if the quantile functions of $\mu_1,\dots,\mu_n$ are continuous, then $D_n$ is strictly decreasing on $\big(-\infty,\sum_{j=1}^n q_1^-(\mu_j)\big)$ and is constant $0$ on $\big[\sum_{j=1}^n q_1^-(\mu_j),\infty\big)$. Indeed, for any $x_1<x_2$, we have
$$D_n(x_1)=\inf_{r\in\Delta_n(x_1)}\bigg\{\sum_{i=1}^n\frac1{x_1-r}\int_{r_i}^{x_1-r+r_i}\mu_i(y,\infty)\,\mathrm dy\bigg\}\ge\inf_{r\in\Delta_n(x_1)}\bigg\{\sum_{i=1}^n\frac1{x_2-r}\int_{r_i}^{x_2-r+r_i}\mu_i(y,\infty)\,\mathrm dy\bigg\}\ge\inf_{r\in\Delta_n(x_2)}\bigg\{\sum_{i=1}^n\frac1{x_2-r}\int_{r_i}^{x_2-r+r_i}\mu_i(y,\infty)\,\mathrm dy\bigg\}=D_n(x_2).$$
We prove that if "=" holds, then it must be that $D_n(x_1)=D_n(x_2)=0$.
Since $D_n(x_1)=D_n(x_2)\in[0,n]$ is bounded, the infimum is attained at some $r$ with $r\le x_1$. Because each $\mu_j(\cdot,\infty)$ is decreasing, equality forces $\mu_j(\cdot,\infty)$ to be constant on $[r_j,x_2-r+r_j]$ for $j=1,\dots,n$. The fact that the quantile functions of $\mu_1,\dots,\mu_n$ are continuous implies that these constants can only be $0$ or $1$, and they cannot be $1$ since the expression is an infimum. Hence, for $j=1,\dots,n$, $\mu_j(y,\infty)\equiv0$ for $y\in[r_j,x_2-r+r_j]$, which implies $D_n(x_1)=D_n(x_2)=0$. It is straightforward to check that $D_n(x)=0$ implies $x\ge\sum_{j=1}^n q_1^-(\mu_j)$. Thus we prove the claim. One can further verify that $D_n$ is continuous.

Now we continue to prove the main result. For fixed $t\in[0,1)$, we have $D_n^{-1}(1-t)<\sum_{j=1}^n q_1^-(\mu_j)$ and $D_n(D_n^{-1}(1-t))>0$. As $D_n$ is strictly decreasing and continuous on $\big(-\infty,\sum_{j=1}^n q_1^-(\mu_j)\big)$, we have $D_n(D_n^{-1}(1-t))=1-t$. Denote by $x=D_n^{-1}(1-t)$ the value of the dual bound.

Case 1: Suppose that the infimum in $D_n(x)$ is attained at some $r=(r_1,\dots,r_n)\in\Delta_n(x)$ with $r<x$. Its first-order condition reads as, for any $i=1,\dots,n$,
$$\mu_i(r_i,\infty)+\sum_{j\ne i}\mu_j(x-r+r_j,\infty)=\frac1{x-r}\sum_{j=1}^n\int_{r_j}^{x-r+r_j}\mu_j(y,\infty)\,\mathrm dy=D_n(x)=1-t.$$
Define $\beta_i=\mu_i(x-r+r_i,\infty)$, $i=1,\dots,n$, and $\beta_0=1-t-\sum_{i=1}^n\beta_i$. One can check that $\mu_i(-\infty,r_i]=1-\beta_0-\beta_i$ and $\beta\in(1-t)\Delta_n$ because $r<x$. We have
$$1-t=\frac1{x-r}\sum_{j=1}^n\int_{r_j}^{x-r+r_j}\mu_j(y,\infty)\,\mathrm dy=\frac1{x-r}\sum_{j=1}^n\int_{1-\beta_0-\beta_j}^{1-\beta_j}(1-u)\,\mathrm dq_u^-(\mu_j)$$
$$=\frac1{x-r}\sum_{j=1}^n\bigg((x-r+r_j)\beta_j-r_j(\beta_0+\beta_j)+\int_{1-\beta_0-\beta_j}^{1-\beta_j}q_u^-(\mu_j)\,\mathrm du\bigg)=\frac1{x-r}\bigg((x-r)(1-t)-x\beta_0+\sum_{j=1}^n\int_{1-\beta_0-\beta_j}^{1-\beta_j}q_u^-(\mu_j)\,\mathrm du\bigg).$$
Therefore,
$$x=\frac1{\beta_0}\sum_{j=1}^n\int_{1-\beta_0-\beta_j}^{1-\beta_j}q_u^-(\mu_j)\,\mathrm du,$$
which implies that the value of the dual bound $x$ is not smaller than that of the convolution bound.
Case 2: If the infimum of $D_n(x)$ is attained at some $r$ with $r=x$, then
$$D_n(x)=\lim_{r'\in\Delta_n(x),\,r'\to r}\sum_{i=1}^n\frac1{x-r'}\int_{r_i'}^{x-r'+r_i'}\mu_i(y,\infty)\,\mathrm dy=\sum_{i=1}^n\mu_i(r_i,\infty)=1-t.$$
Define $\beta_i=\mu_i(r_i,\infty)$, $i=1,\dots,n$, and $\beta_0=0$. We have $r_i=q^-_{1-\beta_i}(\mu_i)$, and
$$x=\sum_{i=1}^n r_i=\sum_{i=1}^n q^-_{1-\beta_i}(\mu_i)=\lim_{\beta'\in(1-t)\Delta_n,\,\beta'\to\beta}\sum_{i=1}^n\frac1{\beta_0'}\int_{1-\beta_0'-\beta_i'}^{1-\beta_i'}q_u^-(\mu_i)\,\mathrm du\ge\inf_{\beta'\in(1-t)\Delta_n}\sum_{i=1}^n\frac1{\beta_0'}\int_{1-\beta_0'-\beta_i'}^{1-\beta_i'}q_u^-(\mu_i)\,\mathrm du,$$
which implies that the value of the dual bound $x$ is not smaller than that of the convolution bound.
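The equality of the dual and convolution bounds can also be checked numerically. A minimal sketch for $n=2$ with $\mathrm U(0,1)$ margins (helper names are ours; $\Delta_2(x)$ is read as $\{r:r_1+r_2<x\}$, and a grid search stands in for the exact infimum): for two standard uniforms the worst-case quantile at level $t=0.5$ is $1+t=1.5$, so the claim $D_n(D_n^{-1}(1-t))=1-t$ predicts $D_2(1.5)=0.5$.

```python
import numpy as np

def G(y):
    """Antiderivative of the U(0,1) survival function S(y) = P(U > y)."""
    y = np.asarray(y, dtype=float)
    return np.where(y <= 0.0, y, np.where(y >= 1.0, 0.5, y - y * y / 2.0))

def D2_uniform(x, grid=201):
    """D_2(x) = inf over r1 + r2 < x of (1/(x-r)) * sum_i int_{r_i}^{x-r+r_i} S(y) dy,
    evaluated for two U(0,1) margins by grid search (illustrative only)."""
    best = np.inf
    for r1 in np.linspace(0.0, x, grid):
        for r2 in np.linspace(0.0, x - r1, grid):
            w = x - r1 - r2          # common window length x - r
            if w <= 1e-9:
                continue
            val = (G(r1 + w) - G(r1) + G(r2 + w) - G(r2)) / w
            best = min(best, val)
    return float(best)

# whenever both windows [r_i, r_i + w] stay inside [0, 1], the objective equals
# 2 - x exactly, so the infimum at x = 1.5 is 1 - t = 0.5
print(round(D2_uniform(1.5), 3))  # 0.5
```

Inverting $D_2$ then recovers $D_2^{-1}(0.5)=1.5$, matching the convolution bound for two uniforms; this is only a consistency check on one tractable example, not a substitute for the proof above.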