Large deviations for risk measures in finite mixture models
Valeria Bignozzi† Claudio Macci‡ Lea Petrella§

Abstract
Due to their heterogeneity, insurance risks can be properly described as a mixture of different fixed models, where the weights assigned to each model may be estimated empirically from a sample of available data. If a risk measure is evaluated on the estimated mixture instead of the (unknown) true one, then it is important to investigate the committed error. In this paper we study the asymptotic behaviour of estimated risk measures, as the data sample size tends to infinity, in the fashion of large deviations. We obtain large deviation results by applying the contraction principle, and the rate functions are given by a suitable variational formula; explicit expressions are available for mixtures of two models. Finally, our results are applied to the most common risk measures, namely the quantiles, the Expected Shortfall and the shortfall risk measure.
AMS Subject Classification. Primary: 60F10, 91B30. Secondary: 62B10, 62D05.
Keywords: contraction principle, Lagrange multipliers, quantile, entropic risk measure, relative entropy.
1 Introduction

Quantitative risk management for financial and insurance companies requires the modelling of financial positions in terms of random variables on a suitable probability space; in mathematical terms, this corresponds to identifying a probability law (model) $\mu$ on the real line that describes, as accurately as possible, the random behaviour of the position. Model risk, which arises from the uncertainty about the model to adopt, has been largely discussed in various areas of the literature, because it may substantially impact companies' decision making and performance. We can distinguish three main approaches to deal with model uncertainty: 1) the model is not specified but directly extrapolated from data via the empirical distribution; 2) a model is selected and its parameters are estimated from data (e.g. using Maximum Likelihood Estimation); 3) a class of candidate models is considered (for instance models suggested by expert opinion) and then one, or an average, of them is applied. The latter approach is probably the most common one and includes for instance: the worst-case approach proposed by Gilboa and Schmeidler (1989) in the theory of utility maximization, where the chosen model is the one providing the most adverse outcome; the Bayesian model averaging approach, developed by Raftery et al. (1997), where (posterior) weights are calculated for each model considering both information arising from data and prior beliefs; the highest posterior approach, where the selected model is the one most favourable according to the posterior weights. Cairns (2000) provided a general framework for dealing with model and parameter uncertainty.

∗ The support of Gruppo Nazionale per l'Analisi Matematica, la Probabilità e le loro Applicazioni (GNAMPA) of the Istituto Nazionale di Alta Matematica (INdAM) is acknowledged.
† Dipartimento di Statistica e Metodi Quantitativi, Università di Milano Bicocca, Via Bicocca degli Arcimboldi 8, I-20126 Milano, Italia.
e-mail: [email protected]
‡ Dipartimento di Matematica, Università di Roma Tor Vergata, Via della Ricerca Scientifica, I-00133 Roma, Italia. e-mail: [email protected]
§ Dipartimento di Metodi e Modelli per l'Economia, il Territorio e la Finanza, Sapienza Università di Roma, Via del Castro Laurenziano 9, I-00161 Roma, Italia. e-mail: [email protected]

Once a model is selected, riskiness is typically quantified by applying a risk measure $\rho$ to the financial position. The impact of model uncertainty on risk measurement was discussed among others by Barrieu and Scandolo (2015) and Bignozzi and Tsanakas (2016), where different measures of model risk are considered. Most of the risk measures generally considered, by both academics and practitioners, are law-invariant, that is, univocally determined by the probability law of the random variable. These risk measures can then be treated as statistical functionals. While the mathematical theory of risk measures is by now well developed (we refer for instance to Föllmer and Schied (2016) for an extensive treatment of coherent and convex risk measures), research on the statistical properties of risk measures is fairly recent. The seminal paper by Cont et al. (2010) started a new strand in the literature that investigates the statistical properties of risk measures in terms of robustness with respect to available data and to different model estimation procedures. The main difference between the mathematical and the statistical approaches is that, in the first case, risk measures are defined on a space of random variables, while in the second one, on a space of probability measures.
Although, under weak technical assumptions, for a random variable $X$ with probability law $\mu$, we can identify $\rho(X)$ and $\rho(\mu)$, it is important to emphasise that properties of risk measures on random variables and on distributions are different. In particular, given two random variables $X$, $Y$ with distributions $\mu$, $\nu$, the convex combination $pX+(1-p)Y$, for $p\in(0,1)$, represents a diversified position, while the mixture $p\mu+(1-p)\nu$ represents a higher risk profile, and thus a risk measure should not be convex with respect to mixtures of distributions. Properties of risk measures with respect to mixture distributions have been investigated by Acciaio and Svindland (2013). Weber (2006) used such properties to characterise dynamic risk measures, while Ziegel (2016), Bellini and Bignozzi (2015) and Delbaen et al. (2016) used them to study elicitable functionals. Bernardi et al. (2017) presented some results on risk measures evaluated on mixtures of Gaussian and Student $t$ distributions.

In this contribution we consider risk measures applied to the mixture distribution $\pi_1\mu_1+\cdots+\pi_s\mu_s$, where $\{\mu_1,\ldots,\mu_s\}$ is a set of $s$ available models, and $\pi_1,\ldots,\pi_s\ge 0$ (with $\sum_{j=1}^s\pi_j=1$) are the weights assigned to each model. Mixture models are particularly relevant when a single model is not sufficient to fully describe the data. They represent a flexible approach for modelling heterogeneous data and for carrying out cluster analysis. Further, mixture models represent a ductile way to model unknown distributional shapes. Such situations are quite common in insurance, where often a mix of small, medium and large size claims occurs; we refer the interested reader to Klugman et al. (2012) for a full treatment of loss modelling in actuarial science. Bernardi et al. (2012) proposed finite mixtures of Skew Normal distributions to properly characterise insurance data, while Lee and Lin (2010) suggested a mixture of Erlang distributions.
In a statistical framework mixture models have a variety of applications; we refer to McLachlan and Peel (2004) for an extensive treatment of the topic.

Throughout this paper the models $\mu_1,\ldots,\mu_s$ are assumed to be fixed, and the weights $\pi_1,\ldots,\pi_s$ are estimated empirically from independent samples. In an insurance framework, we can assume that each model represents the loss profile of a customer (or a class of customers), and the weights are estimated by registering the relative frequency of claims occurring for each model. Then we consider the sequence of empirical risk measures $\{\rho(\sum_{j=1}^s\hat\pi_n(j)\mu_j):n\ge 1\}$, where the weight estimators $\hat\pi_n(1),\ldots,\hat\pi_n(s)$ concern the empirical law of i.i.d. random variables $\{X_1,\ldots,X_n\}$ with distribution $\pi=(\pi_1,\ldots,\pi_s)$ (see (1) below).

In this paper we prove large deviation results for the empirical risk measures. The theory of large deviations gives an asymptotic computation of small probabilities on an exponential scale (see e.g. Dembo and Zeitouni, 1998, as a reference on this topic). The large deviation principles are obtained by applying the contraction principle, so the rate functions are given by a suitable variational formula. We use the method of Lagrange multipliers, and explicit expressions are available for $s=2$. We then apply our results to the most common risk measures, namely the quantiles (also known as Value-at-Risk in the risk management literature), the Expected Shortfall (ES) and the shortfall risk measure. A different approach for large deviation analysis may be the use of precise large deviation techniques, which are beyond the purpose of the paper; among others, a possible reference for the interested reader is Féray et al. (2016).

Our work was inspired by Weber (2007), where the author considered the empirical risk measures $\{\rho(\hat\mu_n):n\ge 1\}$, where
\[\hat\mu_n:=\frac{1}{n}\sum_{i=1}^n\delta_{Y_i}\]
is the empirical law of i.i.d. random variables
$\{Y_1,\ldots,Y_n\}$ having (unknown) distribution $\mu$ with bounded support. The main goal of that paper is to investigate coherent and convex risk measures that are continuous on compacts. This condition yields the large deviation principle of $\{\rho(\hat\mu_n):n\ge 1\}$ by applying the contraction principle (see Proposition 2.1 and Corollary 2.1 in Weber, 2007).

The paper is organised as follows. Section 2 gathers some preliminaries on large deviations and their application to our framework with finite mixtures. In Section 3 we present the main results of the paper, while Section 4 presents some examples for the most common risk measures used in practice and in the literature.

2 Preliminaries

In this section we recall some preliminaries on large deviations and a large deviation principle for a sequence of estimators (see Proposition 2.1).
A sequence of random variables $\{W_n:n\ge 1\}$ taking values on a topological space $\mathcal{W}$ satisfies the large deviation principle (LDP for short) with rate function $I:\mathcal{W}\to[0,\infty]$ if $I$ is a lower semi-continuous function,
\[\liminf_{n\to\infty}\frac{1}{n}\log P(W_n\in O)\ge-\inf_{w\in O}I(w)\quad\text{for all open sets }O\]
and
\[\limsup_{n\to\infty}\frac{1}{n}\log P(W_n\in C)\le-\inf_{w\in C}I(w)\quad\text{for all closed sets }C.\]
$I$ is said to be good if all its level sets $\{\{w\in\mathcal{W}:I(w)\le\eta\}:\eta\ge 0\}$ are compact. Finally we also recall the contraction principle (see e.g. Theorem 4.2.1 in Dembo and Zeitouni, 1998): let $\mathcal{Y}$ be a topological space, and let $f:\mathcal{W}\to\mathcal{Y}$ be a continuous function; then, if $\{W_n:n\ge 1\}$ satisfies the LDP with good rate function $I$, and $Y_n:=f(W_n)$ (for all $n\ge 1$), $\{Y_n:n\ge 1\}$ satisfies the LDP with good rate function $J$ defined by
\[J(y):=\inf\{I(w):w\in\mathcal{W},\,f(w)=y\}.\]
The LDP for real valued random variables is used to obtain asymptotic evaluations for the logarithm of tail probabilities; indeed, for a wide class of cases, we have
\[\log P(W_n>x)\sim-nI(x)\]
at least for $x$ large enough to have $I(x)=\inf_{w>x}I(w)$ (we use the symbol $\sim$ to mean that the ratio tends to 1 as $n\to\infty$).

2.1 $\rho(\mu)$ when $\mu$ is a mixture

We define a law-invariant risk measure as a map $\rho:\mathcal{P}(\mathbb{R})\to\mathbb{R}$ that assigns to every probability measure $\mu\in\mathcal{P}(\mathbb{R})$ on the real line a real number $\rho(\mu)$. Such a value is generally used to summarise the riskiness of the model $\mu$ and can be adopted to calculate solvency capital requirements. In the present contribution, we focus on probability distributions that arise as mixtures $\pi_1\mu_1+\cdots+\pi_s\mu_s$ of some fixed models $\mu_1,\ldots,\mu_s$ with weights $\pi=(\pi_1,\ldots,\pi_s)\in\Sigma_s$, where
\[\Sigma_s:=\{(p_1,\ldots,p_s):p_1,\ldots,p_s\ge 0,\ p_1+\cdots+p_s=1\}\]
is the simplex; we are then interested in computing $\rho\big(\sum_{j=1}^s\pi_j\mu_j\big)$. In mixture models used for modelling insurance data, it is often the case that the weights
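The tail asymptotics $\log P(W_n>x)\sim-nI(x)$ can be checked numerically in the simplest setting, the empirical frequency of a two-state sample, where the rate function is the Bernoulli relative entropy. The sketch below is illustrative only; the values $\pi=0.5$ and $x=0.6$ are assumptions for the example, not taken from the paper.

```python
import math

def kl(p, q):
    """Relative entropy of Bernoulli(p) with respect to Bernoulli(q)."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def log_tail(n, pi, x):
    """log P(hat_pi_n >= x) for n*hat_pi_n ~ Binomial(n, pi), computed in log-space
    to avoid underflow for large n."""
    logs = [math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(pi) + (n - k) * math.log(1 - pi)
            for k in range(math.ceil(n * x), n + 1)]
    m = max(logs)
    return m + math.log(sum(math.exp(l - m) for l in logs))

pi, x = 0.5, 0.6
for n in (100, 1000, 5000):
    print(n, -log_tail(n, pi, x) / n)  # approaches I(x) = kl(x, pi) as n grows
print("I(x) =", kl(x, pi))
```

The normalized log-probability decreases towards the rate $I(x)$ from above, with an $O(\log n/n)$ correction, which is exactly the exponential-scale statement of the LDP.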
$\pi_1,\ldots,\pi_s$ are unknown and estimated from a set of $n$ available data by $\hat\pi_n=(\hat\pi_n(1),\ldots,\hat\pi_n(s))$; see for instance Lee et al. (2012). In order to estimate the error committed in computing the estimated risk measure $\rho\big(\sum_{j=1}^s\hat\pi_n(j)\mu_j\big)$ instead of the correct one $\rho\big(\sum_{j=1}^s\pi_j\mu_j\big)$, we employ the theory of large deviations. In particular, we consider the case where the weights are estimated empirically as
\[\hat\pi_n(j):=\frac{1}{n}\sum_{i=1}^n 1_{\{X_i=j\}}\quad(\text{for all }j\in\{1,\ldots,s\}),\qquad(1)\]
where $\{X_1,\ldots,X_n\}$ are i.i.d. random variables with distribution $\pi=(\pi_1,\ldots,\pi_s)$. It is well known that the sequence of empirical measures $(\hat\pi_n)_n$ converges $P$-a.s. to $\pi$, and that it satisfies the LDP (see e.g. Theorem 2.1.10 in Dembo and Zeitouni, 1998). Therefore, by applying the contraction principle (see Theorem 4.2.1 in Dembo and Zeitouni, 1998), we obtain the LDP stated in the following proposition.

Proposition 2.1.
Let $\pi=(\pi_1,\ldots,\pi_s)\in\Sigma_s$. Moreover assume that the function
\[\Sigma_s\ni(p_1,\ldots,p_s)\mapsto\rho\Big(\sum_{j=1}^s p_j\mu_j\Big)\qquad(2)\]
is continuous. Then $\{\rho(\sum_{j=1}^s\hat\pi_n(j)\mu_j):n\ge 1\}$ satisfies the LDP (as $n\to\infty$) with good rate function $H_{\rho,\langle\pi,\mu\rangle}$ defined by
\[H_{\rho,\langle\pi,\mu\rangle}(r):=\inf\Big\{\sum_{j=1}^s p_j\log\frac{p_j}{\pi_j}:(p_1,\ldots,p_s)\in\Sigma_s,\ \rho\Big(\sum_{j=1}^s p_j\mu_j\Big)=r\Big\}.\qquad(3)\]

Remark 2.1 (Relative entropy and Sanov's Theorem). The quantity $E_\pi\big[\frac{dp}{d\pi}\log\frac{dp}{d\pi}\big]=\sum_{j=1}^s p_j\log\frac{p_j}{\pi_j}$ in (3) is the relative entropy of a general probability measure $p=(p_1,\ldots,p_s)$ on the state space $\{1,\ldots,s\}$ with respect to the probability measure $\pi=(\pi_1,\ldots,\pi_s)$ that gives the actual (but unknown) weights of the mixture model. Large deviation rate functions are indeed often expressed in terms of relative entropy; see e.g. the discussion in Varadhan (2003). The rate function is thus obtained by minimising the relative entropy under a constraint on the risk measure.

Remark 2.2 (The set $S_\pi$ and the value $r_0$). If $\pi_j=0$ for some $j\in\{1,\ldots,s\}$, then we have
\[p_j\log\frac{p_j}{\pi_j}=\begin{cases}0&\text{if }p_j=0\\ \infty&\text{if }p_j\in(0,1]\end{cases}\]
so, in some sense, the index $j$ is negligible. Then we should consider the set $S_\pi:=\{i\in\{1,\ldots,s\}:\pi_i>0\}$ instead of $\{1,\ldots,s\}$; however, with a slight abuse of notation, throughout the paper we always refer to $\{1,\ldots,s\}$ (and its cardinality $s$) because we can always rearrange the notation in order to have $S_\pi=\{1,\ldots,s\}$. We also remark that $H_{\rho,\langle\pi,\mu\rangle}(r)$ uniquely vanishes at $r=r_0$, where
\[r_0:=\rho\Big(\sum_{j=1}^s\pi_j\mu_j\Big).\]
Thus we can say that, for every $\delta>0$, under the hypotheses of Proposition 2.1, the probability
\[P\Big(\Big|\rho\Big(\sum_{j=1}^s\hat\pi_n(j)\mu_j\Big)-r_0\Big|\ge\delta\Big)\]
decays as $e^{-nh_\delta}$, where $h_\delta:=\inf\{H_{\rho,\langle\pi,\mu\rangle}(r):|r-r_0|\ge\delta\}>0$, as $n\to\infty$.
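For small $s$ the variational formula (3) can be evaluated by brute force over a grid on the simplex. The sketch below does this for $\rho$ equal to the mean (treated later as the first special case of the linear family), with $s=3$; the weights $\pi$ and the model means are illustrative assumptions. The rate function vanishes at $r_0$ and is strictly positive away from it, as Remark 2.2 asserts.

```python
import math

def rel_entropy(p, pi):
    """Relative entropy sum_j p_j log(p_j / pi_j) on a finite state space."""
    return sum(pj * math.log(pj / qj) for pj, qj in zip(p, pi) if pj > 0)

def rate(r, means, pi, steps=400):
    """Brute-force evaluation of (3) for rho = mean and s = 3: minimise the
    relative entropy over the simplex subject to sum_j p_j * m_j = r."""
    m1, m2, m3 = means
    best = math.inf
    for a in range(steps + 1):
        p1 = a / steps
        p2 = (r - m3 - p1 * (m1 - m3)) / (m2 - m3)  # solves the linear constraint
        if 0.0 <= p2 <= 1.0 - p1:
            best = min(best, rel_entropy((p1, p2, 1.0 - p1 - p2), pi))
    return best

pi = (0.5, 0.3, 0.2)       # illustrative true weights
means = (0.0, 1.0, 2.0)    # illustrative model means rho(mu_j)
r0 = sum(q * m for q, m in zip(pi, means))   # = 0.7
print(rate(r0, means, pi))   # 0 at r = r0 (attained at p = pi)
print(rate(1.2, means, pi))  # strictly positive away from r0
```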
3 Main results

In this section we provide, when possible, an explicit expression of the variational formula in (3). Note that in general the constraint $\rho\big(\sum_{j=1}^s p_j\mu_j\big)=r$ cannot be written explicitly in terms of the $p_j$'s; for this reason we introduce the next Condition 3.1, which requires a sort of linear dependence of the risk measures with respect to the mixture weights. As we shall see in Theorem 3.1, this allows us to handle the variational formula in (3) with the method of Lagrange multipliers. Condition 3.1 does not seem to be restrictive; it is indeed satisfied by many of the risk measures used by academics and practitioners.

Condition 3.1.
The function in (3) can be written as
\[H_{\rho,\langle\pi,\mu\rangle}(r):=\inf\Big\{\sum_{j=1}^s p_j\log\frac{p_j}{\pi_j}:(p_1,\ldots,p_s)\in\Sigma_s,\ \sum_{j=1}^s p_j\Psi_\rho(\mu_j,r)=0\Big\},\qquad(4)\]
for some (strictly) decreasing functions $\Psi_\rho(\mu_1,\cdot),\ldots,\Psi_\rho(\mu_s,\cdot)$. Moreover, for all $i\in\{1,\ldots,s\}$, there exists a unique $r^{(0)}_i$ such that $\Psi_\rho\big(\mu_i,r^{(0)}_i\big)=0$.

In some cases $\Psi_\rho(\mu_1,\cdot),\ldots,\Psi_\rho(\mu_s,\cdot)$ are increasing functions (instead of decreasing); in such a case we can reduce to Condition 3.1 (namely the functions $\Psi_\rho(\mu_1,\cdot),\ldots,\Psi_\rho(\mu_s,\cdot)$ are decreasing) by a change of sign. We will see in Section 4 that Condition 3.1 is fulfilled by some popular risk measures, such as the quantiles, the mean and the class of convex shortfall risk measures introduced by Föllmer and Schied (2002). The Expected Shortfall satisfies this condition only under some extra requirements. A similar condition recently appeared in the literature on elicitable risk measures under the name of Convex Level Sets (CxLS). A risk measure has CxLS if, given $\rho(\mu_1)=\cdots=\rho(\mu_s)=r$, then $\rho\big(\sum_{j=1}^s p_j\mu_j\big)=r$. Clearly, if $\rho(\mu_1)=\cdots=\rho(\mu_s)=r$, the CxLS property implies our Condition 3.1 with $\Psi_\rho(\mu_j,r):=\rho(\mu_j)-r$. A full characterization of convex risk measures satisfying the CxLS property is provided in Delbaen et al. (2016).
Remark 3.1 (Consequences of Condition 3.1 for $r_0$). If Condition 3.1 holds, then we have
\[\sum_{j=1}^s\pi_j\Psi_\rho(\mu_j,r_0)=0.\qquad(5)\]
Moreover, if we set $\underline{r}_\rho:=\min\{r^{(0)}_i:i\in\{1,\ldots,s\}\}$ and $\overline{r}_\rho:=\max\{r^{(0)}_i:i\in\{1,\ldots,s\}\}$, we have $\underline{r}_\rho\le\overline{r}_\rho$. Then we can distinguish two cases (see parts (i) and (ii) in the next Theorem 3.1):
• $\underline{r}_\rho=\overline{r}_\rho=:\hat r_\rho$, which occurs if and only if $r^{(0)}_1=\cdots=r^{(0)}_s=\hat r_\rho$; in this case we have $r_0=\hat r_\rho$, and the estimators $\{\rho(\sum_{j=1}^s\hat\pi_n(j)\mu_j):n\ge 1\}$ are constantly equal to $r_0$;
• $\underline{r}_\rho<\overline{r}_\rho$; in this case we have $r_0\in(\underline{r}_\rho,\overline{r}_\rho)$, and the estimators $\{\rho(\sum_{j=1}^s\hat\pi_n(j)\mu_j):n\ge 1\}$ take values in $[\underline{r}_\rho,\overline{r}_\rho]$.
The first case always occurs if $s=1$. Now we are ready to present Theorem 3.1. In general we only have an explicit expression of $H_{\rho,\langle\pi,\mu\rangle}$ for the case $s=2$; see Remark 3.2 and Remark 3.3. The case with $s=\infty$ will be discussed in Remark 3.4.

Theorem 3.1.
Consider the same hypotheses of Proposition 2.1. Assume that Condition 3.1 holds, and let $\underline{r}_\rho$ and $\overline{r}_\rho$ be as in Remark 3.1.
(i) If $\underline{r}_\rho=\overline{r}_\rho$, then
\[H_{\rho,\langle\pi,\mu\rangle}(r)=\begin{cases}0&\text{if }r=r^{(0)}_1=\cdots=r^{(0)}_s\\ \infty&\text{otherwise}.\end{cases}\]
(ii) If $\underline{r}_\rho<\overline{r}_\rho$, then
\[H_{\rho,\langle\pi,\mu\rangle}(r)=\begin{cases}-\log\big(\sum_{j=1}^s\pi_je^{-\lambda_*(r)\Psi_\rho(\mu_j,r)}\big)&\text{if }r\in(\underline{r}_\rho,\overline{r}_\rho)\\ -\log\sum_{j:r^{(0)}_j=\underline{r}_\rho}\pi_j&\text{if }r=\underline{r}_\rho\\ -\log\sum_{j:r^{(0)}_j=\overline{r}_\rho}\pi_j&\text{if }r=\overline{r}_\rho\\ \infty&\text{if }r\notin[\underline{r}_\rho,\overline{r}_\rho],\end{cases}\]
where $\lambda_*(r)$ is such that
\[\frac{\sum_{j=1}^s\pi_j\Psi_\rho(\mu_j,r)e^{-\lambda_*(r)\Psi_\rho(\mu_j,r)}}{\sum_{j=1}^s\pi_je^{-\lambda_*(r)\Psi_\rho(\mu_j,r)}}=0.\qquad(6)\]

Proof. We start with the proof of statement (i). For $r=r^{(0)}_1=\cdots=r^{(0)}_s$ we have
\[H_{\rho,\langle\pi,\mu\rangle}(r)=\inf\Big\{\sum_{j=1}^s p_j\log\frac{p_j}{\pi_j}:(p_1,\ldots,p_s)\in\Sigma_s\Big\}=0\]
(the infimum is attained by choosing $(p_1,\ldots,p_s)=(\pi_1,\ldots,\pi_s)$); on the contrary, for $r\ne r^{(0)}_1=\cdots=r^{(0)}_s$, we have $H_{\rho,\langle\pi,\mu\rangle}(r)=\infty$ because the condition $\sum_{j=1}^s p_j\Psi_\rho(\mu_j,r)=0$ fails for every choice of $(p_1,\ldots,p_s)\in\Sigma_s$ (in fact, since the functions $\Psi_\rho(\mu_j,\cdot)$ are strictly decreasing and vanish at $\hat r_\rho$, the values $\{\Psi_\rho(\mu_j,r):j\in\{1,\ldots,s\}\}$ are all positive if $r<\hat r_\rho$ and all negative if $r>\hat r_\rho$).
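For the linear choice $\Psi_\rho(\mu_j,r)=\rho(\mu_j)-r$, equation (6) can be solved numerically: its left-hand side has a numerator that is strictly decreasing in $\lambda$, so plain bisection applies. The sketch below is illustrative (the values of $\rho(\mu_j)$ and $\pi$ are assumptions); for $s=2$ the result can be cross-checked against the relative entropy of the unique feasible weight vector.

```python
import math

def lambda_star(psis, pi, lo=-50.0, hi=50.0, tol=1e-12):
    """Solve (6) by bisection: g(lam) = sum_j pi_j * psi_j * exp(-lam*psi_j)
    is strictly decreasing in lam, so a sign change brackets the root."""
    g = lambda lam: sum(q * psi * math.exp(-lam * psi) for q, psi in zip(pi, psis))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def rate(r, rhos, pi):
    """H(r) from Theorem 3.1(ii) with the linear choice Psi(mu_j, r) = rho(mu_j) - r."""
    psis = [rho_j - r for rho_j in rhos]
    lam = lambda_star(psis, pi)
    return -math.log(sum(q * math.exp(-lam * psi) for q, psi in zip(pi, psis)))

rhos, pi = [0.0, 1.0], [0.4, 0.6]      # illustrative values rho(mu_j) and weights
r0 = sum(q * v for q, v in zip(pi, rhos))   # = 0.6
print(rate(r0, rhos, pi))   # ~0 at r = r0 (there lambda_*(r0) = 0)
print(rate(0.3, rhos, pi))  # > 0
```

For $s=2$ the constraint $p_1\rho(\mu_1)+p_2\rho(\mu_2)=r$ pins down $p=(0.7,0.3)$ at $r=0.3$, and the value returned by `rate` coincides with the relative entropy of that $p$ with respect to $\pi$, as the variational formula (4) predicts.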
For $r\in(\underline{r}_\rho,\overline{r}_\rho)$ we have
\[H'_{\rho,\langle\pi,\mu\rangle}(r)=\frac{\sum_{h=1}^s\pi_he^{-\lambda_*(r)\Psi_\rho(\mu_h,r)}\big[\lambda'_*(r)\Psi_\rho(\mu_h,r)+\lambda_*(r)\Psi'_\rho(\mu_h,r)\big]}{\sum_{h=1}^s\pi_he^{-\lambda_*(r)\Psi_\rho(\mu_h,r)}}\]
\[=\lambda'_*(r)\underbrace{\frac{\sum_{h=1}^s\pi_h\Psi_\rho(\mu_h,r)e^{-\lambda_*(r)\Psi_\rho(\mu_h,r)}}{\sum_{h=1}^s\pi_he^{-\lambda_*(r)\Psi_\rho(\mu_h,r)}}}_{=0\text{ by }(6)}+\lambda_*(r)\frac{\sum_{h=1}^s\pi_h\Psi'_\rho(\mu_h,r)e^{-\lambda_*(r)\Psi_\rho(\mu_h,r)}}{\sum_{h=1}^s\pi_he^{-\lambda_*(r)\Psi_\rho(\mu_h,r)}}=\lambda_*(r)\frac{\sum_{h=1}^s\pi_h\Psi'_\rho(\mu_h,r)e^{-\lambda_*(r)\Psi_\rho(\mu_h,r)}}{\sum_{h=1}^s\pi_he^{-\lambda_*(r)\Psi_\rho(\mu_h,r)}};\]
thus, since $\lambda_*(r_0)=0$, we get
\[H'_{\rho,\langle\pi,\mu\rangle}(r_0)=\lambda_*(r_0)\frac{\sum_{h=1}^s\pi_h\Psi'_\rho(\mu_h,r_0)e^{-\lambda_*(r_0)\Psi_\rho(\mu_h,r_0)}}{\sum_{h=1}^s\pi_he^{-\lambda_*(r_0)\Psi_\rho(\mu_h,r_0)}}=0.\]
Moreover, again for $r\in(\underline{r}_\rho,\overline{r}_\rho)$, we have
\[H''_{\rho,\langle\pi,\mu\rangle}(r)=\lambda'_*(r)\frac{\sum_{h=1}^s\pi_h\Psi'_\rho(\mu_h,r)e^{-\lambda_*(r)\Psi_\rho(\mu_h,r)}}{\sum_{h=1}^s\pi_he^{-\lambda_*(r)\Psi_\rho(\mu_h,r)}}+\lambda_*(r)\frac{d}{dr}\Bigg(\frac{\sum_{h=1}^s\pi_h\Psi'_\rho(\mu_h,r)e^{-\lambda_*(r)\Psi_\rho(\mu_h,r)}}{\sum_{h=1}^s\pi_he^{-\lambda_*(r)\Psi_\rho(\mu_h,r)}}\Bigg);\]
thus, by taking into account again $\lambda_*(r_0)=0$, we obtain
\[H''_{\rho,\langle\pi,\mu\rangle}(r_0)=\lambda'_*(r_0)\sum_{h=1}^s\pi_h\Psi'_\rho(\mu_h,r_0).\qquad(10)\]
We conclude by computing $\lambda'_*(r_0)$ by means of the implicit function theorem.
By (6) we consider the function
\[\Delta(r,\lambda):=\frac{\sum_{j=1}^s\pi_j\Psi_\rho(\mu_j,r)e^{-\lambda\Psi_\rho(\mu_j,r)}}{\sum_{j=1}^s\pi_je^{-\lambda\Psi_\rho(\mu_j,r)}};\]
the partial derivatives of $\Delta$ are
\[\Delta_r(r,\lambda)=\frac{1}{\big(\sum_{j=1}^s\pi_je^{-\lambda\Psi_\rho(\mu_j,r)}\big)^2}\cdot\Bigg(\sum_{j=1}^s\pi_j\Psi'_\rho(\mu_j,r)e^{-\lambda\Psi_\rho(\mu_j,r)}\big(1-\lambda\Psi_\rho(\mu_j,r)\big)\sum_{j=1}^s\pi_je^{-\lambda\Psi_\rho(\mu_j,r)}+\lambda\sum_{j=1}^s\pi_j\Psi'_\rho(\mu_j,r)e^{-\lambda\Psi_\rho(\mu_j,r)}\sum_{j=1}^s\pi_j\Psi_\rho(\mu_j,r)e^{-\lambda\Psi_\rho(\mu_j,r)}\Bigg)\]
and
\[\Delta_\lambda(r,\lambda)=\frac{1}{\big(\sum_{j=1}^s\pi_je^{-\lambda\Psi_\rho(\mu_j,r)}\big)^2}\cdot\Bigg(-\sum_{j=1}^s\pi_j\Psi^2_\rho(\mu_j,r)e^{-\lambda\Psi_\rho(\mu_j,r)}\sum_{j=1}^s\pi_je^{-\lambda\Psi_\rho(\mu_j,r)}+\Big(\sum_{j=1}^s\pi_j\Psi_\rho(\mu_j,r)e^{-\lambda\Psi_\rho(\mu_j,r)}\Big)^2\Bigg);\]
thus, by taking into account (5) and $\lambda_*(r_0)=0$, we have
\[\Delta_r(r_0,\lambda_*(r_0))=\sum_{j=1}^s\pi_j\Psi'_\rho(\mu_j,r_0)\quad\text{and}\quad\Delta_\lambda(r_0,\lambda_*(r_0))=-\sum_{j=1}^s\pi_j\Psi^2_\rho(\mu_j,r_0),\]
and the implicit function theorem yields
\[\lambda'_*(r_0)=-\frac{\Delta_r(r,\lambda)}{\Delta_\lambda(r,\lambda)}\Big|_{(r,\lambda)=(r_0,\lambda_*(r_0))}=\frac{\sum_{j=1}^s\pi_j\Psi'_\rho(\mu_j,r_0)}{\sum_{j=1}^s\pi_j\Psi^2_\rho(\mu_j,r_0)}.\]
We conclude the proof by combining this equality and (10).
4 Examples

In this section we consider some examples of risk measures satisfying Condition 3.1. The first example concerns risk measures that depend linearly on the weights, and we present two specific cases. Other examples consider quantiles and the class of shortfall risk measures, which includes the entropic risk measures as a special case. We conclude the section with two examples: one for an insurance application with $s=2$, and one where we obtain explicit expressions for $s=3$. In view of what follows we write $F_\mu$ for the distribution function associated with the law $\mu$, namely $F_\mu(x):=\mu((-\infty,x])$ for all $x\in\mathbb{R}$. We remark that, when we deal with a finite mixture $\sum_{j=1}^s p_j\mu_j$ of some laws $\mu_1,\ldots,\mu_s$ (for some $(p_1,\ldots,p_s)\in\Sigma_s$), we have $F_{\sum_{j=1}^s p_j\mu_j}=\sum_{j=1}^s p_jF_{\mu_j}$.

Example 4.1 (Linear dependence with respect to the weights). We assume that the function (2) satisfies the following condition:
\[\rho\Big(\sum_{j=1}^s p_j\mu_j\Big)=\sum_{j=1}^s p_j\rho(\mu_j)\quad\text{for all }(p_1,\ldots,p_s)\in\Sigma_s.\]
Obviously we have a continuous function. In this case one has $\Psi_\rho(\mu_i,r)=\rho(\mu_i)-r$, which yields $r^{(0)}_i=\rho(\mu_i)$ and $r_0=\sum_{j=1}^s\pi_j\rho(\mu_j)$; moreover $\underline{r}_\rho:=\min\{\rho(\mu_i):i\in\{1,\ldots,s\}\}$ and $\overline{r}_\rho:=\max\{\rho(\mu_i):i\in\{1,\ldots,s\}\}$.

Now we present some formulas for the rate function $H_{\rho,\langle\pi,\mu\rangle}$ when $\underline{r}_\rho<\overline{r}_\rho$; in view of this we remark that, for $s=2$, we have $\underline{r}_\rho<\overline{r}_\rho$ if and only if $\rho(\mu_1)\ne\rho(\mu_2)$. By (8), for $r\in(\underline{r}_\rho,\overline{r}_\rho)$, we get
\[H_{\rho,\langle\pi,\mu\rangle}(r)=-\log\Big(\sum_{h=1}^s\pi_he^{-\lambda_*(r)\Psi_\rho(\mu_h,r)}\Big)=-r\lambda_*(r)-\log\Big(\sum_{h=1}^s\pi_he^{-\lambda_*(r)\rho(\mu_h)}\Big).\]
Moreover, by Remark 3.3 concerning the case $s=2$, for $r\in(\underline{r}_\rho,\overline{r}_\rho)$ we get
\[H_{\rho,\langle\pi,\mu\rangle}(r)=-\log\Bigg(\sum_{h=1}^2\pi_h\Big(-\frac{\pi_1(\rho(\mu_1)-r)}{\pi_2(\rho(\mu_2)-r)}\Big)^{\frac{\rho(\mu_h)-r}{\rho(\mu_2)-\rho(\mu_1)}}\Bigg);\]
in particular we can easily check that this formula yields $H_{\rho,\langle\pi,\mu\rangle}(r_0)=0$.
Finally, by Proposition 3.2 (and after some computations where we take into account that $r_0=\sum_{j=1}^s\pi_j\rho(\mu_j)$), we get
\[H''_{\rho,\langle\pi,\mu\rangle}(r_0)=\frac{1}{\sum_{h=1}^s\pi_h(\rho(\mu_h)-r_0)^2}=\frac{1}{\sum_{h=1}^s\pi_h\rho^2(\mu_h)-r_0^2}.\]
Here we briefly present two particular cases concerning Example 4.1.
• The expected value (when $\mu_1,\ldots,\mu_s$ are probability measures of integrable random variables); in fact we have
\[\int_{\mathbb{R}}x\sum_{j=1}^s p_j\mu_j(dx)=\sum_{j=1}^s p_j\int_{\mathbb{R}}x\,\mu_j(dx).\]
• The Expected Shortfall $\mathrm{ES}_\alpha$, for $\alpha\in(0,1)$, when $\mu_1,\ldots,\mu_s$ have the same $\alpha$-quantile, namely when $F^{-1}_{\mu_1}(\alpha)=\cdots=F^{-1}_{\mu_s}(\alpha)=:r_\alpha$. We recall that
\[\mathrm{ES}_\alpha(\mu):=\frac{1}{1-\alpha}\int_{F^{-1}_\mu(\alpha)}^\infty x\,\mu(dx),\]
and that we have $\big(\sum_{j=1}^s p_jF_{\mu_j}\big)^{-1}(\alpha)=r_\alpha$. Therefore
\[\mathrm{ES}_\alpha\Big(\sum_{j=1}^s p_j\mu_j\Big)=\frac{1}{1-\alpha}\int_{(\sum_{j=1}^s p_jF_{\mu_j})^{-1}(\alpha)}^\infty x\sum_{j=1}^s p_j\mu_j(dx)=\sum_{j=1}^s\frac{p_j}{1-\alpha}\int_{r_\alpha}^\infty x\,\mu_j(dx)=\sum_{j=1}^s p_j\,\mathrm{ES}_\alpha(\mu_j).\]
We conclude with the final examples.
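The ES linearity above (mixing distributions that share the same $\alpha$-quantile) can be verified numerically. The sketch below uses two centred normal laws, which share the median, so $\alpha=0.5$; the distributions, weights and integration parameters are illustrative assumptions.

```python
import math

def npdf(x, m, s):
    """Normal density with mean m and standard deviation s."""
    return math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))

def es(alpha, q, pdf, hi=40.0, steps=20000):
    """ES_alpha as the tail integral (1/(1-alpha)) * int_q^hi x*pdf(x) dx (trapezoid rule)."""
    h = (hi - q) / steps
    ys = [(q + k * h) * pdf(q + k * h) for k in range(steps + 1)]
    return h * (sum(ys) - 0.5 * (ys[0] + ys[-1])) / (1 - alpha)

alpha, p1, p2 = 0.5, 0.3, 0.7         # illustrative level and mixture weights
f1 = lambda x: npdf(x, 0.0, 1.0)      # N(0,1): median 0
f2 = lambda x: npdf(x, 0.0, 2.0)      # N(0,4): same median, hence same alpha-quantile
mix = lambda x: p1 * f1(x) + p2 * f2(x)

lhs = es(alpha, 0.0, mix)                                # ES of the mixture
rhs = p1 * es(alpha, 0.0, f1) + p2 * es(alpha, 0.0, f2)  # weighted sum of the ES's
print(lhs, rhs)  # the two values agree
```

If the two laws had different $\alpha$-quantiles, the lower integration limit would depend on the weights and the identity would fail, which is why ES falls into Example 4.1 only under this extra requirement.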
Example 4.2 (Quantiles). Let us consider $\alpha\in(0,1)$ and strictly increasing and continuous distribution functions $F_{\mu_1},\ldots,F_{\mu_s}$ on the same interval. We assume that the function (2) is defined by
\[\Sigma_s\ni(p_1,\ldots,p_s)\mapsto\rho\Big(\sum_{j=1}^s p_j\mu_j\Big):=\Big(\sum_{j=1}^s p_jF_{\mu_j}\Big)^{-1}(\alpha).\]
This function is continuous (see the Appendix for details). In this case one has $\Psi_\rho(\mu_i,r)=\alpha-F_{\mu_i}(r)$, which yields $r^{(0)}_i=F^{-1}_{\mu_i}(\alpha)$ and $r_0=\big(\sum_{j=1}^s\pi_jF_{\mu_j}\big)^{-1}(\alpha)$; moreover $\underline{r}_\rho:=\min\{F^{-1}_{\mu_i}(\alpha):i\in\{1,\ldots,s\}\}$ and $\overline{r}_\rho:=\max\{F^{-1}_{\mu_i}(\alpha):i\in\{1,\ldots,s\}\}$.

Now we present some formulas for the rate function $H_{\rho,\langle\pi,\mu\rangle}$ when $\underline{r}_\rho<\overline{r}_\rho$; in view of this we remark that, for $s=2$, we have $\underline{r}_\rho<\overline{r}_\rho$ if and only if $F^{-1}_{\mu_1}(\alpha)\ne F^{-1}_{\mu_2}(\alpha)$. By (8), for $r\in(\underline{r}_\rho,\overline{r}_\rho)$, we get
\[H_{\rho,\langle\pi,\mu\rangle}(r)=-\log\Big(\sum_{h=1}^s\pi_he^{-\lambda_*(r)\Psi_\rho(\mu_h,r)}\Big)=\alpha\lambda_*(r)-\log\Big(\sum_{h=1}^s\pi_he^{\lambda_*(r)F_{\mu_h}(r)}\Big).\]
Moreover, by Remark 3.3 concerning the case $s=2$, for $r\in(\underline{r}_\rho,\overline{r}_\rho)$ we get
\[H_{\rho,\langle\pi,\mu\rangle}(r)=-\log\Bigg(\sum_{h=1}^2\pi_h\Big(-\frac{\pi_1(\alpha-F_{\mu_1}(r))}{\pi_2(\alpha-F_{\mu_2}(r))}\Big)^{\frac{\alpha-F_{\mu_h}(r)}{F_{\mu_1}(r)-F_{\mu_2}(r)}}\Bigg);\]
in particular we can easily check that this formula yields $H_{\rho,\langle\pi,\mu\rangle}(r_0)=0$ because $\sum_{j=1}^s\pi_jF_{\mu_j}(r_0)=\alpha$. Finally, by Proposition 3.2 (and after some computations where we take into account again that $\sum_{j=1}^s\pi_jF_{\mu_j}(r_0)=\alpha$), we get
\[H''_{\rho,\langle\pi,\mu\rangle}(r_0)=\frac{\big(\sum_{h=1}^s\pi_hF'_{\mu_h}(r_0)\big)^2}{\sum_{h=1}^s\pi_h(\alpha-F_{\mu_h}(r_0))^2}=\frac{\big(\sum_{h=1}^s\pi_hF'_{\mu_h}(r_0)\big)^2}{\sum_{h=1}^s\pi_hF^2_{\mu_h}(r_0)-\alpha^2}.\]

Example 4.3 (Shortfall risk measures). We recall some preliminaries (see Föllmer and Schied (2002)).
Given a loss function $\ell:\mathbb{R}\to\mathbb{R}$ (that is, a convex, increasing and not identically constant function) and an interior point $x_0$ in the range of $\ell$, a shortfall risk measure is defined by
\[\rho(\mu):=\inf\Big\{m\in\mathbb{R}:\int_{\mathbb{R}}\ell(x-m)\,\mu(dx)\le x_0\Big\};\]
moreover it is the unique solution $m$ to the equation
\[\int_{\mathbb{R}}\ell(x-m)\,\mu(dx)=x_0.\]
We assume that the function (2) is defined by
\[\Sigma_s\ni(p_1,\ldots,p_s)\mapsto\rho\Big(\sum_{j=1}^s p_j\mu_j\Big):=r,\quad\text{where}\quad\sum_{j=1}^s p_j\int_{\mathbb{R}}\ell(x-r)\,\mu_j(dx)=x_0.\]
The continuity of this function can be checked by adapting the proof in the Appendix. In this case one has
\[\Psi_\rho(\mu_i,r)=\int_{\mathbb{R}}\ell(x-r)\,\mu_i(dx)-x_0.\]
From now on, in order to have explicit results, we continue our analysis for the class of entropic risk measures, that is the case where we have the loss function $\ell(x)=e^{\theta x}$, for $\theta>0$, and $x_0=1$. We can check the following equalities:
\[\rho(\mu)=\frac{1}{\theta}\log\Big(\int_{\mathbb{R}}e^{\theta x}\mu(dx)\Big);\]
the function (2) becomes
\[\Sigma_s\ni(p_1,\ldots,p_s)\mapsto\rho\Big(\sum_{j=1}^s p_j\mu_j\Big):=\frac{1}{\theta}\log\Big(\int_{\mathbb{R}}e^{\theta x}\sum_{j=1}^s p_j\mu_j(dx)\Big)\]
(so we have $\Sigma_s\ni(p_1,\ldots,p_s)\mapsto\frac{1}{\theta}\log\big(\sum_{j=1}^s p_je^{\theta\rho(\mu_j)}\big)$, which is a continuous function); the function $\Psi_\rho(\mu_i,r)$ can be rewritten as $\Psi_\rho(\mu_i,r)=e^{\theta\rho(\mu_i)}-e^{\theta r}$, which yields $r^{(0)}_i=\rho(\mu_i)$ and $r_0=\frac{1}{\theta}\log\big(\sum_{j=1}^s\pi_je^{\theta\rho(\mu_j)}\big)$; moreover $\underline{r}_\rho:=\min\{\rho(\mu_i):i\in\{1,\ldots,s\}\}$ and $\overline{r}_\rho:=\max\{\rho(\mu_i):i\in\{1,\ldots,s\}\}$.

Now we present some formulas for the rate function $H_{\rho,\langle\pi,\mu\rangle}$ when $\underline{r}_\rho<\overline{r}_\rho$; in view of this we remark that, for $s=2$, we have $\underline{r}_\rho<\overline{r}_\rho$ if and only if $\rho(\mu_1)\ne\rho(\mu_2)$. By (8), for $r\in(\underline{r}_\rho,\overline{r}_\rho)$, we get
\[H_{\rho,\langle\pi,\mu\rangle}(r)=-\log\Big(\sum_{h=1}^s\pi_he^{-\lambda_*(r)\Psi_\rho(\mu_h,r)}\Big)=-e^{\theta r}\lambda_*(r)-\log\Big(\sum_{h=1}^s\pi_he^{-\lambda_*(r)e^{\theta\rho(\mu_h)}}\Big).\]
Moreover, by Remark 3.3 concerning the case $s=2$, for $r\in(\underline{r}_\rho,\overline{r}_\rho)$ we get
\[H_{\rho,\langle\pi,\mu\rangle}(r)=-\log\Bigg(\sum_{h=1}^2\pi_h\Big(-\frac{\pi_1(e^{\theta\rho(\mu_1)}-e^{\theta r})}{\pi_2(e^{\theta\rho(\mu_2)}-e^{\theta r})}\Big)^{\frac{e^{\theta\rho(\mu_h)}-e^{\theta r}}{e^{\theta\rho(\mu_2)}-e^{\theta\rho(\mu_1)}}}\Bigg);\]
in particular we can easily check that this formula yields $H_{\rho,\langle\pi,\mu\rangle}(r_0)=0$ because $e^{\theta r_0}=\sum_{j=1}^s\pi_je^{\theta\rho(\mu_j)}$. Finally, by Proposition 3.2 (and after some computations where we take into account again that $e^{\theta r_0}=\sum_{j=1}^s\pi_je^{\theta\rho(\mu_j)}$), we get
\[H''_{\rho,\langle\pi,\mu\rangle}(r_0)=\frac{(\theta e^{\theta r_0})^2}{\sum_{h=1}^s\pi_h(e^{\theta\rho(\mu_h)}-e^{\theta r_0})^2}=\frac{(\theta e^{\theta r_0})^2}{\sum_{h=1}^s\pi_he^{2\theta\rho(\mu_h)}-e^{2\theta r_0}}.\]

Example 4.4 (An insurance example with $s=2$). In actuarial science mixture distributions are particularly relevant for modeling different claim sizes. Consider, for instance, a car insurance context where individuals are grouped into $s$ categories depending on their accident history; we assume for convenience that $s=2$. Each group claim distribution $\mu_j$ may be modeled using an exponential distribution with parameter $\lambda_j$; thus $F_{\mu_j}(x)=1-e^{-\lambda_jx}$, for $x>0$, and the probability of arrival of a claim in group $j$ is $\pi_j>0$, with $\pi_1+\pi_2=1$. We assume that the $\lambda_j$'s are given and, without loss of generality, $\lambda_1<\lambda_2$ (for $\lambda_1=\lambda_2$ we find the usual exponential distribution). Instead the $\pi_j$'s are estimated empirically from a sample of $n$ claims; denoting by $X_i$ a random variable taking the value $j$ when claim $i$ belongs to group $j$, we obtain $X_i=j$ with probability $\pi_j$ and $\hat\pi_n(j)=\frac{1}{n}\sum_{i=1}^n 1_{\{X_i=j\}}$ for all $j\in\{1,2\}$. In this example, we are interested in understanding what happens to $\rho\big(\sum_{j=1}^s\hat\pi_n(j)\mu_j\big)$ as $n\to\infty$ when the risk measure $\rho$ is the quantile at level $\alpha\in(0,1)$, also known as Value-at-Risk ($\mathrm{VaR}_\alpha$) in the risk management literature. We recall that for a model $\mu$ with continuous and strictly increasing distribution function $F$, we have $\rho(\mu)=F^{-1}(\alpha)$; therefore $\rho(\mu_j)=-\frac{1}{\lambda_j}\log(1-\alpha)$, $j\in\{1,2\}$.

From Example 4.2, we know that $\Psi_\rho(\mu_j,r)=\alpha-F_{\mu_j}(r)=\alpha-1+e^{-\lambda_jr}$, and
\[\overline{r}_\rho=r^{(0)}_1=-\frac{1}{\lambda_1}\log(1-\alpha)>-\frac{1}{\lambda_2}\log(1-\alpha)=r^{(0)}_2=\underline{r}_\rho.\]
From Remark 3.3, for $r\in(\underline{r}_\rho,\overline{r}_\rho)$ we easily find
\[\lambda_*(r)=\frac{1}{e^{-\lambda_1r}-e^{-\lambda_2r}}\log\Big(-\frac{\pi_1(\alpha-1+e^{-\lambda_1r})}{\pi_2(\alpha-1+e^{-\lambda_2r})}\Big)\]
and the weights $p=(p_1,p_2)$ in (7) which attain the infimum in (4) are given by
\[p_j(r)=\frac{\pi_je^{-\lambda_*(r)(\alpha-1+e^{-\lambda_jr})}}{\sum_{j=1}^2\pi_je^{-\lambda_*(r)(\alpha-1+e^{-\lambda_jr})}},\quad j\in\{1,2\}.\]
We then obtain the rate function:
\[H_{\rho,\langle\pi,\mu\rangle}(r)=-\log\Bigg(\pi_1\Big(-\frac{\pi_1(\alpha-1+e^{-\lambda_1r})}{\pi_2(\alpha-1+e^{-\lambda_2r})}\Big)^{\frac{\alpha-1+e^{-\lambda_1r}}{e^{-\lambda_2r}-e^{-\lambda_1r}}}+\pi_2\Big(-\frac{\pi_1(\alpha-1+e^{-\lambda_1r})}{\pi_2(\alpha-1+e^{-\lambda_2r})}\Big)^{\frac{\alpha-1+e^{-\lambda_2r}}{e^{-\lambda_2r}-e^{-\lambda_1r}}}\Bigg).\]
We refer the interested reader to Lee et al. (2012) for a more detailed analysis of the use of exponential mixture models in insurance.
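The closed-form rate function of this example can be coded directly. The sketch below uses illustrative values ($\lambda_1=1$, $\lambda_2=3$, $\pi_1=0.4$, $\alpha=0.95$, all assumptions for the example) and checks that $H$ vanishes at the $\alpha$-quantile $r_0$ of the true mixture, located by bisection.

```python
import math

lam1, lam2 = 1.0, 3.0   # exponential rates with lam1 < lam2 (illustrative values)
pi1, pi2 = 0.4, 0.6     # true mixture weights (illustrative values)
alpha = 0.95

F = lambda lam, r: 1.0 - math.exp(-lam * r)   # exponential distribution function
psi = lambda lam, r: alpha - F(lam, r)        # Psi(mu_j, r) = alpha - F_j(r)

def H(r):
    """Closed-form rate function for s = 2 exponential models and rho = VaR_alpha,
    valid for r strictly between the two individual quantiles."""
    P1, P2 = psi(lam1, r), psi(lam2, r)
    base = -pi1 * P1 / (pi2 * P2)                      # positive on the open interval
    denom = math.exp(-lam2 * r) - math.exp(-lam1 * r)  # Psi_2 - Psi_1
    return -math.log(pi1 * base ** (P1 / denom) + pi2 * base ** (P2 / denom))

# r0 = alpha-quantile of the true mixture, located by bisection
lo, hi = -math.log(1 - alpha) / lam2, -math.log(1 - alpha) / lam1
for _ in range(200):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if pi1 * F(lam1, mid) + pi2 * F(lam2, mid) < alpha else (lo, mid)
r0 = 0.5 * (lo + hi)
print("H(r0) =", H(r0))  # vanishes (up to rounding) at the true value r0
```

At any other $r$ in the interval, $H(r)$ coincides with the relative entropy of the unique weight vector satisfying the quantile constraint, which gives an independent check of the formula.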
Example 4.5 (A specific example with $s=3$). We refer to Condition 3.1 with $s=3$ and, for some $a>0$ (and a suitable strictly decreasing function $\Psi$), we set
\[\Psi_\rho(\mu_i,r):=\Psi(r)+(i-1)a\quad(\text{for all }i\in\{1,2,3\});\]
(we remark that we are in this situation when we deal with Example 4.1 with $s=3$; in such a case we have $\Psi(r)=\rho(\mu_1)-r$, $\rho(\mu_2)=\rho(\mu_1)+a$, and $\rho(\mu_3)=\rho(\mu_1)+2a$). We recall that $\underline{r}_\rho=r^{(0)}_1<r^{(0)}_2<r^{(0)}_3=\overline{r}_\rho$, where $r^{(0)}_i=\Psi^{-1}(-(i-1)a)$ (for all $i\in\{1,2,3\}$). Moreover, after some computations, (5) with $s=3$ yields $r_0=\Psi^{-1}(-a(\pi_2+2\pi_3))$.

Now we compute the rate function $H_{\rho,\langle\pi,\mu\rangle}$ in Theorem 3.1(ii). We have $H_{\rho,\langle\pi,\mu\rangle}(\Psi^{-1}(0))=-\log\pi_1$ and $H_{\rho,\langle\pi,\mu\rangle}(\Psi^{-1}(-2a))=-\log\pi_3$, which concern the cases $r=\underline{r}_\rho$ and $r=\overline{r}_\rho$. In what follows we take $r\in(\underline{r}_\rho,\overline{r}_\rho)=(\Psi^{-1}(0),\Psi^{-1}(-2a))$. Firstly (6) yields
\[e^{-\lambda_*(r)\Psi(r)}\Big(\pi_1\Psi(r)+\pi_2(\Psi(r)+a)e^{-\lambda_*(r)a}+\pi_3(\Psi(r)+2a)e^{-2\lambda_*(r)a}\Big)=0,\]
and therefore
\[e^{-\lambda_*(r)a}=\frac{-\pi_2(\Psi(r)+a)+\sqrt{\pi_2^2(\Psi(r)+a)^2-4\pi_1\pi_3\Psi(r)(\Psi(r)+2a)}}{2\pi_3(\Psi(r)+2a)};\]
we remark that $\pi_2^2(\Psi(r)+a)^2-4\pi_1\pi_3\Psi(r)(\Psi(r)+2a)\ge 0$ because $\Psi(r)<0$ and $\Psi(r)+2a>0$, and
\[\sqrt{\pi_2^2(\Psi(r)+a)^2-4\pi_1\pi_3\Psi(r)(\Psi(r)+2a)}\ge|\pi_2(\Psi(r)+a)|.\]
Thus we easily obtain $\lambda_*(r)$ and the rate function expression is
\[H_{\rho,\langle\pi,\mu\rangle}(r)=-\log\Bigg(\sum_{j=1}^3\pi_j\Bigg(\frac{-\pi_2(\Psi(r)+a)+\sqrt{\pi_2^2(\Psi(r)+a)^2-4\pi_1\pi_3\Psi(r)(\Psi(r)+2a)}}{2\pi_3(\Psi(r)+2a)}\Bigg)^{\Psi_\rho(\mu_j,r)/a}\Bigg).\]
A final remark concerns the condition $\lambda_*(r_0)=0$ (see Remark 3.2). The above formulas yield
\[\frac{-\pi_2(\Psi(r_0)+a)+\sqrt{\pi_2^2(\Psi(r_0)+a)^2-4\pi_1\pi_3\Psi(r_0)(\Psi(r_0)+2a)}}{2\pi_3(\Psi(r_0)+2a)}=1\]
and, after some computations, we get $\pi_1\Psi(r_0)+\pi_2(\Psi(r_0)+a)+\pi_3(\Psi(r_0)+2a)=0$, which recovers (5) (with $s=3$).

Acknowledgements.
The authors are grateful to the anonymous reviewers for their careful reports, which greatly improved the paper.
References
Acciaio, B. and G. Svindland (2013). Are law-invariant risk functions concave on distributions? Dependence Modeling 1, 54–64.
Barrieu, P. and G. Scandolo (2015). Assessing financial model risk. European Journal of Operational Research 242(2), 546–556.
Bellini, F. and V. Bignozzi (2015). On elicitable risk measures. Quantitative Finance 15(5), 725–733.
Bernardi, M., A. Maruotti, and L. Petrella (2012). Skew mixture models for loss distributions: a Bayesian approach. Insurance: Mathematics and Economics 51(3), 617–623.
Bernardi, M., A. Maruotti, and L. Petrella (2017). Multiple risk measures for multivariate dynamic heavy-tailed models. Journal of Empirical Finance 43, 1–32.
Bignozzi, V. and A. Tsanakas (2016). Model uncertainty in risk capital measurement. The Journal of Risk 18(3), 1–24.
Cairns, A. J. (2000). A discussion of parameter and model uncertainty in insurance. Insurance: Mathematics and Economics 27(3), 313–330.
Cont, R., R. Deguest, and G. Scandolo (2010). Robustness and sensitivity analysis of risk measurement procedures. Quantitative Finance 10(6), 593–606.
Delbaen, F., F. Bellini, V. Bignozzi, and J. F. Ziegel (2016). Risk measures with the CxLS property. Finance and Stochastics 20(2), 433–453.
Dembo, A. and O. Zeitouni (1998). Large Deviations Techniques and Applications (2nd ed.). Springer.
Féray, V., P.-L. Méliot, and A. Nikeghbali (2016). Mod-φ Convergence. Normality Zones and Precise Deviations. Springer.
Föllmer, H. and A. Schied (2002). Convex measures of risk and trading constraints. Finance and Stochastics 6(4), 429–447.
Föllmer, H. and A. Schied (2016). Stochastic Finance: An Introduction in Discrete Time. Walter De Gruyter.
Gilboa, I. and D. Schmeidler (1989). Maxmin expected utility with non-unique prior. Journal of Mathematical Economics 18(2), 141–153.
Klugman, S. A., H. H. Panjer, and G. E. Willmot (2012). Loss Models: From Data to Decisions. John Wiley & Sons.
Lee, D., W. K. Li, and T. S. T. Wong (2012). Modeling insurance claims via a mixture exponential model combined with peaks-over-threshold approach. Insurance: Mathematics and Economics 51(3), 538–550.
Lee, S. C. and X. S. Lin (2010). Modeling and evaluating insurance losses via mixtures of Erlang distributions. North American Actuarial Journal 14(1), 107–130.
McLachlan, G. and D. Peel (2004). Finite Mixture Models. John Wiley & Sons.
Pesaran, M. H., C. Schleicher, and P. Zaffaroni (2009). Model averaging in risk management with an application to futures markets. Journal of Empirical Finance 16(2), 280–305.
Raftery, A. E., D. Madigan, and J. A. Hoeting (1997). Bayesian model averaging for linear regression models. Journal of the American Statistical Association 92(437), 179–191.
Varadhan, S. R. S. (2003). Large deviations and entropy. In Entropy, Princeton Ser. Appl. Math., pp. 199–214. Princeton University Press.
Weber, S. (2006). Distribution-invariant risk measures, information, and dynamic consistency. Mathematical Finance 16(2), 419–441.
Weber, S. (2007). Distribution-invariant risk measures, entropy, and large deviations. Journal of Applied Probability 44(1), 16–40.
Ziegel, J. F. (2016). Coherence and elicitability. Mathematical Finance 26(4), 901–918.
Appendix: The continuity of $\rho\big(\sum_{j=1}^s p_j\mu_j\big)$ in Example 4.2

We remark that, if we set $\varphi(x,p_1,\ldots,p_s):=\sum_{j=1}^s p_jF_{\mu_j}(x)$, we are interested in the function
\[x(p_1,\ldots,p_s):=\Big(\sum_{j=1}^s p_jF_{\mu_j}\Big)^{-1}(\alpha),\]
which is the implicit function defined by the condition $\varphi(x,p_1,\ldots,p_s)=\alpha$. We assume that $x(p_1,\ldots,p_s)$ is not continuous at some point $(q_1,\ldots,q_s)$. Then we can find a sequence $\{(p^{(n)}_1,\ldots,p^{(n)}_s):n\ge 1\}$ which converges to $(q_1,\ldots,q_s)$, and such that $\{x(p^{(n)}_1,\ldots,p^{(n)}_s):n\ge 1\}$ converges to some limit $\ell$ with $\ell\ne x(q_1,\ldots,q_s)$; moreover we have $\varphi(x(p^{(n)}_1,\ldots,p^{(n)}_s),p^{(n)}_1,\ldots,p^{(n)}_s)=\alpha$ (for all $n\ge 1$). Then, letting $n\to\infty$, we get $\sum_{j=1}^s q_jF_{\mu_j}(\ell)=\alpha$ (by the continuity of the functions $F_{\mu_1},\ldots,F_{\mu_s}$). The last equality yields $\ell=x(q_1,\ldots,q_s)$, which contradicts the assumption.