Large deviations for risk measures in finite mixture models
Valeria Bignozzi† Claudio Macci‡ Lea Petrella§

Abstract
Due to their heterogeneity, insurance risks can be properly described as a mixture of different fixed models, where the weights assigned to each model may be estimated empirically from a sample of available data. If a risk measure is evaluated on the estimated mixture instead of the (unknown) true one, then it is important to investigate the committed error. In this paper we study the asymptotic behaviour of estimated risk measures, as the data sample size tends to infinity, in the fashion of large deviations. We obtain large deviation results by applying the contraction principle, and the rate functions are given by a suitable variational formula; explicit expressions are available for mixtures of two models. Finally, our results are applied to the most common risk measures, namely the quantiles, the Expected Shortfall and the shortfall risk measure.
AMS Subject Classification. Primary: 60F10, 91B30. Secondary: 62B10, 62D05.
Keywords: contraction principle, Lagrange multipliers, quantile, entropic risk measure, relative entropy.
1 Introduction

Quantitative risk management for financial and insurance companies requires the modelling of financial positions in terms of random variables on a suitable probability space; in mathematical terms, this corresponds to identifying a probability law (model) $\mu$ on the real line that describes, as accurately as possible, the random behaviour of the position. Model risk, which arises from the uncertainty about the model to adopt, has been largely discussed in various areas of the literature, because it may substantially impact companies' decision making and performance. We can distinguish three main approaches to deal with model uncertainty: 1) the model is not specified but directly extrapolated from data via the empirical distribution; 2) a model is selected and its parameters are estimated from data (e.g. using Maximum Likelihood Estimation); 3) a class of candidate models is considered (for instance models suggested by expert opinion) and then one, or an average, of them is applied. The latter approach is probably the most common one and includes for instance: the worst-case approach proposed by Gilboa and Schmeidler (1989) in the theory of utility maximization, where the chosen model is the one providing the most adverse outcome; the Bayesian model averaging approach, developed by Raftery et al. (1997), where (posterior) weights are calculated for each model considering both information arising from data and prior beliefs; the highest posterior approach, where the selected model is the one most favourable according to the posterior weights. Cairns (2000) provided a general framework for dealing with model and parameter uncertainty.

∗ The support of Gruppo Nazionale per l'Analisi Matematica, la Probabilità e le loro Applicazioni (GNAMPA) of the Istituto Nazionale di Alta Matematica (INdAM) is acknowledged.
† Dipartimento di Statistica e Metodi Quantitativi, Università di Milano Bicocca, Via Bicocca degli Arcimboldi 8, I-20126 Milano, Italia.
e-mail: [email protected]
‡ Dipartimento di Matematica, Università di Roma Tor Vergata, Via della Ricerca Scientifica, I-00133 Roma, Italia. e-mail: [email protected]
§ Dipartimento di Metodi e Modelli per l'Economia, il Territorio e la Finanza, Sapienza Università di Roma, Via del Castro Laurenziano 9, I-00161 Roma, Italia. e-mail: [email protected]

Once a model is selected, riskiness is typically quantified by applying a risk measure $\rho$ to the financial position. The impact of model uncertainty on risk measurement was discussed among others by Barrieu and Scandolo (2015) and Bignozzi and Tsanakas (2016), where different measures of model risk are considered. Most of the risk measures generally considered, by both academics and practitioners, are law-invariant, that is, univocally determined by the probability law of the random variable. These risk measures can then be treated as statistical functionals. While the mathematical theory of risk measures is by now well developed (we refer for instance to Föllmer and Schied (2016) for an extensive treatment of coherent and convex risk measures), research on the statistical properties of risk measures is fairly recent. The seminal paper by Cont et al. (2010) started a new strand in the literature that investigates the statistical properties of risk measures in terms of robustness with respect to available data and to different model estimation procedures. The main difference between the mathematical and the statistical approaches is that, in the first case, risk measures are defined on a space of random variables, while in the second one, on a space of probability measures.
Although, under weak technical assumptions, for a random variable $X$ with probability law $\mu$, we can identify $\rho(X)$ and $\rho(\mu)$, it is important to emphasise that properties of risk measures on random variables and on distributions are different. In particular, given two random variables $X$, $Y$ with distributions $\mu$, $\nu$, the convex combination $pX+(1-p)Y$, for $p\in(0,1)$, represents a diversified position, while the mixture $p\mu+(1-p)\nu$ represents a higher risk profile, and thus a risk measure should not be convex with respect to mixtures of distributions. Properties of risk measures with respect to mixture distributions have been investigated by Acciaio and Svindland (2013). Weber (2006) used such properties to characterise dynamic risk measures, while Ziegel (2016), Bellini and Bignozzi (2015) and Delbaen et al. (2016) used them to study elicitable functionals. Bernardi et al. (2017) presented some results on risk measures evaluated on mixtures of Gaussian and Student $t$ distributions.

In this contribution we consider risk measures applied to the mixture distribution $\pi_1\mu_1+\cdots+\pi_s\mu_s$, where $\{\mu_1,\ldots,\mu_s\}$ is a set of $s$ available models, and $\pi_1,\ldots,\pi_s\ge 0$ (with $\sum_{j=1}^s\pi_j=1$) are the weights assigned to each model. Mixture models are particularly relevant when a single model is not sufficient to fully describe the data. They represent a flexible approach for modelling heterogeneous data and for carrying out cluster analysis. Further, mixture models represent a ductile way to model unknown distributional shapes. Such situations are quite common in insurance, where often a mix of small, medium and large size claims occurs; we refer the interested reader to Klugman et al. (2012) for a full treatment of loss modelling in actuarial science. Bernardi et al. (2012) proposed finite mixtures of Skew Normal distributions to properly characterise insurance data, while Lee and Lin (2010) suggested a mixture of Erlang distributions.
In a statistical framework mixture models have a variety of applications; we refer to McLachlan and Peel (2004) for an extensive treatment of the topic.

Throughout this paper the models $\mu_1,\ldots,\mu_s$ are assumed to be fixed, and the weights $\pi_1,\ldots,\pi_s$ are estimated empirically from independent samples. In an insurance framework, we can assume that each model represents the loss profile of a customer (or a class of customers), and the weights are estimated by registering the relative frequency of claims occurring for each model. Then we consider the sequence of empirical risk measures $\{\rho(\sum_{j=1}^s\hat\pi_n(j)\mu_j):n\ge 1\}$, where the weight estimators $\hat\pi_n(1),\ldots,\hat\pi_n(s)$ concern the empirical law of i.i.d. random variables $\{X_1,\ldots,X_n\}$ with distribution $\pi=(\pi_1,\ldots,\pi_s)$ (see (1) below).

In this paper we prove large deviation results for the empirical risk measures. The theory of large deviations gives an asymptotic computation of small probabilities on an exponential scale (see e.g. Dembo and Zeitouni, 1998, as a reference on this topic). The large deviation principles are obtained by applying the contraction principle, so the rate functions are given by a suitable variational formula. We use the method of Lagrange multipliers, and explicit expressions are available for $s=2$. We then apply our results to the most common risk measures, namely the quantiles (also known as Value-at-Risk in the risk management literature), the Expected Shortfall (ES) and the shortfall risk measure. A different approach for large deviation analysis may be the use of precise large deviation techniques, which are beyond the purpose of the paper; among others, a possible reference for the interested reader is Féray et al. (2016).

Our work was inspired by Weber (2007), where the author considered the empirical risk measures $\{\rho(\hat\mu_n):n\ge 1\}$, where
\[\hat\mu_n:=\frac{1}{n}\sum_{i=1}^n\delta_{Y_i}\]
is the empirical law of i.i.d. random variables
$\{Y_1,\ldots,Y_n\}$ having (unknown) distribution $\mu$ with bounded support. The main goal of that paper is to investigate coherent and convex risk measures that are continuous on compacts. This condition yields the large deviation principle of $\{\rho(\hat\mu_n):n\ge 1\}$ by applying the contraction principle (see Proposition 2.1 and Corollary 2.1 in Weber, 2007).

The paper is organised as follows. Section 2 gathers some preliminaries on large deviations and their application to our framework with finite mixtures. In Section 3 we present the main results of the paper, while Section 4 presents some examples for the most common risk measures used in practice and in the literature.

2 Preliminaries

In this section we recall some preliminaries on large deviations and a large deviation principle for a sequence of estimators (see Proposition 2.1).
A sequence of random variables $\{W_n:n\ge 1\}$ taking values on a topological space $\mathcal{W}$ satisfies the large deviation principle (LDP for short) with rate function $I:\mathcal{W}\to[0,\infty]$ if $I$ is a lower semi-continuous function,
\[\liminf_{n\to\infty}\frac{1}{n}\log P(W_n\in O)\ge-\inf_{w\in O}I(w)\quad\text{for all open sets }O\]
and
\[\limsup_{n\to\infty}\frac{1}{n}\log P(W_n\in C)\le-\inf_{w\in C}I(w)\quad\text{for all closed sets }C.\]
$I$ is said to be good if all its level sets $\{\{w\in\mathcal{W}:I(w)\le\eta\}:\eta\ge 0\}$ are compact. Finally we also recall the contraction principle (see e.g. Theorem 4.2.1 in Dembo and Zeitouni, 1998): let $\mathcal{Y}$ be a topological space, and let $f:\mathcal{W}\to\mathcal{Y}$ be a continuous function; then, if $\{W_n:n\ge 1\}$ satisfies the LDP with good rate function $I$, and $Y_n:=f(W_n)$ (for all $n\ge 1$), $\{Y_n:n\ge 1\}$ satisfies the LDP with good rate function $J$ defined by
\[J(y):=\inf\{I(w):w\in\mathcal{W},\,f(w)=y\}.\]
The LDP for real valued random variables is used to obtain asymptotic evaluations for the logarithm of tail probabilities; indeed, for a wide class of cases, we have
\[\log P(W_n>x)\sim-nI(x)\]
at least for $x$ large enough to have $I(x)=\inf_{w>x}I(w)$ (we use the symbol $\sim$ to mean that the ratio tends to 1 as $n\to\infty$).

2.1 $\rho(\mu)$ when $\mu$ is a mixture

We define a law-invariant risk measure as a map $\rho:\mathcal{P}(\mathbb{R})\to\mathbb{R}$ that assigns to every probability measure $\mu\in\mathcal{P}(\mathbb{R})$ on the real line a real number $\rho(\mu)$. Such a value is generally used to summarise the riskiness of the model $\mu$ and can be adopted to calculate solvency capital requirements. In the present contribution, we focus on probability distributions that arise as mixtures $\pi_1\mu_1+\cdots+\pi_s\mu_s$ of some fixed models $\mu_1,\ldots,\mu_s$ with weights $\pi=(\pi_1,\ldots,\pi_s)\in\Sigma_s$, where
\[\Sigma_s:=\{(p_1,\ldots,p_s):p_1,\ldots,p_s\ge 0,\ p_1+\cdots+p_s=1\}\]
is the simplex; we are then interested in computing $\rho\big(\sum_{j=1}^s\pi_j\mu_j\big)$. In mixture models used for modelling insurance data, it is often the case that the weights
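The tail asymptotics $\log P(W_n>x)\sim-nI(x)$ can be checked numerically in the simplest setting, the empirical frequency of a two-state sample, where the rate function is the Bernoulli relative entropy. The sketch below is illustrative only; the values $\pi=0.5$ and $x=0.6$ are assumptions for the example, not taken from the paper.

```python
import math

def kl(p, q):
    """Relative entropy of Bernoulli(p) with respect to Bernoulli(q)."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def log_tail(n, pi, x):
    """log P(hat_pi_n >= x) for n*hat_pi_n ~ Binomial(n, pi), computed in log-space
    to avoid underflow for large n."""
    logs = [math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(pi) + (n - k) * math.log(1 - pi)
            for k in range(math.ceil(n * x), n + 1)]
    m = max(logs)
    return m + math.log(sum(math.exp(l - m) for l in logs))

pi, x = 0.5, 0.6
for n in (100, 1000, 5000):
    print(n, -log_tail(n, pi, x) / n)  # approaches I(x) = kl(x, pi) as n grows
print("I(x) =", kl(x, pi))
```

The normalized log-probability decreases towards the rate $I(x)$ from above, with an $O(\log n/n)$ correction, which is exactly the exponential-scale statement of the LDP.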
$\pi_1,\ldots,\pi_s$ are unknown and estimated from a set of $n$ available data by $\hat\pi_n=(\hat\pi_n(1),\ldots,\hat\pi_n(s))$; see for instance Lee et al. (2012). In order to estimate the error committed in computing the estimated risk measure $\rho\big(\sum_{j=1}^s\hat\pi_n(j)\mu_j\big)$ instead of the correct one $\rho\big(\sum_{j=1}^s\pi_j\mu_j\big)$, we employ the theory of large deviations. In particular, we consider the case where the weights are estimated empirically as
\[\hat\pi_n(j):=\frac{1}{n}\sum_{i=1}^n 1_{\{X_i=j\}}\quad(\text{for all }j\in\{1,\ldots,s\}),\qquad(1)\]
where $\{X_1,\ldots,X_n\}$ are i.i.d. random variables with distribution $\pi=(\pi_1,\ldots,\pi_s)$. It is well known that the sequence of empirical measures $(\hat\pi_n)_n$ converges $P$-a.s. to $\pi$, and that it satisfies the LDP (see e.g. Theorem 2.1.10 in Dembo and Zeitouni, 1998). Therefore, by applying the contraction principle (see Theorem 4.2.1 in Dembo and Zeitouni, 1998), we obtain the LDP stated in the following proposition.

Proposition 2.1.
Let $\pi=(\pi_1,\ldots,\pi_s)\in\Sigma_s$. Moreover assume that the function
\[\Sigma_s\ni(p_1,\ldots,p_s)\mapsto\rho\Big(\sum_{j=1}^s p_j\mu_j\Big)\qquad(2)\]
is continuous. Then $\{\rho(\sum_{j=1}^s\hat\pi_n(j)\mu_j):n\ge 1\}$ satisfies the LDP (as $n\to\infty$) with good rate function $H_{\rho,\langle\pi,\mu\rangle}$ defined by
\[H_{\rho,\langle\pi,\mu\rangle}(r):=\inf\Big\{\sum_{j=1}^s p_j\log\frac{p_j}{\pi_j}:(p_1,\ldots,p_s)\in\Sigma_s,\ \rho\Big(\sum_{j=1}^s p_j\mu_j\Big)=r\Big\}.\qquad(3)\]

Remark 2.1 (Relative entropy and Sanov's Theorem). The quantity $E_\pi\big[\frac{dp}{d\pi}\log\frac{dp}{d\pi}\big]=\sum_{j=1}^s p_j\log\frac{p_j}{\pi_j}$ in (3) is the relative entropy of a general probability measure $p=(p_1,\ldots,p_s)$ on the state space $\{1,\ldots,s\}$ with respect to the probability measure $\pi=(\pi_1,\ldots,\pi_s)$ that gives the actual (but unknown) weights of the mixture model. Large deviation rate functions are indeed often expressed in terms of relative entropy; see e.g. the discussion in Varadhan (2003). The rate function is thus obtained by minimising the relative entropy under a constraint on the risk measure.

Remark 2.2 (The set $S_\pi$ and the value $r_0$). If $\pi_j=0$ for some $j\in\{1,\ldots,s\}$, then we have
\[p_j\log\frac{p_j}{\pi_j}=\begin{cases}0&\text{if }p_j=0\\ \infty&\text{if }p_j\in(0,1]\end{cases}\]
so, in some sense, the index $j$ is negligible. Then we should consider the set $S_\pi:=\{i\in\{1,\ldots,s\}:\pi_i>0\}$ instead of $\{1,\ldots,s\}$; however, with a slight abuse of notation, throughout the paper we always refer to $\{1,\ldots,s\}$ (and its cardinality $s$) because we can always rearrange the notation in order to have $S_\pi=\{1,\ldots,s\}$. We also remark that $H_{\rho,\langle\pi,\mu\rangle}(r)$ uniquely vanishes at $r=r_0$, where
\[r_0:=\rho\Big(\sum_{j=1}^s\pi_j\mu_j\Big).\]
Thus we can say that, for every $\delta>0$, under the hypotheses of Proposition 2.1, the probability
\[P\Big(\Big|\rho\Big(\sum_{j=1}^s\hat\pi_n(j)\mu_j\Big)-r_0\Big|\ge\delta\Big)\]
decays as $e^{-nh_\delta}$, where $h_\delta:=\inf\{H_{\rho,\langle\pi,\mu\rangle}(r):|r-r_0|\ge\delta\}>0$, as $n\to\infty$.
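For small $s$ the variational formula (3) can be evaluated by brute force over a grid on the simplex. The sketch below does this for $\rho$ equal to the mean (treated later as the first special case of the linear family), with $s=3$; the weights $\pi$ and the model means are illustrative assumptions. The rate function vanishes at $r_0$ and is strictly positive away from it, as Remark 2.2 asserts.

```python
import math

def rel_entropy(p, pi):
    """Relative entropy sum_j p_j log(p_j / pi_j) on a finite state space."""
    return sum(pj * math.log(pj / qj) for pj, qj in zip(p, pi) if pj > 0)

def rate(r, means, pi, steps=400):
    """Brute-force evaluation of (3) for rho = mean and s = 3: minimise the
    relative entropy over the simplex subject to sum_j p_j * m_j = r."""
    m1, m2, m3 = means
    best = math.inf
    for a in range(steps + 1):
        p1 = a / steps
        p2 = (r - m3 - p1 * (m1 - m3)) / (m2 - m3)  # solves the linear constraint
        if 0.0 <= p2 <= 1.0 - p1:
            best = min(best, rel_entropy((p1, p2, 1.0 - p1 - p2), pi))
    return best

pi = (0.5, 0.3, 0.2)       # illustrative true weights
means = (0.0, 1.0, 2.0)    # illustrative model means rho(mu_j)
r0 = sum(q * m for q, m in zip(pi, means))   # = 0.7
print(rate(r0, means, pi))   # 0 at r = r0 (attained at p = pi)
print(rate(1.2, means, pi))  # strictly positive away from r0
```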
3 Main results

In this section we provide, when possible, an explicit expression of the variational formula in (3). Note that in general the constraint $\rho\big(\sum_{j=1}^s p_j\mu_j\big)=r$ cannot be written explicitly in terms of the $p_j$'s; for this reason we introduce the next Condition 3.1, which requires a sort of linear dependence of the risk measures with respect to the mixture weights. As we shall see in Theorem 3.1, this allows us to handle the variational formula in (3) with the method of Lagrange multipliers. Condition 3.1 does not seem to be restrictive; it is indeed satisfied by many of the risk measures used by academics and practitioners.

Condition 3.1.
The function in (3) can be written as
\[H_{\rho,\langle\pi,\mu\rangle}(r):=\inf\Big\{\sum_{j=1}^s p_j\log\frac{p_j}{\pi_j}:(p_1,\ldots,p_s)\in\Sigma_s,\ \sum_{j=1}^s p_j\Psi_\rho(\mu_j,r)=0\Big\},\qquad(4)\]
for some (strictly) decreasing functions $\Psi_\rho(\mu_1,\cdot),\ldots,\Psi_\rho(\mu_s,\cdot)$. Moreover, for all $i\in\{1,\ldots,s\}$, there exists a unique $r^{(0)}_i$ such that $\Psi_\rho\big(\mu_i,r^{(0)}_i\big)=0$.

In some cases $\Psi_\rho(\mu_1,\cdot),\ldots,\Psi_\rho(\mu_s,\cdot)$ are increasing functions (instead of decreasing); in such a case we can reduce to Condition 3.1 (namely the functions $\Psi_\rho(\mu_1,\cdot),\ldots,\Psi_\rho(\mu_s,\cdot)$ are decreasing) by a change of sign. We will see in Section 4 that Condition 3.1 is fulfilled by some popular risk measures, such as the quantiles, the mean and the class of convex shortfall risk measures introduced by Föllmer and Schied (2002). The Expected Shortfall satisfies this condition only under some extra requirements. A similar condition recently appeared in the literature on elicitable risk measures under the name of Convex Level Sets (CxLS). A risk measure has CxLS if, given $\rho(\mu_1)=\cdots=\rho(\mu_s)=r$, then $\rho\big(\sum_{j=1}^s p_j\mu_j\big)=r$. Clearly, if $\rho(\mu_1)=\cdots=\rho(\mu_s)=r$, the CxLS property implies our Condition 3.1 with $\Psi_\rho(\mu_j,r):=\rho(\mu_j)-r$. A full characterization of convex risk measures satisfying the CxLS property is provided in Delbaen et al. (2016).
Remark 3.1 (Consequences of Condition 3.1 for $r_0$). If Condition 3.1 holds, then we have
\[\sum_{j=1}^s\pi_j\Psi_\rho(\mu_j,r_0)=0.\qquad(5)\]
Moreover, if we set $\underline{r}_\rho:=\min\{r^{(0)}_i:i\in\{1,\ldots,s\}\}$ and $\overline{r}_\rho:=\max\{r^{(0)}_i:i\in\{1,\ldots,s\}\}$, we have $\underline{r}_\rho\le\overline{r}_\rho$. Then we can distinguish two cases (see parts (i) and (ii) in the next Theorem 3.1):
• $\underline{r}_\rho=\overline{r}_\rho=:\hat r_\rho$, which occurs if and only if $r^{(0)}_1=\cdots=r^{(0)}_s=\hat r_\rho$; in this case we have $r_0=\hat r_\rho$, and the estimators $\{\rho(\sum_{j=1}^s\hat\pi_n(j)\mu_j):n\ge 1\}$ are constantly equal to $r_0$;
• $\underline{r}_\rho<\overline{r}_\rho$; in this case we have $r_0\in(\underline{r}_\rho,\overline{r}_\rho)$, and the estimators $\{\rho(\sum_{j=1}^s\hat\pi_n(j)\mu_j):n\ge 1\}$ take values in $[\underline{r}_\rho,\overline{r}_\rho]$.
The first case always occurs if $s=1$. Now we are ready to present Theorem 3.1. In general we only have an explicit expression of $H_{\rho,\langle\pi,\mu\rangle}$ for the case $s=2$; see Remark 3.2 and Remark 3.3. The case with $s=\infty$ will be discussed in Remark 3.4.

Theorem 3.1.
Consider the same hypotheses of Proposition 2.1. Assume that Condition 3.1 holds, and let $\underline{r}_\rho$ and $\overline{r}_\rho$ be as in Remark 3.1.
(i) If $\underline{r}_\rho=\overline{r}_\rho$, then
\[H_{\rho,\langle\pi,\mu\rangle}(r)=\begin{cases}0&\text{if }r=r^{(0)}_1=\cdots=r^{(0)}_s\\ \infty&\text{otherwise}.\end{cases}\]
(ii) If $\underline{r}_\rho<\overline{r}_\rho$, then
\[H_{\rho,\langle\pi,\mu\rangle}(r)=\begin{cases}-\log\big(\sum_{j=1}^s\pi_je^{-\lambda_*(r)\Psi_\rho(\mu_j,r)}\big)&\text{if }r\in(\underline{r}_\rho,\overline{r}_\rho)\\ -\log\sum_{j:r^{(0)}_j=\underline{r}_\rho}\pi_j&\text{if }r=\underline{r}_\rho\\ -\log\sum_{j:r^{(0)}_j=\overline{r}_\rho}\pi_j&\text{if }r=\overline{r}_\rho\\ \infty&\text{if }r\notin[\underline{r}_\rho,\overline{r}_\rho],\end{cases}\]
where $\lambda_*(r)$ is such that
\[\frac{\sum_{j=1}^s\pi_j\Psi_\rho(\mu_j,r)e^{-\lambda_*(r)\Psi_\rho(\mu_j,r)}}{\sum_{j=1}^s\pi_je^{-\lambda_*(r)\Psi_\rho(\mu_j,r)}}=0.\qquad(6)\]

Proof. We start with the proof of statement (i). For $r=r^{(0)}_1=\cdots=r^{(0)}_s$ we have
\[H_{\rho,\langle\pi,\mu\rangle}(r)=\inf\Big\{\sum_{j=1}^s p_j\log\frac{p_j}{\pi_j}:(p_1,\ldots,p_s)\in\Sigma_s\Big\}=0\]
(the infimum is attained by choosing $(p_1,\ldots,p_s)=(\pi_1,\ldots,\pi_s)$); on the contrary, for $r\ne r^{(0)}_1=\cdots=r^{(0)}_s$, we have $H_{\rho,\langle\pi,\mu\rangle}(r)=\infty$ because the condition $\sum_{j=1}^s p_j\Psi_\rho(\mu_j,r)=0$ fails for every choice of $(p_1,\ldots,p_s)\in\Sigma_s$ (in fact, since the functions $\Psi_\rho(\mu_j,\cdot)$ are strictly decreasing and vanish at $\hat r_\rho$, the values $\{\Psi_\rho(\mu_j,r):j\in\{1,\ldots,s\}\}$ are all positive if $r<\hat r_\rho$ and all negative if $r>\hat r_\rho$).
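For the linear choice $\Psi_\rho(\mu_j,r)=\rho(\mu_j)-r$, equation (6) can be solved numerically: its left-hand side has a numerator that is strictly decreasing in $\lambda$, so plain bisection applies. The sketch below is illustrative (the values of $\rho(\mu_j)$ and $\pi$ are assumptions); for $s=2$ the result can be cross-checked against the relative entropy of the unique feasible weight vector.

```python
import math

def lambda_star(psis, pi, lo=-50.0, hi=50.0, tol=1e-12):
    """Solve (6) by bisection: g(lam) = sum_j pi_j * psi_j * exp(-lam*psi_j)
    is strictly decreasing in lam, so a sign change brackets the root."""
    g = lambda lam: sum(q * psi * math.exp(-lam * psi) for q, psi in zip(pi, psis))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def rate(r, rhos, pi):
    """H(r) from Theorem 3.1(ii) with the linear choice Psi(mu_j, r) = rho(mu_j) - r."""
    psis = [rho_j - r for rho_j in rhos]
    lam = lambda_star(psis, pi)
    return -math.log(sum(q * math.exp(-lam * psi) for q, psi in zip(pi, psis)))

rhos, pi = [0.0, 1.0], [0.4, 0.6]      # illustrative values rho(mu_j) and weights
r0 = sum(q * v for q, v in zip(pi, rhos))   # = 0.6
print(rate(r0, rhos, pi))   # ~0 at r = r0 (there lambda_*(r0) = 0)
print(rate(0.3, rhos, pi))  # > 0
```

For $s=2$ the constraint $p_1\rho(\mu_1)+p_2\rho(\mu_2)=r$ pins down $p=(0.7,0.3)$ at $r=0.3$, and the value returned by `rate` coincides with the relative entropy of that $p$ with respect to $\pi$, as the variational formula (4) predicts.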
For $r\in(\underline{r}_\rho,\overline{r}_\rho)$ we have
\[H'_{\rho,\langle\pi,\mu\rangle}(r)=\frac{\sum_{h=1}^s\pi_he^{-\lambda_*(r)\Psi_\rho(\mu_h,r)}\big[\lambda'_*(r)\Psi_\rho(\mu_h,r)+\lambda_*(r)\Psi'_\rho(\mu_h,r)\big]}{\sum_{h=1}^s\pi_he^{-\lambda_*(r)\Psi_\rho(\mu_h,r)}}\]
\[=\lambda'_*(r)\underbrace{\frac{\sum_{h=1}^s\pi_h\Psi_\rho(\mu_h,r)e^{-\lambda_*(r)\Psi_\rho(\mu_h,r)}}{\sum_{h=1}^s\pi_he^{-\lambda_*(r)\Psi_\rho(\mu_h,r)}}}_{=0\text{ by }(6)}+\lambda_*(r)\frac{\sum_{h=1}^s\pi_h\Psi'_\rho(\mu_h,r)e^{-\lambda_*(r)\Psi_\rho(\mu_h,r)}}{\sum_{h=1}^s\pi_he^{-\lambda_*(r)\Psi_\rho(\mu_h,r)}}=\lambda_*(r)\frac{\sum_{h=1}^s\pi_h\Psi'_\rho(\mu_h,r)e^{-\lambda_*(r)\Psi_\rho(\mu_h,r)}}{\sum_{h=1}^s\pi_he^{-\lambda_*(r)\Psi_\rho(\mu_h,r)}};\]
thus, since $\lambda_*(r_0)=0$, we get
\[H'_{\rho,\langle\pi,\mu\rangle}(r_0)=\lambda_*(r_0)\frac{\sum_{h=1}^s\pi_h\Psi'_\rho(\mu_h,r_0)e^{-\lambda_*(r_0)\Psi_\rho(\mu_h,r_0)}}{\sum_{h=1}^s\pi_he^{-\lambda_*(r_0)\Psi_\rho(\mu_h,r_0)}}=0.\]
Moreover, again for $r\in(\underline{r}_\rho,\overline{r}_\rho)$, we have
\[H''_{\rho,\langle\pi,\mu\rangle}(r)=\lambda'_*(r)\frac{\sum_{h=1}^s\pi_h\Psi'_\rho(\mu_h,r)e^{-\lambda_*(r)\Psi_\rho(\mu_h,r)}}{\sum_{h=1}^s\pi_he^{-\lambda_*(r)\Psi_\rho(\mu_h,r)}}+\lambda_*(r)\frac{d}{dr}\Bigg(\frac{\sum_{h=1}^s\pi_h\Psi'_\rho(\mu_h,r)e^{-\lambda_*(r)\Psi_\rho(\mu_h,r)}}{\sum_{h=1}^s\pi_he^{-\lambda_*(r)\Psi_\rho(\mu_h,r)}}\Bigg);\]
thus, by taking into account again $\lambda_*(r_0)=0$, we obtain
\[H''_{\rho,\langle\pi,\mu\rangle}(r_0)=\lambda'_*(r_0)\sum_{h=1}^s\pi_h\Psi'_\rho(\mu_h,r_0).\qquad(10)\]
We conclude by computing $\lambda'_*(r_0)$ by means of the implicit function theorem.
By (6) we consider the function
\[\Delta(r,\lambda):=\frac{\sum_{j=1}^s\pi_j\Psi_\rho(\mu_j,r)e^{-\lambda\Psi_\rho(\mu_j,r)}}{\sum_{j=1}^s\pi_je^{-\lambda\Psi_\rho(\mu_j,r)}};\]
the partial derivatives of $\Delta$ are
\[\Delta_r(r,\lambda)=\frac{1}{\big(\sum_{j=1}^s\pi_je^{-\lambda\Psi_\rho(\mu_j,r)}\big)^2}\cdot\Bigg(\sum_{j=1}^s\pi_j\Psi'_\rho(\mu_j,r)e^{-\lambda\Psi_\rho(\mu_j,r)}\big(1-\lambda\Psi_\rho(\mu_j,r)\big)\sum_{j=1}^s\pi_je^{-\lambda\Psi_\rho(\mu_j,r)}+\lambda\sum_{j=1}^s\pi_j\Psi'_\rho(\mu_j,r)e^{-\lambda\Psi_\rho(\mu_j,r)}\sum_{j=1}^s\pi_j\Psi_\rho(\mu_j,r)e^{-\lambda\Psi_\rho(\mu_j,r)}\Bigg)\]
and
\[\Delta_\lambda(r,\lambda)=\frac{1}{\big(\sum_{j=1}^s\pi_je^{-\lambda\Psi_\rho(\mu_j,r)}\big)^2}\cdot\Bigg(-\sum_{j=1}^s\pi_j\Psi^2_\rho(\mu_j,r)e^{-\lambda\Psi_\rho(\mu_j,r)}\sum_{j=1}^s\pi_je^{-\lambda\Psi_\rho(\mu_j,r)}+\Big(\sum_{j=1}^s\pi_j\Psi_\rho(\mu_j,r)e^{-\lambda\Psi_\rho(\mu_j,r)}\Big)^2\Bigg);\]
thus, by taking into account (5) and $\lambda_*(r_0)=0$, we have
\[\Delta_r(r_0,\lambda_*(r_0))=\sum_{j=1}^s\pi_j\Psi'_\rho(\mu_j,r_0)\quad\text{and}\quad\Delta_\lambda(r_0,\lambda_*(r_0))=-\sum_{j=1}^s\pi_j\Psi^2_\rho(\mu_j,r_0),\]
and the implicit function theorem yields
\[\lambda'_*(r_0)=-\frac{\Delta_r(r,\lambda)}{\Delta_\lambda(r,\lambda)}\Big|_{(r,\lambda)=(r_0,\lambda_*(r_0))}=\frac{\sum_{j=1}^s\pi_j\Psi'_\rho(\mu_j,r_0)}{\sum_{j=1}^s\pi_j\Psi^2_\rho(\mu_j,r_0)}.\]
We conclude the proof by combining this equality and (10).
4 Examples

In this section we consider some examples of risk measures satisfying Condition 3.1. The first example concerns risk measures that depend linearly on the weights, and we present two specific cases. Other examples consider quantiles and the class of shortfall risk measures, which includes the entropic risk measures as a special case. We conclude the section with two examples: one for an insurance application with $s=2$, and one where we obtain explicit expressions for $s=3$. In view of what follows we write $F_\mu$ for the distribution function associated with the law $\mu$, namely $F_\mu(x):=\mu((-\infty,x])$ for all $x\in\mathbb{R}$. We remark that, when we deal with a finite mixture $\sum_{j=1}^s p_j\mu_j$ of some laws $\mu_1,\ldots,\mu_s$ (for some $(p_1,\ldots,p_s)\in\Sigma_s$), we have $F_{\sum_{j=1}^s p_j\mu_j}=\sum_{j=1}^s p_jF_{\mu_j}$.

Example 4.1 (Linear dependence with respect to the weights). We assume that the function (2) satisfies the following condition:
\[\rho\Big(\sum_{j=1}^s p_j\mu_j\Big)=\sum_{j=1}^s p_j\rho(\mu_j)\quad\text{for all }(p_1,\ldots,p_s)\in\Sigma_s.\]
Obviously we have a continuous function. In this case one has $\Psi_\rho(\mu_i,r)=\rho(\mu_i)-r$, which yields $r^{(0)}_i=\rho(\mu_i)$ and $r_0=\sum_{j=1}^s\pi_j\rho(\mu_j)$; moreover $\underline{r}_\rho:=\min\{\rho(\mu_i):i\in\{1,\ldots,s\}\}$ and $\overline{r}_\rho:=\max\{\rho(\mu_i):i\in\{1,\ldots,s\}\}$.

Now we present some formulas for the rate function $H_{\rho,\langle\pi,\mu\rangle}$ when $\underline{r}_\rho<\overline{r}_\rho$; in view of this we remark that, for $s=2$, we have $\underline{r}_\rho<\overline{r}_\rho$ if and only if $\rho(\mu_1)\ne\rho(\mu_2)$. By (8), for $r\in(\underline{r}_\rho,\overline{r}_\rho)$, we get
\[H_{\rho,\langle\pi,\mu\rangle}(r)=-\log\Big(\sum_{h=1}^s\pi_he^{-\lambda_*(r)\Psi_\rho(\mu_h,r)}\Big)=-r\lambda_*(r)-\log\Big(\sum_{h=1}^s\pi_he^{-\lambda_*(r)\rho(\mu_h)}\Big).\]
Moreover, by Remark 3.3 concerning the case $s=2$, for $r\in(\underline{r}_\rho,\overline{r}_\rho)$ we get
\[H_{\rho,\langle\pi,\mu\rangle}(r)=-\log\Bigg(\sum_{h=1}^2\pi_h\Big(-\frac{\pi_1(\rho(\mu_1)-r)}{\pi_2(\rho(\mu_2)-r)}\Big)^{\frac{\rho(\mu_h)-r}{\rho(\mu_2)-\rho(\mu_1)}}\Bigg);\]
in particular we can easily check that this formula yields $H_{\rho,\langle\pi,\mu\rangle}(r_0)=0$.
Finally, by Proposition 3.2 (and after some computations where we take into account that $r_0=\sum_{j=1}^s\pi_j\rho(\mu_j)$), we get
\[H''_{\rho,\langle\pi,\mu\rangle}(r_0)=\frac{1}{\sum_{h=1}^s\pi_h(\rho(\mu_h)-r_0)^2}=\frac{1}{\sum_{h=1}^s\pi_h\rho^2(\mu_h)-r_0^2}.\]
Here we briefly present two particular cases concerning Example 4.1.
• The expected value (when $\mu_1,\ldots,\mu_s$ are probability measures of integrable random variables); in fact we have
\[\int_{\mathbb{R}}x\sum_{j=1}^s p_j\mu_j(dx)=\sum_{j=1}^s p_j\int_{\mathbb{R}}x\,\mu_j(dx).\]
• The Expected Shortfall $\mathrm{ES}_\alpha$, for $\alpha\in(0,1)$, when $\mu_1,\ldots,\mu_s$ have the same $\alpha$-quantile, namely when $F^{-1}_{\mu_1}(\alpha)=\cdots=F^{-1}_{\mu_s}(\alpha)=:r_\alpha$. We recall that
\[\mathrm{ES}_\alpha(\mu):=\frac{1}{1-\alpha}\int_{F^{-1}_\mu(\alpha)}^\infty x\,\mu(dx),\]
and that we have $\big(\sum_{j=1}^s p_jF_{\mu_j}\big)^{-1}(\alpha)=r_\alpha$. Therefore
\[\mathrm{ES}_\alpha\Big(\sum_{j=1}^s p_j\mu_j\Big)=\frac{1}{1-\alpha}\int_{(\sum_{j=1}^s p_jF_{\mu_j})^{-1}(\alpha)}^\infty x\sum_{j=1}^s p_j\mu_j(dx)=\sum_{j=1}^s\frac{p_j}{1-\alpha}\int_{r_\alpha}^\infty x\,\mu_j(dx)=\sum_{j=1}^s p_j\,\mathrm{ES}_\alpha(\mu_j).\]
We conclude with the final examples.
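The ES linearity above (mixing distributions that share the same $\alpha$-quantile) can be verified numerically. The sketch below uses two centred normal laws, which share the median, so $\alpha=0.5$; the distributions, weights and integration parameters are illustrative assumptions.

```python
import math

def npdf(x, m, s):
    """Normal density with mean m and standard deviation s."""
    return math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))

def es(alpha, q, pdf, hi=40.0, steps=20000):
    """ES_alpha as the tail integral (1/(1-alpha)) * int_q^hi x*pdf(x) dx (trapezoid rule)."""
    h = (hi - q) / steps
    ys = [(q + k * h) * pdf(q + k * h) for k in range(steps + 1)]
    return h * (sum(ys) - 0.5 * (ys[0] + ys[-1])) / (1 - alpha)

alpha, p1, p2 = 0.5, 0.3, 0.7         # illustrative level and mixture weights
f1 = lambda x: npdf(x, 0.0, 1.0)      # N(0,1): median 0
f2 = lambda x: npdf(x, 0.0, 2.0)      # N(0,4): same median, hence same alpha-quantile
mix = lambda x: p1 * f1(x) + p2 * f2(x)

lhs = es(alpha, 0.0, mix)                                # ES of the mixture
rhs = p1 * es(alpha, 0.0, f1) + p2 * es(alpha, 0.0, f2)  # weighted sum of the ES's
print(lhs, rhs)  # the two values agree
```

If the two laws had different $\alpha$-quantiles, the lower integration limit would depend on the weights and the identity would fail, which is why ES falls into Example 4.1 only under this extra requirement.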
Example 4.2 (Quantiles). Let us consider $\alpha\in(0,1)$ and strictly increasing and continuous distribution functions $F_{\mu_1},\ldots,F_{\mu_s}$ on the same interval. We assume that the function (2) is defined by
\[\Sigma_s\ni(p_1,\ldots,p_s)\mapsto\rho\Big(\sum_{j=1}^s p_j\mu_j\Big):=\Big(\sum_{j=1}^s p_jF_{\mu_j}\Big)^{-1}(\alpha).\]
This function is continuous (see the Appendix for details). In this case one has $\Psi_\rho(\mu_i,r)=\alpha-F_{\mu_i}(r)$, which yields $r^{(0)}_i=F^{-1}_{\mu_i}(\alpha)$ and $r_0=\big(\sum_{j=1}^s\pi_jF_{\mu_j}\big)^{-1}(\alpha)$; moreover $\underline{r}_\rho:=\min\{F^{-1}_{\mu_i}(\alpha):i\in\{1,\ldots,s\}\}$ and $\overline{r}_\rho:=\max\{F^{-1}_{\mu_i}(\alpha):i\in\{1,\ldots,s\}\}$.

Now we present some formulas for the rate function $H_{\rho,\langle\pi,\mu\rangle}$ when $\underline{r}_\rho<\overline{r}_\rho$; in view of this we remark that, for $s=2$, we have $\underline{r}_\rho<\overline{r}_\rho$ if and only if $F^{-1}_{\mu_1}(\alpha)\ne F^{-1}_{\mu_2}(\alpha)$. By (8), for $r\in(\underline{r}_\rho,\overline{r}_\rho)$, we get
\[H_{\rho,\langle\pi,\mu\rangle}(r)=-\log\Big(\sum_{h=1}^s\pi_he^{-\lambda_*(r)\Psi_\rho(\mu_h,r)}\Big)=\alpha\lambda_*(r)-\log\Big(\sum_{h=1}^s\pi_he^{\lambda_*(r)F_{\mu_h}(r)}\Big).\]
Moreover, by Remark 3.3 concerning the case $s=2$, for $r\in(\underline{r}_\rho,\overline{r}_\rho)$ we get
\[H_{\rho,\langle\pi,\mu\rangle}(r)=-\log\Bigg(\sum_{h=1}^2\pi_h\Big(-\frac{\pi_1(\alpha-F_{\mu_1}(r))}{\pi_2(\alpha-F_{\mu_2}(r))}\Big)^{\frac{\alpha-F_{\mu_h}(r)}{F_{\mu_1}(r)-F_{\mu_2}(r)}}\Bigg);\]
in particular we can easily check that this formula yields $H_{\rho,\langle\pi,\mu\rangle}(r_0)=0$ because $\sum_{j=1}^s\pi_jF_{\mu_j}(r_0)=\alpha$. Finally, by Proposition 3.2 (and after some computations where we take into account again that $\sum_{j=1}^s\pi_jF_{\mu_j}(r_0)=\alpha$), we get
\[H''_{\rho,\langle\pi,\mu\rangle}(r_0)=\frac{\big(\sum_{h=1}^s\pi_hF'_{\mu_h}(r_0)\big)^2}{\sum_{h=1}^s\pi_h(\alpha-F_{\mu_h}(r_0))^2}=\frac{\big(\sum_{h=1}^s\pi_hF'_{\mu_h}(r_0)\big)^2}{\sum_{h=1}^s\pi_hF^2_{\mu_h}(r_0)-\alpha^2}.\]

Example 4.3 (Shortfall risk measures). We recall some preliminaries (see Föllmer and Schied (2002)).
Given a loss function $\ell:\mathbb{R}\to\mathbb{R}$ (that is, a convex, increasing and not identically constant function) and an interior point $x_0$ in the range of $\ell$, a shortfall risk measure is defined by
\[\rho(\mu):=\inf\Big\{m\in\mathbb{R}:\int_{\mathbb{R}}\ell(x-m)\,\mu(dx)\le x_0\Big\};\]
moreover it is the unique solution $m$ to the equation
\[\int_{\mathbb{R}}\ell(x-m)\,\mu(dx)=x_0.\]
We assume that the function (2) is defined by
\[\Sigma_s\ni(p_1,\ldots,p_s)\mapsto\rho\Big(\sum_{j=1}^s p_j\mu_j\Big):=r,\quad\text{where}\quad\sum_{j=1}^s p_j\int_{\mathbb{R}}\ell(x-r)\,\mu_j(dx)=x_0.\]
The continuity of this function can be checked by adapting the proof in the Appendix. In this case one has
\[\Psi_\rho(\mu_i,r)=\int_{\mathbb{R}}\ell(x-r)\,\mu_i(dx)-x_0.\]
From now on, in order to have explicit results, we continue our analysis for the class of entropic risk measures, that is the case where we have the loss function $\ell(x)=e^{\theta x}$, for $\theta>0$, and $x_0=1$. We can check the following equalities:
\[\rho(\mu)=\frac{1}{\theta}\log\Big(\int_{\mathbb{R}}e^{\theta x}\mu(dx)\Big);\]
the function (2) becomes
\[\Sigma_s\ni(p_1,\ldots,p_s)\mapsto\rho\Big(\sum_{j=1}^s p_j\mu_j\Big):=\frac{1}{\theta}\log\Big(\int_{\mathbb{R}}e^{\theta x}\sum_{j=1}^s p_j\mu_j(dx)\Big)\]
(so we have $\Sigma_s\ni(p_1,\ldots,p_s)\mapsto\frac{1}{\theta}\log\big(\sum_{j=1}^s p_je^{\theta\rho(\mu_j)}\big)$, which is a continuous function); the function $\Psi_\rho(\mu_i,r)$ can be rewritten as $\Psi_\rho(\mu_i,r)=e^{\theta\rho(\mu_i)}-e^{\theta r}$, which yields $r^{(0)}_i=\rho(\mu_i)$ and $r_0=\frac{1}{\theta}\log\big(\sum_{j=1}^s\pi_je^{\theta\rho(\mu_j)}\big)$; moreover $\underline{r}_\rho:=\min\{\rho(\mu_i):i\in\{1,\ldots,s\}\}$ and $\overline{r}_\rho:=\max\{\rho(\mu_i):i\in\{1,\ldots,s\}\}$.

Now we present some formulas for the rate function $H_{\rho,\langle\pi,\mu\rangle}$ when $\underline{r}_\rho<\overline{r}_\rho$; in view of this we remark that, for $s=2$, we have $\underline{r}_\rho<\overline{r}_\rho$ if and only if $\rho(\mu_1)\ne\rho(\mu_2)$. By (8), for $r\in(\underline{r}_\rho,\overline{r}_\rho)$, we get
\[H_{\rho,\langle\pi,\mu\rangle}(r)=-\log\Big(\sum_{h=1}^s\pi_he^{-\lambda_*(r)\Psi_\rho(\mu_h,r)}\Big)=-e^{\theta r}\lambda_*(r)-\log\Big(\sum_{h=1}^s\pi_he^{-\lambda_*(r)e^{\theta\rho(\mu_h)}}\Big).\]
Moreover, by Remark 3.3 concerning the case $s=2$, for $r\in(\underline{r}_\rho,\overline{r}_\rho)$ we get
\[H_{\rho,\langle\pi,\mu\rangle}(r)=-\log\Bigg(\sum_{h=1}^2\pi_h\Big(-\frac{\pi_1(e^{\theta\rho(\mu_1)}-e^{\theta r})}{\pi_2(e^{\theta\rho(\mu_2)}-e^{\theta r})}\Big)^{\frac{e^{\theta\rho(\mu_h)}-e^{\theta r}}{e^{\theta\rho(\mu_2)}-e^{\theta\rho(\mu_1)}}}\Bigg);\]
in particular we can easily check that this formula yields $H_{\rho,\langle\pi,\mu\rangle}(r_0)=0$ because $e^{\theta r_0}=\sum_{j=1}^s\pi_je^{\theta\rho(\mu_j)}$. Finally, by Proposition 3.2 (and after some computations where we take into account again that $e^{\theta r_0}=\sum_{j=1}^s\pi_je^{\theta\rho(\mu_j)}$), we get
\[H''_{\rho,\langle\pi,\mu\rangle}(r_0)=\frac{(\theta e^{\theta r_0})^2}{\sum_{h=1}^s\pi_h(e^{\theta\rho(\mu_h)}-e^{\theta r_0})^2}=\frac{(\theta e^{\theta r_0})^2}{\sum_{h=1}^s\pi_he^{2\theta\rho(\mu_h)}-e^{2\theta r_0}}.\]

Example 4.4 (An insurance example with $s=2$). In actuarial science mixture distributions are particularly relevant for modeling different claim sizes. Consider, for instance, a car insurance context where individuals are grouped into $s$ categories depending on their accident history; we assume for convenience that $s=2$. Each group claim distribution $\mu_j$ may be modeled using an exponential distribution with parameter $\lambda_j$; thus $F_{\mu_j}(x)=1-e^{-\lambda_jx}$, for $x>0$, and the probability of arrival of a claim in group $j$ is $\pi_j>0$, with $\pi_1+\pi_2=1$. We assume that the $\lambda_j$'s are given and, without loss of generality, $\lambda_1<\lambda_2$ (for $\lambda_1=\lambda_2$ we find the usual exponential distribution). Instead the $\pi_j$'s are estimated empirically from a sample of $n$ claims; denoting by $X_i$ a random variable taking the value $j$ when claim $i$ belongs to group $j$, we obtain $X_i=j$ with probability $\pi_j$ and $\hat\pi_n(j)=\frac{1}{n}\sum_{i=1}^n 1_{\{X_i=j\}}$ for all $j\in\{1,2\}$. In this example, we are interested in understanding what happens to $\rho\big(\sum_{j=1}^s\hat\pi_n(j)\mu_j\big)$ as $n\to\infty$ when the risk measure $\rho$ is the quantile at level $\alpha\in(0,1)$, also known as Value-at-Risk ($\mathrm{VaR}_\alpha$) in the risk management literature. We recall that for a model $\mu$ with continuous and strictly increasing distribution function $F$, we have $\rho(\mu)=F^{-1}(\alpha)$; therefore $\rho(\mu_j)=-\frac{1}{\lambda_j}\log(1-\alpha)$, $j\in\{1,2\}$.

From Example 4.2, we know that $\Psi_\rho(\mu_j,r)=\alpha-F_{\mu_j}(r)=\alpha-1+e^{-\lambda_jr}$, and
\[\overline{r}_\rho=r^{(0)}_1=-\frac{1}{\lambda_1}\log(1-\alpha)>-\frac{1}{\lambda_2}\log(1-\alpha)=r^{(0)}_2=\underline{r}_\rho.\]
From Remark 3.3, for $r\in(\underline{r}_\rho,\overline{r}_\rho)$ we easily find
\[\lambda_*(r)=\frac{1}{e^{-\lambda_1r}-e^{-\lambda_2r}}\log\Big(-\frac{\pi_1(\alpha-1+e^{-\lambda_1r})}{\pi_2(\alpha-1+e^{-\lambda_2r})}\Big)\]
and the weights $p=(p_1,p_2)$ in (7) which attain the infimum in (4) are given by
\[p_j(r)=\frac{\pi_je^{-\lambda_*(r)(\alpha-1+e^{-\lambda_jr})}}{\sum_{j=1}^2\pi_je^{-\lambda_*(r)(\alpha-1+e^{-\lambda_jr})}},\quad j\in\{1,2\}.\]
We then obtain the rate function:
\[H_{\rho,\langle\pi,\mu\rangle}(r)=-\log\Bigg(\pi_1\Big(-\frac{\pi_1(\alpha-1+e^{-\lambda_1r})}{\pi_2(\alpha-1+e^{-\lambda_2r})}\Big)^{\frac{\alpha-1+e^{-\lambda_1r}}{e^{-\lambda_2r}-e^{-\lambda_1r}}}+\pi_2\Big(-\frac{\pi_1(\alpha-1+e^{-\lambda_1r})}{\pi_2(\alpha-1+e^{-\lambda_2r})}\Big)^{\frac{\alpha-1+e^{-\lambda_2r}}{e^{-\lambda_2r}-e^{-\lambda_1r}}}\Bigg).\]
We refer the interested reader to Lee et al. (2012) for a more detailed analysis of the use of exponential mixture models in insurance.
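The closed-form rate function of this example can be coded directly. The sketch below uses illustrative values ($\lambda_1=1$, $\lambda_2=3$, $\pi_1=0.4$, $\alpha=0.95$, all assumptions for the example) and checks that $H$ vanishes at the $\alpha$-quantile $r_0$ of the true mixture, located by bisection.

```python
import math

lam1, lam2 = 1.0, 3.0   # exponential rates with lam1 < lam2 (illustrative values)
pi1, pi2 = 0.4, 0.6     # true mixture weights (illustrative values)
alpha = 0.95

F = lambda lam, r: 1.0 - math.exp(-lam * r)   # exponential distribution function
psi = lambda lam, r: alpha - F(lam, r)        # Psi(mu_j, r) = alpha - F_j(r)

def H(r):
    """Closed-form rate function for s = 2 exponential models and rho = VaR_alpha,
    valid for r strictly between the two individual quantiles."""
    P1, P2 = psi(lam1, r), psi(lam2, r)
    base = -pi1 * P1 / (pi2 * P2)                      # positive on the open interval
    denom = math.exp(-lam2 * r) - math.exp(-lam1 * r)  # Psi_2 - Psi_1
    return -math.log(pi1 * base ** (P1 / denom) + pi2 * base ** (P2 / denom))

# r0 = alpha-quantile of the true mixture, located by bisection
lo, hi = -math.log(1 - alpha) / lam2, -math.log(1 - alpha) / lam1
for _ in range(200):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if pi1 * F(lam1, mid) + pi2 * F(lam2, mid) < alpha else (lo, mid)
r0 = 0.5 * (lo + hi)
print("H(r0) =", H(r0))  # vanishes (up to rounding) at the true value r0
```

At any other $r$ in the interval, $H(r)$ coincides with the relative entropy of the unique weight vector satisfying the quantile constraint, which gives an independent check of the formula.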
Example 4.5 (A specific example with $s=3$). We refer to Condition 3.1 with $s=3$ and, for some $a>0$ (and a suitable strictly decreasing function $\Psi$), we set
\[\Psi_\rho(\mu_i,r):=\Psi(r)+(i-1)a\quad(\text{for all }i\in\{1,2,3\});\]
(we remark that we are in this situation when we deal with Example 4.1 with $s=3$; in such a case we have $\Psi(r)=\rho(\mu_1)-r$, $\rho(\mu_2)=\rho(\mu_1)+a$, and $\rho(\mu_3)=\rho(\mu_1)+2a$). We recall that $\underline{r}_\rho=r^{(0)}_1<r^{(0)}_2<r^{(0)}_3=\overline{r}_\rho$, where $r^{(0)}_i=\Psi^{-1}(-(i-1)a)$ (for all $i\in\{1,2,3\}$). Moreover, after some computations, (5) with $s=3$ yields $r_0=\Psi^{-1}(-a(\pi_2+2\pi_3))$.

Now we compute the rate function $H_{\rho,\langle\pi,\mu\rangle}$ in Theorem 3.1(ii). We have $H_{\rho,\langle\pi,\mu\rangle}(\Psi^{-1}(0))=-\log\pi_1$ and $H_{\rho,\langle\pi,\mu\rangle}(\Psi^{-1}(-2a))=-\log\pi_3$, which concern the cases $r=\underline{r}_\rho$ and $r=\overline{r}_\rho$. In what follows we take $r\in(\underline{r}_\rho,\overline{r}_\rho)=(\Psi^{-1}(0),\Psi^{-1}(-2a))$. Firstly (6) yields
\[e^{-\lambda_*(r)\Psi(r)}\Big(\pi_1\Psi(r)+\pi_2(\Psi(r)+a)e^{-\lambda_*(r)a}+\pi_3(\Psi(r)+2a)e^{-2\lambda_*(r)a}\Big)=0,\]
and therefore
\[e^{-\lambda_*(r)a}=\frac{-\pi_2(\Psi(r)+a)+\sqrt{\pi_2^2(\Psi(r)+a)^2-4\pi_1\pi_3\Psi(r)(\Psi(r)+2a)}}{2\pi_3(\Psi(r)+2a)};\]
we remark that $\pi_2^2(\Psi(r)+a)^2-4\pi_1\pi_3\Psi(r)(\Psi(r)+2a)\ge 0$ because $\Psi(r)<0$ and $\Psi(r)+2a>0$, and
\[\sqrt{\pi_2^2(\Psi(r)+a)^2-4\pi_1\pi_3\Psi(r)(\Psi(r)+2a)}\ge|\pi_2(\Psi(r)+a)|.\]
Thus we easily obtain $\lambda_*(r)$ and the rate function expression is
\[H_{\rho,\langle\pi,\mu\rangle}(r)=-\log\Bigg(\sum_{j=1}^3\pi_j\Bigg(\frac{-\pi_2(\Psi(r)+a)+\sqrt{\pi_2^2(\Psi(r)+a)^2-4\pi_1\pi_3\Psi(r)(\Psi(r)+2a)}}{2\pi_3(\Psi(r)+2a)}\Bigg)^{\Psi_\rho(\mu_j,r)/a}\Bigg).\]
A final remark concerns the condition $\lambda_*(r_0)=0$ (see Remark 3.2). The above formulas yield
\[\frac{-\pi_2(\Psi(r_0)+a)+\sqrt{\pi_2^2(\Psi(r_0)+a)^2-4\pi_1\pi_3\Psi(r_0)(\Psi(r_0)+2a)}}{2\pi_3(\Psi(r_0)+2a)}=1\]
and, after some computations, we get $\pi_1\Psi(r_0)+\pi_2(\Psi(r_0)+a)+\pi_3(\Psi(r_0)+2a)=0$, which recovers (5) (with $s=3$).

Acknowledgements.
The authors are grateful to the anonymous reviewers for their careful reports, which greatly improved the paper.
References
Acciaio, B. and G. Svindland (2013). Are law-invariant risk functions concave on distributions? Dependence Modeling 1, 54–64.
Barrieu, P. and G. Scandolo (2015). Assessing financial model risk. European Journal of Operational Research 242(2), 546–556.
Bellini, F. and V. Bignozzi (2015). On elicitable risk measures. Quantitative Finance 15(5), 725–733.
Bernardi, M., A. Maruotti, and L. Petrella (2012). Skew mixture models for loss distributions: a Bayesian approach. Insurance: Mathematics and Economics 51(3), 617–623.
Bernardi, M., A. Maruotti, and L. Petrella (2017). Multiple risk measures for multivariate dynamic heavy-tailed models. Journal of Empirical Finance 43, 1–32.
Bignozzi, V. and A. Tsanakas (2016). Model uncertainty in risk capital measurement. The Journal of Risk 18(3), 1–24.
Cairns, A. J. (2000). A discussion of parameter and model uncertainty in insurance. Insurance: Mathematics and Economics 27(3), 313–330.
Cont, R., R. Deguest, and G. Scandolo (2010). Robustness and sensitivity analysis of risk measurement procedures. Quantitative Finance 10(6), 593–606.
Delbaen, F., F. Bellini, V. Bignozzi, and J. F. Ziegel (2016). Risk measures with the CxLS property. Finance and Stochastics 20(2), 433–453.
Dembo, A. and O. Zeitouni (1998). Large Deviations Techniques and Applications (2nd ed.). Springer.
Féray, V., P.-L. Méliot, and A. Nikeghbali (2016). Mod-φ Convergence. Normality Zones and Precise Deviations. Springer.
Föllmer, H. and A. Schied (2002). Convex measures of risk and trading constraints. Finance and Stochastics 6(4), 429–447.
Föllmer, H. and A. Schied (2016). Stochastic Finance: An Introduction in Discrete Time. Walter De Gruyter.
Gilboa, I. and D. Schmeidler (1989). Maxmin expected utility with non-unique prior. Journal of Mathematical Economics 18(2), 141–153.
Klugman, S. A., H. H. Panjer, and G. E. Willmot (2012). Loss Models: From Data to Decisions. John Wiley & Sons.
Lee, D., W. K. Li, and T. S. T. Wong (2012). Modeling insurance claims via a mixture exponential model combined with peaks-over-threshold approach. Insurance: Mathematics and Economics 51(3), 538–550.
Lee, S. C. and X. S. Lin (2010). Modeling and evaluating insurance losses via mixtures of Erlang distributions. North American Actuarial Journal 14(1), 107–130.
McLachlan, G. and D. Peel (2004). Finite Mixture Models. John Wiley & Sons.
Pesaran, M. H., C. Schleicher, and P. Zaffaroni (2009). Model averaging in risk management with an application to futures markets. Journal of Empirical Finance 16(2), 280–305.
Raftery, A. E., D. Madigan, and J. A. Hoeting (1997). Bayesian model averaging for linear regression models. Journal of the American Statistical Association 92(437), 179–191.
Varadhan, S. R. S. (2003). Large deviations and entropy. In Entropy, Princeton Ser. Appl. Math., pp. 199–214. Princeton University Press.
Weber, S. (2006). Distribution-invariant risk measures, information, and dynamic consistency. Mathematical Finance 16(2), 419–441.
Weber, S. (2007). Distribution-invariant risk measures, entropy, and large deviations. Journal of Applied Probability 44(1), 16–40.
Ziegel, J. F. (2016). Coherence and elicitability. Mathematical Finance 26(4), 901–918.
Appendix: The continuity of $\rho\big(\sum_{j=1}^s p_j\mu_j\big)$ in Example 4.2

We remark that, if we set $\varphi(x,p_1,\ldots,p_s):=\sum_{j=1}^s p_jF_{\mu_j}(x)$, we are interested in the function
\[x(p_1,\ldots,p_s):=\Big(\sum_{j=1}^s p_jF_{\mu_j}\Big)^{-1}(\alpha),\]
which is the implicit function defined by the condition $\varphi(x,p_1,\ldots,p_s)=\alpha$. We assume that $x(p_1,\ldots,p_s)$ is not continuous at some point $(q_1,\ldots,q_s)$. Then we can find a sequence $\{(p^{(n)}_1,\ldots,p^{(n)}_s):n\ge 1\}$ which converges to $(q_1,\ldots,q_s)$, and such that $\{x(p^{(n)}_1,\ldots,p^{(n)}_s):n\ge 1\}$ converges to some limit $\ell$ with $\ell\ne x(q_1,\ldots,q_s)$; moreover we have $\varphi(x(p^{(n)}_1,\ldots,p^{(n)}_s),p^{(n)}_1,\ldots,p^{(n)}_s)=\alpha$ (for all $n\ge 1$). Then, letting $n\to\infty$, we get $\sum_{j=1}^s q_jF_{\mu_j}(\ell)=\alpha$ (by the continuity of the functions $F_{\mu_1},\ldots,F_{\mu_s}$). The last equality yields $\ell=x(q_1,\ldots,q_s)$, which contradicts the assumption.