Variance Contracts

arXiv preprint [q-fin.RM]
Yichun Chi ∗, Xun Yu Zhou †, Sheng Chao Zhuang ‡

Abstract
We study the design of an optimal insurance contract in which the insured maximizes her expected utility and the insurer limits the variance of his risk exposure while maintaining the principle of indemnity and charging the premium according to the expected value principle. We derive the optimal policy semi-analytically, which is coinsurance above a deductible when the variance bound is binding. This policy automatically satisfies the incentive-compatible condition, which is crucial to rule out ex post moral hazard. We also find that the deductible is absent if and only if the contract pricing is actuarially fair. Focusing on the actuarially fair case, we carry out comparative statics on the effects of the insured’s initial wealth and the variance bound on insurance demand. Our results indicate that the expected coverage is always larger for a wealthier insured, implying that the underlying insurance is a normal good, which supports certain recent empirical findings. Moreover, as the variance constraint tightens, the insured who is prudent cedes fewer losses, while the insurer is exposed to less tail risk.
Keywords: Insurance design; expected value principle; variance; incentive compatibility; comparative statics.

∗ China Institute for Actuarial Science, Central University of Finance and Economics, Beijing 102206, China.
† Department of Industrial Engineering and Operations Research and The Data Science Institute, Columbia University, New York, NY 10027, USA.
‡ Department of Finance, University of Nebraska-Lincoln, NE, USA.

Introduction
Insurance is an efficient mechanism to facilitate risk reallocation between two parties. Borch (1960) was the first to study the insurance contract design problem and to prove that given a fixed premium, a stop-loss (or deductible) insurance policy (i.e., full coverage above a deductible) achieves the smallest variance of the insured’s share of payment. Arrow (1963) assumes that the premium is calculated by the expected value principle (i.e., the insurance cost is proportional to the expected indemnity) and imposes the principle of indemnity (i.e., the insurer’s reimbursement is non-negative and smaller than the loss). Under these specifics, Arrow (1963) shows that the stop-loss insurance is Pareto optimal between a risk-neutral insurer and a risk-averse insured. This is a foundational result that has earned the name of Arrow’s theorem of the deductible in the literature. Mossin (1968) further proves that the deductible is strictly positive if and only if the insurance price is actuarially unfair (i.e., the safety loading is strictly positive).

Footnote: Borch (1960) presents the problem in a reinsurance setting, in which the ceding insurer corresponds to the insured in an insurance setting.

In Arrow (1963), the insurer is assumed to be risk-neutral. This is based on the assumption that the insurer has a sufficiently large number of independent and homogeneous insureds, such that his risk, by the law of large numbers, is sufficiently diversified to be nearly zero. This kind of theoretically ideal situation hardly occurs in practice, even if the insurer does indeed have a huge number of clients. Moreover, it does not apply to tailor-made contracts for insuring one-off events (e.g., the shipment of a highly valuable painting). Theorem 2 in Arrow (1971) stipulates that, when the insurer is risk-averse and insurance cost is absent, an optimal contract must involve coinsurance. Raviv (1979) extends this result to include nonlinear insurance costs and shows that an optimal policy involves both a deductible and coinsurance.

Much of the recent research in this area has focused on the insurer’s tail risk exposure. Cummins and Mahul (2004) and Zhou et al. (2010) extend Arrow’s model by introducing an exogenous upper bound on indemnity and thereby limiting the insurer’s liability with respect to catastrophic losses. From a regulatory perspective, Zhou and Wu (2008) propose a model in which the insurer’s expected loss above a prescribed level is controlled, and they conclude that an optimal policy is generally piecewise linear. Doherty et al. (2015) investigate the case in which losses are nonverifiable and deduce that a contract with a deductible and an endogenous upper limit is optimal.

As important as tail risks are for both parties in insurance contracts, in practice insurers are also
“Different stakeholders have different levels of interest in different parts of the distribution - the perspective of the decision-maker is important. Regulators and rating agencies will be focused on the extreme downside where the very existence of the company is in doubt. On the other hand, management and investors will have a greater interest in more near-term scenarios towards the middle of the distribution and will focus on the likelihood of making a profit as well as a loss” (p. 4).
Assuming the insurer to be risk-averse with a concave utility function indeed takes the whole risk distribution into consideration, as studied in Arrow (1971) and Raviv (1979). However, there are notable drawbacks to the utility function approach. The notion of utility is opaque for many non-specialists, and the benefit–risk tradeoff is only implicit in a utility function. Moreover, one can rarely obtain, analytically, optimal policies with a general utility, hindering post-optimality analyses such as comparative statics. For instance, Raviv (1979) derives a differential equation satisfied by the optimal indemnity, which takes a rather complex form depending on the utility function used.

By contrast, variance, as a measure of risk originally put forth in Markowitz’s pioneering work (Markowitz, 1952), is also related to the whole distribution, yet it is more intuitive and transparent. Borch (1960) designs a contract that aims to minimize the variance of the insured’s liability. Kaluszka (2001) extends Borch (1960)’s work by incorporating a variance-related premium principle and shows that the optimal contract to minimize the variance of the insured’s payment can be stop-loss, quota share (i.e., the insurer covers a constant proportion of the loss) or a combination of the two. Vajda (1962) studies the problem from the insurer’s perspective, and shows that a quota share policy minimizes the variance of indemnity in an actuarially fair contract. However, his result depends critically on limiting the admissible contracts to be such that the ratio between indemnity and loss increases as the loss increases, a feature that enables the derivation of a solution through rather simple calculus.

Footnote: Vajda (1962) claims that this feature “agrees with the spirit of (re)insurance, at least in most cases” (p. 259). However, for a larger loss, it is indeed in the spirit of insurance that the insurer should pay more, but it is not clear why he should be responsible for a higher proportion. Interestingly, our results will show that the optimal policies of our model possess this property if the insured is prudent; see Corollary 3.9.
In this paper, we revisit the work of Arrow (1963) by imposing a variance constraint on the insurer’s risk exposure. Unlike Vajda (1962), we consider the general actuarially unfair case and remove the restriction that the proportion of the insurer’s payment increases with the size of the loss. The presence of the variance constraint causes substantial technical challenges in solving the problem. In the literature, there are generally two approaches used to study variants of Arrow’s model: those involving sample-wise optimization and stochastic orders. However, the former fails to work for our problem due to the nonlinearity of the variance constraint, and the latter is not readily applicable either because the presence of the variance constraint invalidates the claim that any admissible contract is dominated by a stop-loss one. The first contribution of this paper is methodological: we develop a new approach by combining the techniques of stochastic orders, calculus of variations and the Lagrangian duality to derive optimal insurance policies. The solutions are semi-analytical in the sense that they can be computed by solving some algebraic equations (as opposed to differential equations in Raviv 1979).

Because the expected value premium principle ensures the expected profit of the insurer, our model is essentially a mean–variance model à la Markowitz for the insurer. Our second contribution is actuarial: we show that the optimal contract is coinsurance above a deductible when the variance constraint is binding. Moreover, the deductible disappears if and only if the insurance price is actuarially fair, consistent with Mossin’s Theorem (Mossin, 1968). These results are qualitatively similar to those of Raviv (1979), who uses a concave utility function for the insurer. A natural question is why one would bother to study the mean–variance version of a problem that would generate contracts with similar characteristics to its expected utility counterpart. This question can be answered in the same way as in the field of financial portfolio selection, where there is an enormously large body of study on the Markowitz mean–variance model along with its popularity in practice, despite the existence of the equally well-studied expected utility maximization models. In other words, expected utility and mean–variance are two different frameworks, and, as argued earlier, the latter underlines a more transparent and explicit return–risk tradeoff, which usually leads to explicit solutions.

Our optimal policies involve coinsurance, which is widely utilized in the insurance industry. As pointed out by Raviv (1979), risk aversion on the part of the insurer could be a cause for coinsurance, but other attributes such as the nonlinearity of the insurance cost function could also lead to coinsurance. Another explanation for coinsurance is to mitigate the moral hazard risk; see Holmström (1979) and Dionne and St-Michel (1991). From the insured’s perspective, Doherty and Schlesinger (1990) argue that default risk of the insurer can motivate the insured to choose coinsurance. Picard (2000) also shows that coinsurance is optimal in order to reduce the risk premium paid to the auditor.
In this paper, we prove that optimal policies can turn from full insurance to coinsurance as the variance bound tightens, thereby providing a novel yet simple reason for the prevalent feature of coinsurance in insurance theory and practice: a variance bound on the insurer’s risk exposure.

Intriguingly, our optimal insurance policies automatically satisfy the so-called incentive-compatible condition that both the insured and the insurer pay more for a larger loss (or, equivalently, the marginal indemnity is between 0 and 1). In Arrow (1963)’s setting, the optimal contract (the stop-loss one) turns out to be incentive-compatible; however, this is not true in general. Gollier (1996) considers an insured facing an additional background risk that is not insurable. Under the expected value principle, he discovers that the optimal insurance, which relies heavily on the dependence between the background risk and the loss, may render the marginal indemnity strictly larger than 1. Bernard et al. (2015) generalize the insured’s risk preference from expected utility to rank-dependent utility involving probability distortion (weighting), and also find that the optimal indemnity may decrease when the loss increases. In both of these papers, the derived optimal contracts would incentivize the insured to misreport the actual losses, leading to ex post moral hazard. Equally absurd would be the case in which the insurer pays less for a larger loss. To address this issue, Huberman et al. (1983) propose the incentive-compatible condition as a hard constraint on admissible insurance policies, in addition to the principle of indemnity. Xu et al. (2019) add this constraint to the model of Bernard et al. (2015), painstakingly developing a completely different approach in order to overcome the difficulty arising out of this additional constraint and deriving qualitatively very different contracts.
On the other hand, Raviv (1979) discovers that his optimal solution is incentive-compatible, assuming that the loss has a strictly positive probability density function. Carlier and Dana (2005) use a Hardy–Littlewood rearrangement argument to prove that any optimal contract is dominated by an incentive-compatible contract, establishing the optimality of the latter. However, their approach relies heavily on the assumption that the loss is non-atomic. Both of these studies rule out the important and practical case in which the loss is atomic at 0. By contrast, in the presence of the variance constraint, we show that the optimal policy is naturally incentive-compatible even without the corresponding hard constraint or the assumption of an atom-less loss.

Footnote: The incentive-compatible condition is termed the no-sabotage condition in Carlier and Dana (2003).

larger for a wealthier insured who has strictly decreasing absolute prudence (DAP), rendering the insurance product a normal good. This finding provides some theoretical foundation for the empirical observations of Millo (2016) and Armantier et al. (2018). Moreover, we show that the insurer has less downside risk when contracting with a wealthier insured with strictly DAP. This result is consistent with the well-documented phenomenon that more economically advanced regions or countries have higher insurance densities and penetrations. On the other hand, we establish that the variance bound significantly changes a prudent insured’s risk transfer decision: she would consistently transfer more losses as the variance bound loosens. A corollary of this result is, rather surprisingly, that the insurer can reduce the tail risk by simply tightening the variance constraint. This suggests that our variance contracts do, after all, address the issue of tail exposure.

The rest of the paper proceeds as follows. In Section 2 we formulate the problem and present some preliminaries about risk preferences.
In Section 3 we develop the solution approach and present the optimal insurance contracts. In Section 4 we conduct a comparative analysis by examining the effects of the insured’s initial wealth and the variance constraint on insurance demand. Section 5 concludes the paper. Some auxiliary results and all proofs are relegated to the appendices.
An insured endowed with an initial wealth $w$ faces an insurable loss $X$, which is a non-negative, essentially bounded random variable defined on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ with the cumulative distribution function (c.d.f.) $F_X(x) := \mathbb{P}\{X \le x\}$ and the essential supremum $M < \infty$. An insurance contract design problem is to partition $X$ into two parts, $I(X)$ and $X - I(X)$, where $I(X)$ (the indemnity) is the portion of the loss that is ceded to the insurer (“he”) and $R_I(X) := X - I(X)$ (the retention) is the portion borne by the insured (“she”). $I$ and $R_I$ are also called the insured’s

Footnote: Menezes et al. (1980) introduce the notion of downside risk to compare two risks with the same mean and variance. The formal definition is given in Section 2.2.

principle of indemnity, namely the indemnity is non-negative and does not exceed the amount of loss. Thus, the feasible set of indemnity functions is
\[ \mathcal{C} := \{ I : 0 \le I(x) \le x, \ \forall x \in [0, M] \}. \]

As the insurer covers part of the loss for the insured, he is compensated by collecting the premium from her. Following many studies in the literature, we assume that the insurer calculates the premium using the expected value principle. Specifically, the premium on making a non-negative random payment $Y$ is charged as
\[ \pi(Y) = (1 + \rho)\,\mathbb{E}[Y], \]
where $\rho \ge 0$ is the safety loading coefficient. His risk exposure under a contract $I$ for a loss $X$ is hence
\[ e_I(X) = I(X) - \pi(I(X)). \]
The insurer may evaluate this risk using different measures for different purposes, as Kaye (2005) notes. In this paper, we assume that the insurer has sufficient regulatory capital and therefore focuses on the volatility of the underwriting risk. Specifically, he uses the variance to measure the risk and requires
\[ \mathrm{var}[e_I(X)] \equiv \mathrm{var}[I(X)] \le \nu \]
for some prescribed $\nu > 0$. Denote by $W_I(X)$ the insured’s final wealth under contract $I$ upon its expiration, namely
\[ W_I(X) = w - X + I(X) - \pi(I(X)). \]
The insured’s risk preference is characterized by a von Neumann–Morgenstern utility function $U$ satisfying $U' > 0$ and $U'' < 0$. The problem is formulated as
\[ \max_{I \in \mathcal{C}} \ \mathbb{E}[U(W_I(X))] \quad \text{subject to} \quad \mathrm{var}[e_I(X)] \le \nu. \tag{2.1} \]
Note that this model reduces to Arrow (1963)’s model
\[ \max_{I \in \mathcal{C}} \ \mathbb{E}[U(W_I(X))] \]
by setting the upper bound $\nu$ to be $\mathbb{E}[X^2]$. This is because $\mathrm{var}[e_I(X)] = \mathrm{var}[I(X)] \le \mathbb{E}[I(X)^2] \le \mathbb{E}[X^2]$ for all $I \in \mathcal{C}$.

In Problem (2.1), the insured’s benefit–risk consideration is captured by the utility function $U$, whereas the insurer’s return–risk tradeoff is reflected by the “mean” (the expected value principle) and the “variance” (the variance bound). One may interpret the problem as one faced by an insurer who wishes to design a contract with the best interest of a representative insured in mind, so as to remain marketable and competitive, while maintaining the desired profitability and variance control in the mean–variance sense. Problem (2.1) can also model a tailor-made contract design for insuring a one-off event from an insured’s perspective. The insured aims to maximize her expected utility while accommodating the insurer’s participation constraint reflected by the mean and variance specifications.
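To make the ingredients of Problem (2.1) concrete, the following Python sketch evaluates the premium, the variance of the indemnity and the insured's expected utility for one candidate contract. It is only an illustration under stated assumptions: the four-point loss distribution, the parameter values, the logarithmic utility and the helper name `evaluate_contract` are all hypothetical and do not come from the paper.

```python
import math

def evaluate_contract(indemnity, losses, probs, w, rho, utility):
    """Return (premium, variance of indemnity, insured's expected utility)."""
    payouts = [indemnity(x) for x in losses]
    # Principle of indemnity: 0 <= I(x) <= x for every loss level.
    if any(y < 0 or y > x for x, y in zip(losses, payouts)):
        raise ValueError("indemnity violates 0 <= I(x) <= x")
    mean_i = sum(p * y for p, y in zip(probs, payouts))
    premium = (1 + rho) * mean_i  # expected value principle
    var_i = sum(p * (y - mean_i) ** 2 for p, y in zip(probs, payouts))
    # Final wealth W_I(X) = w - X + I(X) - premium.
    eu = sum(p * utility(w - x + y - premium)
             for p, x, y in zip(probs, losses, payouts))
    return premium, var_i, eu

# Hypothetical inputs chosen only for illustration.
losses = [0.0, 1.0, 2.0, 4.0]
probs = [0.5, 0.2, 0.2, 0.1]
w, rho, nu = 10.0, 0.2, 1.0

def stop_loss(x, d=1.0):
    """Deductible contract (x - d)_+ with an assumed deductible d = 1."""
    return max(x - d, 0.0)

premium, var_i, eu = evaluate_contract(stop_loss, losses, probs, w, rho, math.log)
feasible = var_i <= nu  # variance constraint of Problem (2.1)
second_moment = sum(p * x * x for p, x in zip(probs, losses))  # E[X^2]
```

Because the variance of any admissible indemnity is at most `second_moment`, choosing the bound at that level would make the constraint vacuous, which is the reduction to Arrow's model noted above.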
The Arrow–Pratt measure of absolute risk aversion (Pratt, 1964; Arrow, 1965), defined as
\[ A(x) := -\frac{U''(x)}{U'(x)}, \]
captures the dependence of the level of risk aversion on the agent’s wealth $x$. If $A(x)$ is decreasing in $x$, then the insured’s risk preference is said to exhibit decreasing absolute risk aversion (DARA). The effect of an insured’s initial wealth on the insurance demand under Arrow (1963)’s model has been widely studied in the literature. It is found that a wealthier DARA insured purchases a deductible insurance with a higher deductible. For a survey on how insureds’ wealth impacts insurance, see e.g., Gollier (2001, 2013).

While risk aversion ($U'' < 0$) captures an insured’s propensity for avoiding risk, prudence (i.e., $U''' > 0$) reflects her tendency to take precautions against future risk. Many commonly used utility functions, including those with hyperbolic absolute risk aversion (HARA) and mixed risk aversion, are prudent. Based on an experiment with a large number of subjects, Noussair et al. (2014) observe that the majority of individuals’ decisions are consistent with prudence. Eeckhoudt and Kimball

Footnote: Representative insureds in different wealth classes or different regions may have different “typical” levels of initial wealth. Moreover, when the economy grows, a representative insured’s initial wealth may change substantially. As shown in Subsection 4.1, the change in the insured’s initial wealth may affect her demand for insurance.

Footnote: Throughout the paper, the terms “increasing” and “decreasing” mean “non-decreasing” and “non-increasing,” respectively.

Footnote: A utility function is called HARA if the reciprocal of the Arrow–Pratt measure of absolute risk aversion is a linear function, i.e., $A_U(x) = -\frac{U''(x)}{U'(x)} = \frac{1}{px + q}$ for some $p \ge 0$ and $q$. It includes exponential, logarithmic and power utility functions as special cases. For further discussion of HARA, see Gollier (2001).

Footnote: A utility function is said to be of mixed risk aversion if $(-1)^n U^{(n)}(x) \le 0$ for all $x$ and $n = 1, 2, 3, \cdots$, where $U^{(n)}$ denotes the $n$th derivative of $U$.

\[ P(x) := -\frac{U'''(x)}{U''(x)} \tag{2.2} \]
for a three-time differentiable utility function $U$. If $P(x)$ is strictly decreasing in $x$, then the insured is said to exhibit strictly decreasing absolute prudence (DAP). Kimball (1990) shows that DAP characterizes the notion that wealthier people are less sensitive to future risks. Moreover, DAP implies DARA, as noted in Proposition 21 of Gollier (2001).

A term related to prudence is third-degree stochastic dominance (TSD), which was introduced by Whitmore (1970). A non-negative random variable $Z_1$ is said to dominate another non-negative random variable $Z_2$ in TSD if $\mathbb{E}[Z_1] \ge \mathbb{E}[Z_2]$ and
\[ \int_0^x \int_0^y \big( F_{Z_2}(z) - F_{Z_1}(z) \big)\,\mathrm{d}z\,\mathrm{d}y \ge 0 \quad \text{for all } x \ge 0. \]
Equivalently, $Z_1$ dominates $Z_2$ in TSD if and only if $\mathbb{E}[u(Z_1)] \ge \mathbb{E}[u(Z_2)]$ for all functions $u$ satisfying $u' > 0$, $u'' < 0$ and $u''' > 0$. TSD has been widely employed for decision making in finance and insurance. For instance, Gotoh and Konno (2000) use it to study mean-variance optimal portfolio problems. If $Z_1$ dominates $Z_2$ in TSD and they have the same mean and variance, then $Z_1$ is said to have less downside risk than $Z_2$. In fact, the latter is equivalent to $\mathbb{E}[u(Z_1)] \ge \mathbb{E}[u(Z_2)]$ for any function $u$ with $u''' > 0$; see Menezes et al. (1980).

In this section, we present our approach to solving Problem (2.1). First, consider Problem (2.1) without the variance constraint:
\[ \max_{I \in \mathcal{C}} \ \mathbb{E}[U(W_I(X))]. \tag{3.1} \]
This is the classical Arrow (1963)’s model, for which the optimal contract is a deductible one of the form $(x - d^*)_+$ for some non-negative deductible $d^*$, where $(x)_+ := \max\{x, 0\}$. This contract automatically satisfies the incentive-compatible condition. Moreover, Chi (2019) (see Theorem 4.2 therein) was the first to derive an analytical form of the optimal deductible level $d^*$. More precisely, define
\[ \mathrm{VaR}_{\frac{1}{1+\rho}}(X) := \inf\Big\{ x \in [0, M] : F_X(x) \ge \frac{\rho}{1+\rho} \Big\} \]
and
\[ \varphi(d) := \frac{\mathbb{E}\big[U'\big(W_{(x-d)_+}(X)\big)\big]}{U'\big(w - d - \pi((X - d)_+)\big)}, \quad 0 \le d < M, \]
where $\inf \emptyset := M$ by convention. Then the optimal $d^*$ is
\[ d^* = \sup\Big\{ \mathrm{VaR}_{\frac{1}{1+\rho}}(X) \le d < M : \varphi(d) \ge \frac{1}{1+\rho} \Big\} \vee \mathrm{VaR}_{\frac{1}{1+\rho}}(X), \tag{3.2} \]
where $\sup \emptyset := 0$ and $x \vee y := \max\{x, y\}$. This leads immediately to the following proposition.
Proposition 3.1. If $\nu \ge \mathrm{var}[(X - d^*)_+]$, then $I(x) = (x - d^*)_+$ is the optimal solution to Problem (2.1).

Intuitively, if the variance bound $\nu$ is set sufficiently high, then the variance constraint in Problem (2.1) is redundant and the problem reduces to the classical Arrow (1963)’s problem. Proposition 3.1 tells exactly and explicitly what the bound should be for the variance constraint to be binding.

Therefore, it suffices to solve Problem (2.1) for the case in which $\nu < \mathrm{var}[(X - d^*)_+]$, which we now set as an assumption.

Assumption 3.1.
The variance bound $\nu$ satisfies $\nu < \mathrm{var}[(X - d^*)_+]$.

The main thrust for finding the solution is to first restrict the analysis with a fixed level of expected indemnity and then find the optimal level of expected indemnity. To this end, we need to first identify the range in which the optimal expected indemnity possibly lies. Noting that $\mathrm{var}[(X - d)_+]$ is strictly decreasing and continuous in $d$ over $[\operatorname{ess\,inf} X, M)$, we define
\[ d_L := \inf\{ d \ge d^* : \mathrm{var}[(X - d)_+] \le \nu \} \quad \text{and} \quad m_L := \mathbb{E}[(X - d_L)_+]. \]
Intuitively, the insurer would demand a deductible higher than Arrow’s level $d^*$ due to the additional risk control reflected by the variance constraint, and $d_L$ is the smallest deductible that makes this constraint binding.

Footnote: The number $d^*$ can be numerically computed easily, because $\varphi(d)$ is decreasing over $[\mathrm{VaR}_{\frac{1}{1+\rho}}(X), M)$; see Chi (2019).

Lemma 3.2. Under Assumption 3.1, for Problem (2.1), any admissible insurance policy $I$ with $\mathbb{E}[I(X)] \le m_L$ is no better than the deductible contract $I_L$, where $I_L(x) = (x - d_L)_+$.

Therefore, we can rule out any contract whose expected indemnity is strictly smaller than $m_L$; in other words, $m_L$ is a lower bound of the optimal expected indemnity. In particular, no-insurance (i.e., $I^*(x) \equiv 0$) is never optimal under Assumption 3.1.

Next, we derive an upper bound of the optimal expected indemnity. Consider a loss-capped contract $X \wedge k$, where $x \wedge y := \min\{x, y\}$ and $k \ge 0$, which pays the actual loss up to the cap $k$. Define
\[ K_U := \inf\{ k \ge 0 : \mathrm{var}[X \wedge k] \ge \nu \} \quad \text{and} \quad m_U := \mathbb{E}[X \wedge K_U]. \]
In the above, $K_U$ is well-defined because $X \wedge k - \mathbb{E}[X \wedge k]$ is increasing in $k$ in the sense of convex order, according to Lemma A.2 in Chi (2012). Clearly, both $K_U$ and $m_U$ depend on the variance bound $\nu$. Since $\mathrm{var}[X] \ge \mathrm{var}[(X - d^*)_+] > \nu$, we have
\[ K_U < M, \quad m_U < \mathbb{E}[X] \quad \text{and} \quad \mathrm{var}[X \wedge K_U] = \nu. \]

Lemma 3.3.
For any $I \in \mathcal{C}$ with $\mathrm{var}[I(X)] \le \nu$, we must have $\mathbb{E}[I(X)] \le m_U$. Moreover, if $I \in \mathcal{C}$ satisfies $\mathrm{var}[I(X)] \le \nu$ and $\mathbb{E}[I(X)] = m_U$, then $I(X) = X \wedge K_U$ almost surely.

This lemma stipulates that $m_U$ is an upper bound of the optimal expected indemnity. Moreover, any admissible contract achieving this upper bound is equivalent to the loss-capped contract $X \wedge K_U$. An immediate corollary of the lemma is $m_L \le m_U$, noting that $\mathrm{var}[(X - d_L)_+] = \nu$.

The following result identifies the case $m_L = m_U$ as a trivial one.

Proposition 3.4. If $m_L = m_U$, then the loss $X$ must follow a Bernoulli distribution with values $0$ and $d_L + K_U$. Moreover, under Assumption 3.1, the optimal contract of Problem (2.1) is $I^*(0) = 0$ and $I^*(d_L + K_U) = K_U$.

In what follows, we consider the general and interesting case in which $m_L < m_U$. For $m \in (m_L, m_U)$, define
\[ \mathcal{C}_m := \{ I \in \mathcal{C} : \mathrm{var}[I(X)] \le \nu, \ \mathbb{E}[I(X)] = m \}. \]

Footnote: A loss-capped contract is also called “full insurance up to a (policy) limit” or “full insurance with a cap.”

Footnote: A random variable $Y$ is said to be greater than a random variable $Z$ in the sense of convex order, denoted as $Z \le_{cx} Y$, if $\mathbb{E}[Y] = \mathbb{E}[Z]$ and $\mathbb{E}[(Z - d)_+] \le \mathbb{E}[(Y - d)_+]$ for all $d \in \mathbb{R}$, provided that the expectations exist. Obviously, $Z \le_{cx} Y$ implies $\mathrm{var}[Z] \le \mathrm{var}[Y]$.
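The quantities $d^*$, $d_L$, $m_L$, $K_U$ and $m_U$ defined above can be approximated numerically. The sketch below does so by bisection for a hypothetical four-point loss distribution and an exponential (CARA) utility; the distribution, the parameter values and all function names are assumptions made for illustration, and the bisection for $d^*$ relies on the monotonicity of $\varphi$ on $[\mathrm{VaR}_{\frac{1}{1+\rho}}(X), M)$ cited from Chi (2019).

```python
import math

# Illustrative discrete loss and parameters (assumptions of this sketch).
LOSSES = [0.0, 1.0, 2.0, 4.0]
PROBS = [0.5, 0.2, 0.2, 0.1]
M = max(LOSSES)                      # essential supremum of X
W, RHO, A, NU = 10.0, 0.2, 0.2, 0.3  # wealth, loading, CARA coefficient, variance bound

def mean(f):
    return sum(p * f(x) for x, p in zip(LOSSES, PROBS))

def var(f):
    mu = mean(f)
    return mean(lambda x: (f(x) - mu) ** 2)

def marginal_utility(z):
    return math.exp(-A * z)          # U'(z) for the assumed U(z) = -exp(-A z) / A

def bisect(pred, lo, hi, iters=80):
    """Locate the point in [lo, hi] where pred switches from True to False."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if pred(mid):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def phi(d):
    premium = (1 + RHO) * mean(lambda x: max(x - d, 0.0))
    # Under the stop-loss contract, final wealth is w - min(X, d) - premium.
    numerator = mean(lambda x: marginal_utility(W - min(x, d) - premium))
    return numerator / marginal_utility(W - d - premium)

def arrow_deductible():
    """Deductible d* of (3.2): VaR level first, then the crossing of phi."""
    level = RHO / (1 + RHO)
    cdf, var_level = 0.0, M
    for x, p in sorted(zip(LOSSES, PROBS)):
        cdf += p
        if cdf >= level:
            var_level = x
            break
    target = 1 / (1 + RHO)
    if var_level >= M or phi(var_level) < target:
        return var_level
    return bisect(lambda d: phi(d) >= target, var_level, M)

def stop_loss_var(d):
    return var(lambda x: max(x - d, 0.0))

def capped_var(k):
    return var(lambda x: min(x, k))

d_star = arrow_deductible()
if not NU < stop_loss_var(d_star):
    raise ValueError("Assumption 3.1 fails: the variance bound is not binding")

d_L = bisect(lambda d: stop_loss_var(d) > NU, d_star, M)   # smallest binding deductible
m_L = mean(lambda x: max(x - d_L, 0.0))
K_U = bisect(lambda k: capped_var(k) < NU, 0.0, M)         # cap at which variance hits NU
m_U = mean(lambda x: min(x, K_U))
```

In this example the interval $(m_L, m_U)$ is non-empty, so the cross-sectional problem over $\mathcal{C}_m$ is the relevant one.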
We now focus on the following optimization problem
\[ \max_{I \in \mathcal{C}_m} \ \mathbb{E}[U(W_I(X))], \tag{3.3} \]
which is a “cross section” of the original problem (2.1) where the expected indemnity is fixed as $m$. For $\lambda \in \mathbb{R}$ and $\beta > 0$, denote
\[ I_{\lambda,\beta}(x) := \sup\big\{ y \in [0, x] : U'(w - x + y - (1+\rho)m) - \lambda - 2\beta y \ge 0 \big\}, \quad x \in [0, M]. \tag{3.4} \]
Actually, $I_{\lambda,\beta}$ is a contract that coinsures above a deductible or coinsures following full insurance, depending on the relative values between $\lambda$ and $U'(w - (1+\rho)m)$. To see this, when $\lambda \ge U'(w - (1+\rho)m)$, we have
\[ I_{\lambda,\beta}(x) = \begin{cases} 0, & 0 \le x \le w - (1+\rho)m - (U')^{-1}(\lambda), \\ f_{\lambda,\beta}(x), & w - (1+\rho)m - (U')^{-1}(\lambda) < x \le M, \end{cases} \tag{3.5} \]
and when $\lambda < U'(w - (1+\rho)m)$, we have
\[ I_{\lambda,\beta}(x) = \begin{cases} x, & 0 \le x \le \frac{U'(w - (1+\rho)m) - \lambda}{2\beta}, \\ f_{\lambda,\beta}(x), & \frac{U'(w - (1+\rho)m) - \lambda}{2\beta} < x \le M, \end{cases} \tag{3.6} \]
where $f_{\lambda,\beta}(x)$ satisfies the following equation in $y$:
\[ U'(w - x + y - (1+\rho)m) - \lambda - 2\beta y = 0. \tag{3.7} \]
Moreover, it is easy to see that $0 \le f_{\lambda,\beta}(x) \le x$ either when $\lambda \ge U'(w - (1+\rho)m)$ and $w - (1+\rho)m - (U')^{-1}(\lambda) < x \le M$, or when $\lambda < U'(w - (1+\rho)m)$ and $\frac{U'(w - (1+\rho)m) - \lambda}{2\beta} < x \le M$. Furthermore,
\[ f'_{\lambda,\beta}(x) = \frac{-U''(w - x + f_{\lambda,\beta}(x) - (1+\rho)m)}{2\beta - U''(w - x + f_{\lambda,\beta}(x) - (1+\rho)m)} \in (0, 1). \tag{3.8} \]
The following result indicates that there exists an optimal solution to Problem (3.3) that is in the form of $I_{\lambda,\beta}(x)$ and binds both the mean and variance constraints.

Proposition 3.5.
Suppose Assumption 3.1 holds and $m_L < m_U$. Then there exist $\lambda^*_m \in \mathbb{R}$ and $\beta^*_m > 0$ such that $I_{\lambda^*_m,\beta^*_m}$ satisfies
\[ \mathbb{E}[I_{\lambda^*_m,\beta^*_m}(X)] = m \quad \text{and} \quad \mathrm{var}[I_{\lambda^*_m,\beta^*_m}(X)] = \nu, \tag{3.9} \]
and is an optimal solution to Problem (3.3).

Footnote: It can be shown easily that this equation has a unique solution.

$I_L(x) = (x - d_L)_+$, a loss-capped one of the form $I_U(x) = x \wedge K_U$ and a general one of the form $I_{\lambda^*_m,\beta^*_m}(x)$. In other words, the optimal solutions of the following maximization problem
\[ \max_{I \in \{ I_L,\ I_U,\ I_{\lambda^*_m,\beta^*_m} \text{ for } m \in (m_L, m_U) \}} \ \mathbb{E}[U(W_I(X))], \tag{3.10} \]
where $m_L < m_U$, also solve Problem (2.1).

Note that $I_L$, $I_U$ and $I_{\lambda^*_m,\beta^*_m}$ all satisfy the incentive-compatible condition (see (3.8)); hence, so does at least one of the optimal contracts $I^*$ of (2.1). That is, $I^*(0) = 0$ and $0 \le I^{*\prime}(x) \le 1$. Therefore, it suffices to solve the following maximization problem
\[ \max_{I \in \mathcal{IC}} \ \mathbb{E}[U(W_I(X))] \quad \text{subject to} \quad \mathrm{var}[e_I(X)] \le \nu, \tag{3.11} \]
where
\[ \mathcal{IC} := \{ I : I(0) = 0, \ 0 \le I'(x) \le 1, \ \forall x \in [0, M] \} \subsetneq \mathcal{C}, \tag{3.12} \]
to obtain an optimal contract for Problem (2.1).

Notice that $\mathcal{IC}$ is convex on which $\mathbb{E}[U(W_I(X))]$ is strictly concave. Using the convex property of variance and applying arguments similar to those in the proof of Proposition 3.1 in Chi and Wei (2020), we obtain the following proposition:

Proposition 3.6. (i)
There exist optimal solutions to Problem (3.11). (ii) Assume either $\rho > 0$ or $\mathbb{P}\{X < \epsilon\} > 0$ for all $\epsilon > 0$. Then there exists a unique solution to Problem (3.11) in the sense that $I_1(X) = I_2(X)$ almost surely for any two solutions $I_1$ and $I_2$.

Note that the assumptions in Proposition 3.6-(ii) are satisfied in most situations of practical interest because either an insurer naturally sets a positive safety loading, or no loss occurs with a positive probability, or both happen. On the other hand, since any optimal solution to Problem (3.11) also solves Problem (2.1), Proposition 3.6-(i) establishes the existence of optimal solutions to Problem (2.1). Moreover, the argument proving Proposition 3.1 in Chi and Wei (2020) can be used to show that Proposition 3.6-(ii) holds true for Problem (2.1) as well. Finally, now that we have the existence and uniqueness of the optimal solutions for both Problems (3.11) and (2.1), we conclude that these two problems are indeed equivalent under the assumptions of Proposition 3.6-(ii).

Footnote: As will be evident in the sequel, the values of $I'$ on a set with zero Lebesgue measure have no impact on $I$. Therefore, we will often omit the phrase “almost everywhere” in statements regarding the marginal indemnity function $I'$ throughout this paper.

While the analysis of Problem (2.1) is simplified to Problem (3.10), it remains challenging to solve this problem because $\lambda^*_m$ and $\beta^*_m$ are implicit functions of $m$. Before attacking this problem, we introduce a useful result that provides a general qualitative structure for the optimal indemnity function in Problem (3.11) or, equivalently, Problem (2.1).

Proposition 3.7.
Under Assumption 3.1, if $I^*$ is a solution to Problem (3.11), then there exists $\beta^* > 0$ such that
\[ I^{*\prime}(x) = \begin{cases} 1, & \Phi_{I^*}(x) > 0, \\ c_{I^*}(x), & \Phi_{I^*}(x) = 0, \\ 0, & \Phi_{I^*}(x) < 0, \end{cases} \tag{3.13} \]
for some function $c_{I^*}$ taking values in $[0, 1]$, where
\[ \Phi_I(x) := \mathbb{E}\big[ U'(W_I(X)) - 2\beta^* I(X) \,\big|\, X > x \big] - \Big( (1+\rho)\,\mathbb{E}\big[ U'(W_I(X)) \big] - 2\beta^*\,\mathbb{E}[I(X)] \Big), \quad x \in [0, M), \tag{3.14} \]
for $I \in \mathcal{IC}$.

Note that (3.13) does not entail an explicit expression of $I^{*\prime}$ because its right hand side also depends on $I^*$ as well as on an unknown parameter $\beta^*$. While deriving the optimal solution $I^*$ directly from (3.13) seems challenging, the equation reveals the important property that $I^{*\prime}$ must take a value of either 0 or 1, except at point(s) $x$ where $\Phi_{I^*}(x) = 0$. This property will in turn help us to decide whether the optimal contract is of the form $(x - d_L)_+$, $x \wedge K_U$, or $I_{\lambda^*_m,\beta^*_m}$.

The following theorem presents a complete solution to Problem (2.1).

Theorem 3.8.
Suppose that Assumption 3.1 holds and that the c.d.f. $F_X$ is strictly increasing on $(0, M)$. Then we have the following conclusions:

(i) If $\rho = 0$, then the optimal indemnity function is $I^*$, where $I^*(x)$ solves the following equation in $y$ for all $x \in (0, M]$:
\[ U'(w - x + y - m^*) - 2\beta^* y - U'(w - m^*) = 0, \quad y \in (0, x), \tag{3.15} \]
with the parameters $m^* \in (m_L, m_U)$ and $\beta^* > 0$ determined by
\[ \mathbb{E}[I^*(X)] = m^* \quad \text{and} \quad \mathrm{var}[I^*(X)] = \nu. \tag{3.16} \]

(ii) If $\rho > 0$, then the optimal indemnity function is
\[ I^*(x) = \begin{cases} 0, & 0 \le x \le \tilde{d}, \\ f^*(x), & \tilde{d} < x \le M, \end{cases} \tag{3.17} \]
where $f^*(x)$ satisfies $f^*(\tilde{d}) = 0$ and solves the following equation in $y$:
\[ U'(w - (1+\rho)m^* - x + y) - U'(w - (1+\rho)m^* - \tilde{d}) = \frac{y}{m^*\rho} \Big( U'(w - (1+\rho)m^* - \tilde{d}) - (1+\rho)\,\mathbb{E}\big[ U'(w - (1+\rho)m^* - X \wedge \tilde{d}) \big] \Big), \quad y \in (0, x), \tag{3.18} \]
and $\tilde{d} \in (\mathrm{VaR}_{\frac{1}{1+\rho}}(X), M)$ and $m^* \in (m_L, m_U)$ are determined by (3.16).

Footnote: It is difficult to prove the existence of solutions to Problem (2.1) directly because its feasible set is not compact under the principle of indemnity alone.

Footnote: From the control theory perspective, (3.13) corresponds to an optimal control problem in which $I'$ is taken as the control variable. Moreover, the optimal control turns out to be of the so-called “bang-bang” type, whose values depend on the sign of the discriminant function $\Phi_I$. This type of optimal control problem arises when the Hamiltonian depends linearly on control and the control is constrained between an upper bound and a lower bound. It is usually hard to solve for optimal control when the discriminant function is complex, which is the case here.

Theorem 3.8 provides a complete solution to Problem (2.1). It indicates that the optimal contract cannot be a pure deductible of the form $(x - d_L)_+$, nor a pure loss-capped one of the form $x \wedge K_U$. It can only be in the form $I_{\lambda^*_m,\beta^*_m}$ of (3.5) (rather than (3.6)).
The optimal policies can be computed by solving a system of three algebraic equations; so the result is semi-analytic.

Actuarially, Theorem 3.8 reveals how the variance bound impacts the contract. When the bound $\nu$ is sufficiently low so that it is binding (hence the model does not degenerate into the classical model of Arrow (1963)), the optimal policy is always genuine coinsurance if there is no safety loading. Here, by "genuine" we mean the strict inequalities $0 < I^*(x) < x$ for all $x \in (0, M]$, namely both the insurer and the insured pay positive portions of the loss incurred. When the safety loading coefficient is positive, the optimal contract demands genuine coinsurance above a positive deductible. So the variance bound turns the full-insurance part of Arrow's contract into coinsurance. Our contracts are qualitatively similar to those of Raviv (1979), in which a utility function takes the place of a variance bound; however, ours are quantitatively different from those of Raviv (1979).

On the other hand, the deductible $\tilde d$ is positive if and only if the safety loading coefficient is positive. So the existence of the deductible is completely determined by the loading coefficient in the insurance premium. This result is consistent with Mossin's Theorem (Mossin 1968).

Corollary 3.9. Under the assumptions of Theorem 3.8, if the insured is prudent, then the proportion between optimal indemnity and loss increases as loss increases.
So, with a prudent insured, the insurer pays more not only absolutely but also relatively as the loss increases. Vajda (1962) restricts his study of a variance contracting problem to policies that have this feature of the insurer covering proportionally more for larger losses. Corollary 3.9 uncovers this feature ex post in our optimal policies, provided that the insured is prudent.
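The monotone coverage ratio in Corollary 3.9 can be checked numerically in a simple case. The sketch below is an illustration under assumptions, not part of the paper: it uses the fair-pricing equation (3.15) with a CARA utility (which is prudent, since $U''' > 0$) and hypothetical, uncalibrated values for `w`, `m`, `beta` and `a`.

```python
import math

def coverage(x, w=10.0, m=1.0, beta=0.01, a=0.5):
    """Bisection root of (3.15) for the assumed CARA marginal utility U'(z) = exp(-a*z)."""
    def g(y):
        return math.exp(-a * (w - x + y - m)) - 2.0 * beta * y - math.exp(-a * (w - m))

    lo, hi = 0.0, x
    for _ in range(100):         # g is decreasing, so keep the sign change bracketed
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

losses = [0.5 * k for k in range(1, 11)]       # x = 0.5, 1.0, ..., 5.0
ratios = [coverage(x) / x for x in losses]     # proportion of each loss that is indemnified
```

On this grid the entries of `ratios` increase with the loss, in line with the corollary: the insurer's share grows both absolutely and relatively.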
Thanks to the semi-analytic results derived in the previous section, we are able to analyze the impacts of the insured's initial wealth and the variance bound on the insurance demand.

We make the following assumptions for our comparative statics analysis:
Assumption 4.1. (i) $F_X$ is strictly increasing on $(0, M)$. (ii) The insurance is fairly priced, i.e., $\rho = 0$.

Assumption 4.1-(i) is standard in the literature and accommodates most of the distributions used by actuaries, such as exponential, lognormal, gamma, and Pareto distributions. Assumption 4.1-(ii) is not necessarily plausible in practice, but it is meaningful in theory, as it describes a state of competitive equilibrium, in which insurers break even and insurance policies are actuarially fair for representative insureds (see e.g., Rothschild and Stiglitz, 1976; Viscusi, 1979). It is important to carry out comparative statics analyses in such a "fair" state in order to rule out any impact emanating from an unfair price. Such an assumption is indeed often imposed when conducting comparative statics in the literature of insurance economics. For example, the comparative statics results of Ehrlich and Becker (1972) and Viscusi (1979) deal exclusively with actuarially fair situations. Many recent studies, such as Eeckhoudt et al. (2003), Huang and Tzeng (2006) and Teh (2017), also impose this assumption for their comparative statics analyses.

Finally, we will assume $\nu < \operatorname{var}[X]$ throughout this section, as otherwise the variance constraint is redundant and the optimal solution is trivially full insurance.

In this subsection we examine the impact of the insured's initial wealth on insurance demand. We first recall the notion of one function up-crossing another. A function $g_1$ is said to up-cross a function $g_2$, both defined on $\mathbb{R}$, if there exists $z_0 \in \mathbb{R}$ such that
$$g_1(x) \le g_2(x), \ x < z_0; \qquad g_1(x) \ge g_2(x), \ x > z_0.$$
Moreover, $g_1$ is said to up-cross $g_2$ twice if there exist $z_0 < z_1$ such that
$$g_1(x) \le g_2(x), \ x < z_0; \qquad g_1(x) \ge g_2(x), \ z_0 \le x < z_1; \qquad g_1(x) \le g_2(x), \ x \ge z_1.$$

Consider two initial wealth levels $w_1 < w_2$ and denote the corresponding optimal contracts by $I_1^*$ and $I_2^*$ and the associated parameters by $\beta_1^*$ and $\beta_2^*$, respectively, which are determined by Theorem 3.8.
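The up-crossing notions just recalled can be tested on a finite grid by looking at the sign pattern of $g_1 - g_2$. The helper names below are hypothetical, and the sketch only detects genuine (non-degenerate) crossings at grid resolution; it is not a proof device.

```python
def sign_pattern(g1, g2, grid, tol=1e-12):
    """Signs of g1 - g2 along the grid, with zeros dropped and repeated signs merged."""
    pattern = []
    for x in grid:
        diff = g1(x) - g2(x)
        if abs(diff) <= tol:
            continue                       # touching points carry no sign information
        sign = 1 if diff > 0 else -1
        if not pattern or pattern[-1] != sign:
            pattern.append(sign)
    return pattern

def up_crosses(g1, g2, grid):
    """True if g1 starts below g2 and ends above it, with a single sign change."""
    return sign_pattern(g1, g2, grid) == [-1, 1]

def up_crosses_twice(g1, g2, grid):
    """True if g1 - g2 is negative, then positive, then negative again."""
    return sign_pattern(g1, g2, grid) == [-1, 1, -1]

grid = [i / 10 for i in range(21)]         # 0.0, 0.1, ..., 2.0
once = up_crosses(lambda x: x, lambda x: 0.5, grid)
twice = up_crosses_twice(lambda x: -(x - 1.0) ** 2, lambda x: -0.25, grid)
```

In the second example $g_1 - g_2 = 0.25 - (x - 1)^2$ is negative below 0.5, positive on $(0.5, 1.5)$ and negative above 1.5, which is the shape that Theorem 4.1 attributes to the two risk exposure functions.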
Recall that $\rho = 0$; so the insurer's risk exposure functions are
$$e_{I_i^*}(x) = I_i^*(x) - \mathbb{E}[I_i^*(X)] = I_i^*(x) - m_i^*, \quad i = 1, 2, \tag{4.1}$$
where $m_i^* := \mathbb{E}[I_i^*(X)]$. Taking expectations on (3.15) yields
$$U'(w_i - m_i^*) = \mathbb{E}\big[U'(w_i - X + e_{I_i^*}(X))\big] - 2\beta_i^* m_i^*,$$
which in turn implies, for $i = 1, 2$,
$$U'(w_i - x + e_{I_i^*}(x)) - 2\beta_i^* e_{I_i^*}(x) - \mathbb{E}\big[U'(w_i - X + e_{I_i^*}(X))\big] = 0, \tag{4.2}$$
$$\mathbb{E}[e_{I_i^*}(X)] = 0 \quad \text{and} \quad \mathbb{E}\big[(e_{I_i^*}(X))^2\big] = \operatorname{var}[I_i^*(X)] = \nu. \tag{4.3}$$
Note that the insurer's profit with the contract $I_i^*$ is $\mathbb{E}[I_i^*(X)] - I_i^*(X) \equiv -e_{I_i^*}(X)$, $i = 1, 2$. The following theorem establishes the impact of the initial wealth on the insurance contract.
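For readers who want the intermediate algebra, the passage from (3.15) to (4.2) can be spelled out as follows. This is only a restatement of the step described above, using $I_i^*(x) = e_{I_i^*}(x) + m_i^*$ and $\mathbb{E}[e_{I_i^*}(X)] = 0$; no new assumption is introduced.

```latex
% (3.15) at wealth w_i, written with e_{I_i^*}(x) = I_i^*(x) - m_i^*:
U'\bigl(w_i - x + e_{I_i^*}(x)\bigr) - 2\beta_i^*\bigl(e_{I_i^*}(x) + m_i^*\bigr) = U'(w_i - m_i^*).
% Replace x by X, take expectations, and use E[e_{I_i^*}(X)] = 0:
\mathbb{E}\bigl[U'\bigl(w_i - X + e_{I_i^*}(X)\bigr)\bigr] - 2\beta_i^* m_i^* = U'(w_i - m_i^*).
% Subtracting the second identity from the first gives (4.2):
U'\bigl(w_i - x + e_{I_i^*}(x)\bigr) - 2\beta_i^* e_{I_i^*}(x)
  - \mathbb{E}\bigl[U'\bigl(w_i - X + e_{I_i^*}(X)\bigr)\bigr] = 0.
```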
Theorem 4.1.
In addition to Assumption 4.1, we assume that $\nu < \operatorname{var}[X]$ and the insured's utility function $U$ exhibits strictly DAP. Then, the insurer's risk exposure function with the larger initial wealth, $e_{I_2^*}(x)$, up-crosses the risk exposure function with the smaller initial wealth, $e_{I_1^*}(x)$, twice. Moreover, the insurer's profit, $-e_{I_i^*}(X)$, has less downside risk when contracting with the wealthier insured.

Figure 4.1 illustrates graphically the first part of Theorem 4.1. The actuarial implication is that when the insured becomes wealthier, the insurer's risk exposure is lower for large or small losses and is higher for moderate losses. This can be explained intuitively as follows. Even if the insurance pricing is actuarially fair, the insureds are unable to transfer all the risk to the insurer due to the variance bound. However, the wealthier insured is more tolerant of large losses due to the DAP; hence, the insurer's risk exposure is lower for large losses when contracting with the wealthier insured. Due to the requirement that the insurer's expected risk exposure always be zero, the insurer's risk exposure with the wealthier insured must be higher for moderate losses. Now, should the insurer's risk exposure with the wealthier insured also be higher for small losses, then overall the insurer's risk exposure with the wealthier insured would be strictly less spread out than that with the less wealthy one, leading to a smaller variance of the former, which would be a contradiction because both variances equal $\nu$. Hence, the insurer's risk exposure must be lower for small losses with the wealthier insured.

Figure 4.1: Comparison of the insurer's two risk exposure functions with $w_1 < w_2$.

The second part of the theorem, on the other hand, suggests that a variance-minding insurer prefers to provide insurance to a wealthier insured due to the smaller downside risk.
Such a finding may shed light on why insurers underwrite relatively more business in developed countries or, within the same country, engage in more business when the economy improves.

Corollary 4.2.
Under the assumptions of Theorem 4.1, we have the following conclusions:

(i) $\mathbb{E}[I_1^*(X)] < \mathbb{E}[I_2^*(X)]$ and $\beta_2^* < \beta_1^*$.

(ii) Either $I_1^*(x) < I_2^*(x)$ $\forall x > 0$, or $I_1^*$ up-crosses $I_2^*$.

Footnote: For example, Hofmann (2015), an industry report from the insurance company Zurich, shows that both insurance densities (premiums per capita) and insurance penetrations (premiums as a percent of GDP) of advanced economies are much higher than those of emerging economies. This report also demonstrates that insurance markets in both advanced and emerging economies experience rapid growth when the economies grow.

Figure 4.2: Comparison of two optimal indemnity functions with $w_1 < w_2$.

In part (ii) of this corollary, while the case $I_1^*(x) < I_2^*(x)$ $\forall x > 0$ is a special case of $I_1^*$ up-crossing $I_2^*$, we state it separately to highlight its possibility. In the classical model of Arrow (1963), full insurance is optimal when insurance pricing is actuarially fair. This conclusion is independent of an insured's worth. Zhou et al. (2010) show that such a conclusion remains intact when there is an exogenous upper limit imposed on the insurer's risk exposure. Our result shows that adding a variance bound fundamentally changes the insurance demand: it makes the insured's wealth level relevant, and it changes the way in which the two parties share the risk. Specifically, Corollary 4.2 suggests that a wealthier insured with DAP would either demand more coverage across the board, or retain more of the larger risks and cede more of the smaller risks (see Figure 4.2). Either way, the expected coverage is always larger for the wealthier insured. Recall that insurance is called a normal (inferior) good if wealthier people purchase more (less) insurance coverage; see Mossin (1968), Schlesinger (1981) and Gollier (2001). Millo (2016) argues that nonlife insurance is a normal good by empirically testing whether or not income elasticity is significantly greater than one. Armantier et al.
(2018) use micro-level survey data on households' insurance coverage to conclude that insurance is a normal good, thereby providing a better understanding of the relationship between insurance demand and economic development. These studies, however, are purely empirical. To the best of our knowledge, ours is the first theoretical result regarding insurance as a normal good under the insurer's variance constraint, confirming these empirical findings.

In this subsection we keep the insured's initial wealth unchanged and analyze the impact of the variance bound on her demand for insurance. Consider two variance bounds with $0 < \nu_1 < \nu_2$
Theorem 4.3. Under Assumption 4.1, the insurer's risk exposure function with the larger variance bound, $e_{I_2^*}$, up-crosses that with the smaller variance bound, $e_{I_1^*}$.

Under fair insurance pricing, this theorem indicates that, as the variance bound decreases, the insurer is exposed to less risk for a larger $X$ and to more risk for a smaller $X$. This result has a rather significant implication in terms of the insurer's tail risk management. A variance constraint by its very definition does not control the tail risk directly. However, Theorem 4.3 suggests that the insurer can reduce the risk exposure for larger losses simply by tightening the variance constraint. This further justifies our formulation of the variance contracting model.

Corollary 4.4. Under the assumption of Theorem 4.3, for any $0 < \nu_1 < \nu_2 < \operatorname{var}[X]$, we have the following conclusions:

(i) $\mathbb{E}[I_1^*(X)] < \mathbb{E}[I_2^*(X)]$ and $\beta_2^* < \beta_1^*$;

(ii) If the insured's utility function satisfies $U''' > 0$, then $I_1^*(x) < I_2^*(x)$ $\forall x > 0$.

Footnote: Mossin (1968), Schlesinger (1981) and Gollier (2001) show that a wealthier insured with a DARA preference will cede less risk under unfair insurance pricing; hence, insurance is an inferior good in the corresponding economy. Their results degenerate into full insurance when the pricing is fair, and thus insurance demand is independent of the insured's wealth. Consequently, our results do not contradict theirs.

Footnote: It follows from Lemma A.3 that the insurer with a more relaxed variance constraint suffers more underwriting risk in the sense of convex order, i.e., $e_{I_1^*}(X) \le_{cx} e_{I_2^*}(X)$.

Figure 4.3: Comparison of two optimal indemnity functions with $\nu_1 < \nu_2$.

Corollary 4.4-(i) can be easily interpreted: an insurer with a tighter variance bound offers less expected coverage.
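The comparative statics in Corollary 4.4 can be reproduced numerically in a special case. The following sketch is an illustration under stated assumptions rather than the paper's algorithm: it takes a CARA utility $U(z) = -e^{-Az}/A$ (which is prudent), for which (3.15) reduces to $e^{A(x-y)} - 1 = \kappa y$ with $\kappa := 2\beta^* e^{A(w - m^*)}$, so that (3.16) can be solved by a one-dimensional search over $\kappa$. The loss is a midpoint-grid stand-in for a uniform distribution on $(0, M)$; the constants `A`, `W`, `M`, `N` and the two variance bounds are hypothetical.

```python
import math

A, W, M, N = 0.5, 10.0, 4.0, 100                   # assumed risk aversion, wealth, loss cap, grid size
LOSSES = [(i + 0.5) * M / N for i in range(N)]     # equally weighted midpoints approximating Uniform(0, M)

def indemnity(x, kappa):
    """Root y in (0, x) of exp(A*(x - y)) - 1 = kappa*y, i.e. (3.15) under CARA."""
    lo, hi = 0.0, x
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if math.exp(A * (x - mid)) - 1.0 > kappa * mid:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def moments(kappa):
    """Mean and variance of I(X) on the grid."""
    vals = [indemnity(x, kappa) for x in LOSSES]
    mean = sum(vals) / N
    return mean, sum((v - mean) ** 2 for v in vals) / N

def calibrate(nu):
    """Bisect on log(kappa) until var[I(X)] = nu; the variance falls as kappa rises."""
    lo, hi = 1e-9, 1e9
    for _ in range(80):
        mid = math.sqrt(lo * hi)
        if moments(mid)[1] > nu:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

def solve(nu):
    """Return (kappa, m, beta) satisfying (3.15)-(3.16) for the variance bound nu."""
    kappa = calibrate(nu)
    m = moments(kappa)[0]
    beta = 0.5 * kappa * math.exp(-A * (W - m))    # invert kappa = 2*beta*exp(A*(W - m))
    return kappa, m, beta

k1, m1, beta1 = solve(0.2)                         # tighter bound nu_1
k2, m2, beta2 = solve(0.6)                         # looser bound nu_2
```

In this example the tighter bound yields a smaller expected indemnity, a larger multiplier, and a pointwise smaller indemnity, matching parts (i) and (ii) of the corollary. Because CARA preferences carry no wealth effect, the same sketch cannot be used to illustrate Theorem 4.1.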
As a complement to Theorem 4.3, Corollary 4.4-(ii) establishes a direct characterization of the insured's optimal risk transfer with regard to the change in the variance bound: a prudent insured consistently cedes more losses when the variance bound increases (see Figure 4.3). In other words, if the insurance contract is priced fairly, the insured will transfer as much risk to the insurer as the latter's risk tolerance allows.

In this paper, we have revisited the classical model of Arrow (1963) by adding a variance limit on the insurer's risk exposure. This constraint is motivated by the insurer's desire to manage underwriting risk; at the same time, it poses considerable technical challenges for solving the problem. We have developed an approach to derive optimal contracts semi-analytically, in the form of coinsurance above a deductible when the variance constraint is active. The final policies automatically satisfy the incentive-compatible condition, thereby eliminating potential ex post moral hazard. We have also conducted a comparative analysis to examine the impact of the insured's wealth and of the variance bound on insurance demand.

This work can be extended in a couple of directions. First, we have restricted the comparative analysis to actuarially fair insurance. Analyzing the general unfair case calls for an approach different from the one presented here. Second, a model incorporating probability distortion (weighting) is of significant interest, both theoretically and practically. This is because probability distortion, a phenomenon well documented in psychology and behavioral economics, is related to tail events, about which both insurers and insureds have great concerns.
Appendices

A Stochastic Orders

Since the notion of stochastic orders plays an important role in this paper, we present, in this appendix, some useful results in this regard.

A random variable $Y$ is said to be greater than a random variable $Z$ in the sense of stop-loss order, denoted as $Z \le_{sl} Y$, if
$$\mathbb{E}[(Z - d)_+] \le \mathbb{E}[(Y - d)_+] \quad \forall d \in \mathbb{R},$$
provided that the expectations exist. It follows readily that $Y$ is greater than $Z$ in convex order (i.e., $Z \le_{cx} Y$), if $\mathbb{E}[Y] = \mathbb{E}[Z]$ and $Z \le_{sl} Y$.

A useful way to verify the stop-loss order is the well-known Karlin–Novikoff criterion (Karlin and Novikoff 1963).

Lemma A.1. Suppose $\mathbb{E}[Z] \le \mathbb{E}[Y] < \infty$. If $F_Z$ up-crosses $F_Y$, then $Z \le_{sl} Y$.

If $Z \le_{sl} Y$, then $\mathbb{E}[g(Z)] \le \mathbb{E}[g(Y)]$ holds for all the increasing convex functions $g$, provided that the expectations exist. Based on the Karlin–Novikoff criterion, Gollier and Schlesinger (1996) obtain the following lemma.

Lemma A.2. For any $h \in \mathcal{C}$, we have $X \wedge d \le_{cx} h(X)$, where $d \in [0, M]$ satisfies $\mathbb{E}[X \wedge d] = \mathbb{E}[h(X)]$.

The following result with respect to convex order is from Lemma 3 of Ohlin (1969).

Lemma A.3. Let $Y$ be a random variable and $h_i$, $i = 1, 2$, be two increasing functions with $\mathbb{E}[h_1(Y)] = \mathbb{E}[h_2(Y)]$. If $h_1$ up-crosses $h_2$, then $h_2(Y) \le_{cx} h_1(Y)$.

B Other Useful Lemmas

This appendix presents some other technical results that are useful in connection with this paper.

It is easy to verify that any sequence of indemnity functions in $\mathcal{IC}$ is uniformly bounded and equicontinuous over $[0, M]$. Hence, the Arzelà–Ascoli theorem implies

Lemma B.1. The set $\mathcal{IC}$ is compact under the norm $d(I_1, I_2) = \max_{t \in [0, M]} |I_1(t) - I_2(t)|$, $I_1, I_2 \in \mathcal{IC}$.

For the following lemma, one can refer to Komiya (1988) for a proof.

Lemma B.2. (Sion's Minimax Theorem) Let $Y$ be a compact convex subset of a linear topological space and $Z$ a convex subset of a linear topological space.
If $\Gamma$ is a real-valued function on $Y \times Z$ such that $\Gamma(y, \cdot)$ is continuous and concave on $Z$ for any $y \in Y$ and $\Gamma(\cdot, z)$ is continuous and convex on $Y$ for any $z \in Z$, then
$$\min_{y \in Y} \max_{z \in Z} \Gamma(y, z) = \max_{z \in Z} \min_{y \in Y} \Gamma(y, z).$$

The following lemmas are needed in the comparative analysis.

Lemma B.3. If a non-negative increasing function $h_1$ up-crosses a non-negative increasing function $h_2$ with $\mathbb{E}[(h_1(X))^2] = \mathbb{E}[(h_2(X))^2]$, then either $h_1(X)$ and $h_2(X)$ have the same distribution or $\mathbb{E}[h_1(X)] < \mathbb{E}[h_2(X)]$.

Proof. If $\mathbb{E}[h_2(X)] \le \mathbb{E}[h_1(X)]$, then Lemma A.1 implies $h_2(X) \le_{sl} h_1(X)$. Moreover, we have
$$\mathbb{E}[(h_i(X))^2] = 2\int_0^\infty \mathbb{E}[(h_i(X) - t)_+]\,\mathrm{d}t.$$
Since it is assumed that $\mathbb{E}[(h_1(X))^2] = \mathbb{E}[(h_2(X))^2]$, we must have $\mathbb{E}[(h_1(X) - t)_+] = \mathbb{E}[(h_2(X) - t)_+]$ for any $t \ge 0$. It then follows from the equation $\mathbb{E}[(h_i(X) - t)_+] = \int_t^\infty (1 - F_{h_i(X)}(y))\,\mathrm{d}y$ that $h_1(X)$ and $h_2(X)$ have the same distribution.

Lemma B.4. Under the assumption of Theorem 4.1, it is impossible that $e_{I_1^*}(X)$ and $e_{I_2^*}(X)$ have the same distribution, and it is impossible either that $e_{I_1^*}$ up-crosses $e_{I_2^*}$ or that $e_{I_2^*}$ up-crosses $e_{I_1^*}$, where $e_{I_i^*}$ is given in (4.1).

Proof. First of all, we show that $e_{I_1^*}(X)$ and $e_{I_2^*}(X)$ cannot have the same distribution. Define
$$\phi(z) := \frac{-U''(w_1 - z)}{-U''(w_2 - z)}, \quad z < w_1 < w_2. \tag{B.1}$$
A direct calculation based on the assumption of strictly DAP shows that
$$\phi'(z) = \big(P(w_1 - z) - P(w_2 - z)\big)\,\phi(z) > 0,$$
where $P$ is defined in (2.2). As a result, $\phi$ is a strictly increasing function.

We now prove the result by contradiction. Assume that $e_{I_1^*}(X)$ and $e_{I_2^*}(X)$ are equal in distribution. Noting that $e_{I_i^*}$ is increasing and Lipschitz-continuous and that $F_X(x)$ is strictly increasing, we have
$$e_{I_1^*}(x) = e_{I_2^*}(x), \quad \forall x \in [0, M],$$
which in turn implies $e'_{I_1^*}(x) = e'_{I_2^*}(x)$.
It follows from (4.2) that
$$e'_{I_i^*}(x) = \frac{1}{1 + \dfrac{2\beta_i^*}{-U''(w_i - x + e_{I_i^*}(x))}}$$
for all $x > 0$; hence,
$$\frac{2\beta_1^*}{-U''(w_1 - x + e_{I_1^*}(x))} = \frac{2\beta_2^*}{-U''(w_2 - x + e_{I_2^*}(x))}.$$
Because $e_{I_1^*}(x) = e_{I_2^*}(x)$ for all $x \in [0, M]$, we obtain
$$\phi(x - e_{I_1^*}(x)) = \frac{-U''(w_1 - x + e_{I_1^*}(x))}{-U''(w_2 - x + e_{I_2^*}(x))} = \frac{\beta_1^*}{\beta_2^*}, \quad \forall x \in [0, M],$$
which contradicts the fact that $x - e_{I_1^*}(x) \equiv x - I_1^*(x) + \mathbb{E}[I_1^*(X)]$ is strictly increasing in $x \in [0, M]$ and that $\phi$ is a strictly increasing function.

Next we show that it is impossible that $e_{I_1^*}$ up-crosses $e_{I_2^*}$. Again we prove the result by contradiction. Since $e_{I_i^*}$ is not always non-negative, we introduce the increasing function
$$\tilde f_i^*(x) := e_{I_i^*}(x) + \tilde m, \quad i = 1, 2,$$
where $\tilde m := \max\{\mathbb{E}[I_1^*(X)], \mathbb{E}[I_2^*(X)]\}$. It then follows that $\tilde f_1^*$ up-crosses $\tilde f_2^*$, and
$$\tilde f_i^*(x) \ge 0, \quad \mathbb{E}[\tilde f_i^*(X)] = \tilde m, \quad \mathbb{E}[(\tilde f_i^*(X))^2] = \mathbb{E}[(e_{I_i^*}(X))^2] + \tilde m^2 = \nu + \tilde m^2, \quad i = 1, 2,$$
where the second equality follows from the fact that $\mathbb{E}[e_{I_i^*}(X)] = 0$. Lemma B.3 yields that $\tilde f_1^*(X)$ and $\tilde f_2^*(X)$ have the same distribution, and hence so do $e_{I_1^*}(X)$ and $e_{I_2^*}(X)$. As shown above, $e_{I_1^*}(X)$ and $e_{I_2^*}(X)$ cannot have the same distribution, and therefore $e_{I_1^*}$ cannot up-cross $e_{I_2^*}$. A similar analysis shows that it is impossible that $e_{I_2^*}$ up-crosses $e_{I_1^*}$. The proof is thus complete.

C Proofs

Proof of Lemma 3.2:

For any $I \in \mathcal{C}$ with $\mathbb{E}[I(X)] \le m_L$, it follows from Lemma A.2 that
$$X \wedge d_I \le_{cx} R_I(X),$$
where $d_I \ge d_L$ is determined by $\mathbb{E}[(X - d_I)_+] = \mathbb{E}[I(X)]$ (or, equivalently, $\mathbb{E}[X \wedge d_I] = \mathbb{E}[R_I(X)]$). Thus, we have $\mathbb{E}[U(W_I(X))] \le \mathbb{E}[U(W_{(x - d_I)_+}(X))]$. Furthermore, according to the proof of Theorem 4.2 in Chi (2019), $\mathbb{E}[U(W_{(x - d)_+}(X))]$ is a decreasing function of $d$ over $[d^*, M)$, where $d^*$ is defined in (3.2). Recalling that $d_L > d^*$, we conclude that $I$ is no better than $(x - d_L)_+$.
Proof of Lemma 3.3:

We prove by contradiction. Assume $\mathbb{E}[I(X)] > m_U$ for some indemnity function $I \in \mathcal{C}$ satisfying $\operatorname{var}[I(X)] \le \nu$. Lemma A.2 implies that there exists $K > K_U$ such that
$$X \wedge K \le_{cx} I(X).$$
Noting that $\operatorname{var}[X \wedge K]$ is strictly increasing and continuous in $K \in [K_U, M]$, we obtain
$$\nu = \operatorname{var}[X \wedge K_U] < \operatorname{var}[X \wedge K] \le \operatorname{var}[I(X)] \le \nu,$$
leading to a contradiction.

For any $I \in \mathcal{C}$ satisfying $\operatorname{var}[I(X)] \le \nu$ and $\mathbb{E}[I(X)] = m_U$, it follows from Lemma A.2 that $X \wedge K_U \le_{cx} I(X)$, which in turn implies
$$\nu = \operatorname{var}[X \wedge K_U] \le \operatorname{var}[I(X)] \le \nu.$$
Because
$$\operatorname{var}[Z] = 2\int_{-\infty}^{\infty} \big\{\mathbb{E}[(Z - t)_+] - (\mathbb{E}[Z] - t)_+\big\}\,\mathrm{d}t$$
for any random variable $Z$ with a finite second moment, we deduce
$$\mathbb{E}[(X \wedge K_U - t)_+] = \mathbb{E}[(I(X) - t)_+]$$
almost everywhere in $t$. Therefore, $X \wedge K_U$ and $I(X)$ are equally distributed, which implies $\mathbb{P}(I(X) \le K_U) = 1$. Moreover, it follows from $I \in \mathcal{C}$ that $\mathbb{P}(I(X) \le X \wedge K_U) = 1$. Since $\mathbb{E}[I(X)] = \mathbb{E}[X \wedge K_U]$, we obtain that $I(X) = X \wedge K_U$ almost surely. The proof is thus complete.

Proof of Proposition 3.4:

If $m_L = m_U$, it follows from Lemma 3.3 that $X \wedge K_U = (X - d_L)_+$ almost surely. Thus, it follows from the fact that $d_L > 0$ that $X$ must follow a Bernoulli distribution with values 0 and $d_L + K_U$. Furthermore, Lemmas 3.2 and 3.3 imply that any admissible insurance policy is no better than $(x - d_L)_+$. Therefore, the optimal indemnity must be $K_U$ at point $d_L + K_U$.

Proof of Proposition 3.5:

Introducing two Lagrangian multipliers $\lambda \in \mathbb{R}$ and $\beta \ge 0$, we consider the following maximization problem:
$$\max_{I \in \mathcal{C}} U_{\lambda,\beta}(I) := \mathbb{E}\big[U(w - X + I(X) - (1+\rho)m)\big] - \lambda\big(\mathbb{E}[I(X)] - m\big) - \beta\big(\mathbb{E}[I^2(X)] - \nu - m^2\big). \tag{C.1}$$
Fix $x \ge 0$. The above objective function motivates the introduction of the following function:
$$\Psi(y) := U(w - x + y - (1+\rho)m) - \lambda y - \beta y^2, \quad 0 \le y \le x.$$
The assumption of $U'' < 0$ implies that $\Psi$ is strictly concave, with
$$\Psi'(y) = U'(w - x + y - (1+\rho)m) - \lambda - 2\beta y.$$
As a consequence, for each $x \ge 0$, $I_{\lambda,\beta}(x)$ defined in (3.4) is an optimal solution to
$$\max_{0 \le y \le x} \Psi(y).$$
This result, together with the fact that $I_{\lambda,\beta} \in \mathcal{C}$, implies that $I_{\lambda,\beta}$ solves Problem (C.1).

Notably, if there exist $\lambda_m^* \in \mathbb{R}$ and $\beta_m^* \ge 0$ such that
$$\mathbb{E}[I_{\lambda_m^*,\beta_m^*}(X)] = m \quad \text{and} \quad \operatorname{var}[I_{\lambda_m^*,\beta_m^*}(X)] = \nu, \tag{C.2}$$
then $I_{\lambda_m^*,\beta_m^*}$ solves Problem (3.3). We prove this by contradiction. Indeed, if there exists $I^* \in \mathcal{C}_m$ such that $\mathbb{E}[U(W_{I^*}(X))] > \mathbb{E}[U(W_{I_{\lambda_m^*,\beta_m^*}}(X))]$, then we can obtain
$$U_{\lambda_m^*,\beta_m^*}(I^*) > U_{\lambda_m^*,\beta_m^*}(I_{\lambda_m^*,\beta_m^*}),$$
which contradicts the fact that $I_{\lambda_m^*,\beta_m^*}$ solves Problem (C.1) with $\lambda = \lambda_m^*$ and $\beta = \beta_m^*$. Therefore, we only need to show the existence of $\lambda_m^*$ and $\beta_m^*$.

It follows from (3.5)-(3.8) that $I_{\lambda,\beta}$ satisfies the incentive-compatible condition, i.e., $I_{\lambda,\beta} \in \mathcal{IC}$. Such an observation motivates us to consider an auxiliary problem
$$\max_{I \in \mathcal{IC}_m} \mathbb{E}[U(W_I(X))], \tag{C.3}$$
where
$$\mathcal{IC}_m := \{I \in \mathcal{IC} : \operatorname{var}[I(X)] \le \nu, \ \mathbb{E}[I(X)] = m\}.$$
Problem (C.3) differs from Problem (3.3) in that the feasible set is $\mathcal{IC}_m$ instead of $\mathcal{C}_m$. In what follows, we show that $\mathcal{IC}_m \ne \emptyset$, and that there exists a unique optimal solution $I_m^*$ to Problem (C.3) satisfying $\operatorname{var}[I_m^*(X)] = \nu$.

Indeed, for any $m \in (m_L, m_U)$, let $\theta \in (0, 1)$ be such that $m = \theta m_U + (1 - \theta) m_L$. Denote
$$I_\theta(x) := \theta (x \wedge K_U) + (1 - \theta)(x - d_L)_+.$$
It follows that $\mathbb{E}[I_\theta(X)] = m$ and
$$\sqrt{\operatorname{var}[I_\theta(X)]} \le \theta\sqrt{\operatorname{var}[X \wedge K_U]} + (1 - \theta)\sqrt{\operatorname{var}[(X - d_L)_+]} = \sqrt{\nu}.$$
As a consequence, $I_\theta \in \mathcal{IC}_m$; and hence $\mathcal{IC}_m$ is nonempty. Moreover, note that there must exist $d_m \in [0, d_L)$ such that $\mathbb{E}[(X - d_m)_+] = m$. For any $I \in \mathcal{IC}_m$, define
$$\tilde I_\alpha(x) := \alpha I(x) + (1 - \alpha)(x - d_m)_+, \quad \alpha \in [0, 1].$$
Then, we have $\mathbb{E}[\tilde I_\alpha(X)] = m$ and
$$\operatorname{var}[\tilde I_1(X)] = \operatorname{var}[I(X)] \le \nu = \operatorname{var}[(X - d_L)_+] \le \operatorname{var}[(X - d_m)_+] = \operatorname{var}[\tilde I_0(X)],$$
where the last inequality follows from the fact that $\operatorname{var}[(X - d)_+]$ is decreasing in $d$.
Because $\operatorname{var}[\tilde I_\alpha(X)]$ is continuous in $\alpha$, there must exist $\alpha^* \in [0, 1]$ such that $\operatorname{var}[\tilde I_{\alpha^*}(X)] = \nu$. Lemma A.2 yields that $X \wedge d_m \le_{cx} X - I(X)$, leading to
$$\mathbb{E}[U(W_{\tilde I_{\alpha^*}}(X))] \ge \alpha^*\,\mathbb{E}[U(W_I(X))] + (1 - \alpha^*)\,\mathbb{E}[U(W_{(x - d_m)_+}(X))] \ge \mathbb{E}[U(W_I(X))].$$
This means that $I$ is no better than $\tilde I_{\alpha^*}$. Together with the Arzelà–Ascoli theorem, the above analysis implies that there exists an optimal solution to Problem (C.3) that binds the variance constraint. Finally, a similar argument to the proof of Proposition 3.1 in Chi and Wei (2020) further shows that the optimal solution to Problem (C.3) must be unique.

By defining $U^*(\lambda, \beta) := U_{\lambda,\beta}(I_{\lambda,\beta})$, for any $\alpha \in [0, 1]$ we have
$$\begin{aligned} U^*\big(\alpha\lambda_1 + (1-\alpha)\lambda_2, \alpha\beta_1 + (1-\alpha)\beta_2\big) &= \max_{I \in \mathcal{IC}} U_{\alpha\lambda_1 + (1-\alpha)\lambda_2,\, \alpha\beta_1 + (1-\alpha)\beta_2}(I) \\ &= \max_{I \in \mathcal{IC}} \big\{\alpha U_{\lambda_1,\beta_1}(I) + (1-\alpha) U_{\lambda_2,\beta_2}(I)\big\} \\ &\le \max_{I \in \mathcal{IC}} \big\{\alpha U_{\lambda_1,\beta_1}(I)\big\} + \max_{I \in \mathcal{IC}} \big\{(1-\alpha) U_{\lambda_2,\beta_2}(I)\big\} \\ &= \alpha \max_{I \in \mathcal{IC}} \big\{U_{\lambda_1,\beta_1}(I)\big\} + (1-\alpha) \max_{I \in \mathcal{IC}} \big\{U_{\lambda_2,\beta_2}(I)\big\} \\ &= \alpha U^*(\lambda_1, \beta_1) + (1-\alpha) U^*(\lambda_2, \beta_2), \end{aligned}$$
where the second equality is due to the fact that $U_{\lambda,\beta}(I)$ is linear in $(\lambda, \beta)$ for any given $I \in \mathcal{IC}$. Thus, $U^*(\lambda, \beta)$ is convex in $(\lambda, \beta)$.

Furthermore, denoting by $U^{**}$ the maximal EU value of the insured's final wealth in Problem (C.3), we have $U^{**} \le U(w - (1+\rho)m) < \infty$. On the other hand, for any given $I \in \mathcal{IC}$ satisfying $\mathbb{E}[I(X)] \ne m$ or $\operatorname{var}[I(X)] > \nu$, it is easy to show that $\min_{\lambda \in \mathbb{R}, \beta \ge 0} U_{\lambda,\beta}(I) = -\infty$. Noting that
$$U_{0,0}(I) = \mathbb{E}[U(w - X + I(X) - (1+\rho)m)],$$
we have $\max_{I \in \mathcal{IC}} \min_{\lambda \in \mathbb{R}, \beta \ge 0} U_{\lambda,\beta}(I) \le U^{**}$. Now,
$$\max_{I \in \mathcal{IC}} U_{\lambda,\beta}(I) \ge \max_{I \in \mathcal{IC}_m} U_{\lambda,\beta}(I) \ge \max_{I \in \mathcal{IC}_m} \mathbb{E}[U(w - X + I(X) - (1+\rho)m)], \quad \forall \lambda \in \mathbb{R}, \ \beta \ge 0,$$
leading to $U^{**} \le \min_{\lambda \in \mathbb{R}, \beta \ge 0} \max_{I \in \mathcal{IC}} U_{\lambda,\beta}(I)$.
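Since $U_{\lambda,\beta}(I)$ is continuous and strictly concave in $I$, we can obtain from Lemmas B.1 and B.2 that
$$\min_{I \in \mathcal{IC}} \max_{\lambda \in \mathbb{R}, \beta \ge 0} -U_{\lambda,\beta}(I) = \max_{\lambda \in \mathbb{R}, \beta \ge 0} \min_{I \in \mathcal{IC}} -U_{\lambda,\beta}(I),$$
which implies
$$\max_{I \in \mathcal{IC}} \min_{\lambda \in \mathbb{R}, \beta \ge 0} U_{\lambda,\beta}(I) = \min_{\lambda \in \mathbb{R}, \beta \ge 0} \max_{I \in \mathcal{IC}} U_{\lambda,\beta}(I) = U^{**}.$$
By denoting $\lambda_U := (U^{**} + 1)/m$, we have
$$U^*(\lambda, \beta) = \max_{I \in \mathcal{IC}} U_{\lambda,\beta}(I) \ge U_{\lambda,\beta}(0) \ge \lambda_U m = U^{**} + 1, \quad \forall \lambda \ge \lambda_U, \ \beta \ge 0.$$
Furthermore, we define $\lambda_L := -(U^{**} + 1)/(\mathbb{E}[X \wedge K_1] - m)$, where $K_1$ is determined by $\mathbb{E}[(X \wedge K_1)^2] = \nu + m^2 < \mathbb{E}[(X \wedge K_U)^2]$. Clearly, $K_1 \in (0, K_U)$. If $\mathbb{E}[X \wedge K_1] \le m$, then $\operatorname{var}[X \wedge K_1] \ge \nu = \operatorname{var}[X \wedge K_U]$, which contradicts the definition of $K_U$. Thus, we must have $\mathbb{E}[X \wedge K_1] > m$, and hence
$$U^*(\lambda, \beta) = \max_{I \in \mathcal{IC}} U_{\lambda,\beta}(I) \ge U_{\lambda,\beta}(x \wedge K_1) \ge -\lambda_L\big(\mathbb{E}[X \wedge K_1] - m\big) \ge U^{**} + 1, \quad \forall \lambda \le \lambda_L, \ \beta \ge 0.$$
Similarly, let $\beta_U := \dfrac{U^{**} + 1}{\nu - \operatorname{var}[X \wedge K_2]}$, where $K_2$ is determined by $\mathbb{E}[X \wedge K_2] = m$. Here, we have $K_2 < K_U$, which in turn implies $\operatorname{var}[X \wedge K_2] < \nu$ based on the definition of $K_U$. For any $\beta \ge \beta_U$ and $\lambda \in \mathbb{R}$, we obtain
$$U^*(\lambda, \beta) = \max_{I \in \mathcal{IC}} U_{\lambda,\beta}(I) \ge U_{\lambda,\beta}(x \wedge K_2) \ge \beta_U\big(\nu - \operatorname{var}[X \wedge K_2]\big) = U^{**} + 1.$$
The above analysis indicates that
$$U^{**} = \min_{\lambda_L \le \lambda \le \lambda_U, \ 0 \le \beta \le \beta_U} U^*(\lambda, \beta).$$
Thus, it follows from the convexity of $U^*(\lambda, \beta)$ and Weierstrass's theorem that there exist $\lambda_m^* \in [\lambda_L, \lambda_U]$ and $\beta_m^* \in [0, \beta_U]$ such that
$$U^{**} = U^*(\lambda_m^*, \beta_m^*) = \min_{\lambda \in \mathbb{R}, \beta \ge 0} U^*(\lambda, \beta).$$
Moreover, we have
$$U^*(\lambda_m^*, \beta_m^*) = \max_{I \in \mathcal{IC}} U_{\lambda_m^*,\beta_m^*}(I) \ge \max_{I \in \mathcal{IC}_m, \ \operatorname{var}[I(X)] = \nu} U_{\lambda_m^*,\beta_m^*}(I) = U^{**}, \tag{C.4}$$
where the second equality is derived from the fact that the optimal solution to Problem (C.3) binds the variance constraint. In addition, thanks to (C.4), the unique optimal solution $I_m^*$ of Problem (C.3) must solve Problem (C.1) with $\lambda = \lambda_m^*$ and $\beta = \beta_m^*$. Note that $U_{\lambda_m^*,\beta_m^*}(I)$ is strictly concave in $I$; therefore, $I_m^*(X) = I_{\lambda_m^*,\beta_m^*}(X)$ almost surely.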
As a result, $I_{\lambda_m^*,\beta_m^*}$ satisfies (C.2) and must be a solution to Problem (C.3).

Finally, we show that $\beta_m^* > 0$. Otherwise, if $\beta_m^* = 0$, then $I_{\lambda_m^*,\beta_m^*}(x) = (x - d_m)_+$, yielding a contradiction
$$\nu = \operatorname{var}[(X - d_m)_+] > \operatorname{var}[(X - d_L)_+] = \nu.$$
The proof is thus complete.

Proof of Proposition 3.7:

Define
$$L_\beta(I) := \mathbb{E}\big[U\big(w - X + I(X) - (1+\rho)\,\mathbb{E}[I(X)]\big)\big] - \beta\big(\operatorname{var}[I(X)] - \nu\big), \quad \beta \ge 0, \ I \in \mathcal{IC},$$
which is linear in $\beta$ and concave in $I$, because $U''(\cdot) < 0$ and $\operatorname{var}[I(X)]$ is convex in $I$.

We denote by $U^{***}$ the maximum EU value of the insured's final wealth in Problem (3.11). Using an argument similar to that in the proof of Proposition 3.5, we have
$$\max_{I \in \mathcal{IC}} \min_{\beta \ge 0} L_\beta(I) \le U^{***} \le \min_{\beta \ge 0} \max_{I \in \mathcal{IC}} L_\beta(I),$$
which, together with Lemma B.2, implies
$$U^{***} = \min_{\beta \ge 0} \max_{I \in \mathcal{IC}} L_\beta(I).$$
Denoting $\tilde\beta = \dfrac{U^{***} + 1}{\nu}$, we have
$$\max_{I \in \mathcal{IC}} L_\beta(I) \ge L_\beta(0) \ge \beta\nu \ge U^{***} + 1, \quad \forall \beta \ge \tilde\beta.$$
Furthermore, since $\max_{I \in \mathcal{IC}} L_\beta(I)$ is convex in $\beta$, there must exist $\beta^* \in [0, \tilde\beta]$ such that
$$U^{***} = \max_{I \in \mathcal{IC}} L_{\beta^*}(I).$$
If $\beta^* = 0$, then
$$U^{***} = \max_{I \in \mathcal{IC}} \mathbb{E}[U(W_I(X))],$$
in which case the stop-loss insurance $(x - d^*)_+$ is optimal, where $d^*$ is defined in (3.2). However, this is contradicted by Assumption 3.1. So we must have $\beta^* > 0$. Moreover, $I^*$, which solves Problem (2.1) under Assumption 3.1, satisfies $\operatorname{var}[I^*(X)] = \nu$ and therefore also solves the maximization problem
$$\max_{I \in \mathcal{IC}} L_{\beta^*}(I).$$
Thus, for any $I \in \mathcal{IC}$, we have
$$\lim_{\alpha \downarrow 0} \frac{L_{\beta^*}\big((1-\alpha) I^*(x) + \alpha I(x)\big) - L_{\beta^*}(I^*(x))}{\alpha} \le 0,$$
leading to
$$\begin{aligned} 0 &\ge \mathbb{E}\Big[U'(W_{I^*}(X))\big(I(X) - I^*(X) - (1+\rho)\,\mathbb{E}[I(X) - I^*(X)]\big)\Big] - 2\beta^*\big(\operatorname{cov}(I^*(X), I(X)) - \operatorname{var}[I^*(X)]\big) \\ &= \int_0^\infty \Big(\mathbb{E}\big[U'(W_{I^*}(X))\big(\mathbb{I}_{\{X > t\}} - (1+\rho)\,\mathbb{P}\{X > t\}\big)\big] - 2\beta^*\big(\mathbb{E}[I^*(X)\,\mathbb{I}_{\{X > t\}}] - \mathbb{E}[I^*(X)]\,\mathbb{P}\{X > t\}\big)\Big)\big(I'(t) - I^{*\prime}(t)\big)\,\mathrm{d}t \\ &= \int_0^M \Big(\mathbb{E}\big[U'(W_{I^*}(X)) - 2\beta^* I^*(X) \,\big|\, X > t\big] - (1+\rho)\,\mathbb{E}\big[U'(W_{I^*}(X))\big] + 2\beta^*\,\mathbb{E}[I^*(X)]\Big)\,\mathbb{P}\{X > t\}\big(I'(t) - I^{*\prime}(t)\big)\,\mathrm{d}t \\ &= \int_0^M \Phi_{I^*}(t)\,\mathbb{P}\{X > t\}\big(I'(t) - I^{*\prime}(t)\big)\,\mathrm{d}t, \end{aligned}$$
where $\Phi_{I^*}$ is defined in (3.14) and $\mathbb{I}_A$ is the indicator function of an event $A$. The arbitrariness of $I \in \mathcal{IC}$ and the fact that $\mathbb{P}\{X > t\} > 0$ for $t < M$ yield that $I^{*\prime}$ should be in the form of (3.13). The proof is complete.

Proof of Theorem 3.8:

Let $I^*$ be optimal for Problem (2.1). Then it follows from Proposition 3.7 that
$$\Phi_{I^*}(x) = \mathbb{E}\big[\psi(X) \,\big|\, X > x\big] \tag{C.5}$$
where
$$\psi(x) := U'(W_{I^*}(x)) - 2\beta^* I^*(x) - \Big((1+\rho)\,\mathbb{E}[U'(W_{I^*}(X))] - 2\beta^*\,\mathbb{E}[I^*(X)]\Big), \tag{C.6}$$
for some $\beta^* > 0$.

Since $F_X$ is assumed to be strictly increasing on $(0, M)$, it follows from Proposition 3.4 that $m_L < m_U$. Recall that solving Problem (2.1) can be reduced to solving Problem (3.10) under Assumption 3.1. In the following, we proceed to solve Problem (3.10) with the help of Proposition 3.7. We carry out the analysis for three cases:

Case (A) If $I^*(x) = (x - d_L)_+$, then $\psi$ is strictly increasing on $[0, d_L]$ and strictly decreasing on $[d_L, M)$. If there exists $\tilde x \in [d_L, M)$ such that $\psi(\tilde x) < 0$, then $\psi(x) < 0$ for all $x \in (\tilde x, M)$, and hence $\Phi_{I^*}(x) < 0$ $\forall x \in [\tilde x, M)$, which contradicts Proposition 3.7.
Therefore, $\psi(x) \ge 0$ $\forall x \in [d_L, M)$ and $\psi(d_L) > 0$. Since $\psi(x)$ is continuous in $x$ and $F_X$ is strictly increasing on $(0, M)$, there must exist $\epsilon > 0$ such that $\Phi_{I^*}(x) > 0$ $\forall x \in [d_L - \epsilon, d_L)$, contradicting Proposition 3.7. So $(x - d_L)_+$ cannot be an optimal solution to Problem (3.10).

Case (B) If $I^*(x) = x \wedge K_U$, then $\psi$ is strictly decreasing on $[0, K_U]$ and strictly increasing on $[K_U, M)$. Using a similar argument as the one for Case (A), we deduce $\psi(x) \le 0$ $\forall x \in [K_U, M)$ and $\psi(K_U) < 0$. Since $\psi$ is a continuous function, there exists $\epsilon > 0$ such that $\Phi_{I^*}(x) < 0$ for $x \in [K_U - \epsilon, K_U)$, contradicting Proposition 3.7. As a result, $x \wedge K_U$ cannot be optimal to Problem (3.10) either.

Case (C) Hence, the optimal solution must be of the form $I^*(x) = I_{\lambda_m^*,\beta_m^*}(x)$ for some $m \in (m_L, m_U)$. Noting that $I'_{\lambda_m^*,\beta_m^*}(x) \in (0, 1)$ for sufficiently large $x$ due to $\beta_m^* > 0$, Proposition 3.7 implies $\Phi_{I^*}(x) = 0$ for sufficiently large $x$. Thus, together with (3.7), (C.6) and (C.5), Proposition 3.7 further implies that $\beta_m^* = \beta^*$ and
$$\lambda_m^* = (1+\rho)\,\mathbb{E}[U'(W_{I^*}(X))] - 2\beta^*\,\mathbb{E}[I^*(X)]. \tag{C.7}$$
Next, we consider two subcases that depend on the values of $\lambda_m^*$ and $U'(w - (1+\rho)m)$.

(C.1) If $\lambda_m^* < U'(w - (1+\rho)m)$, then $I^*(x) = x$ $\forall x \le \hat x$ and
$$U'(w - x + I^*(x) - (1+\rho)m) - 2\beta^* I^*(x) - \lambda_m^* \ \begin{cases} = 0, & x > \hat x, \\ > 0, & x < \hat x, \end{cases}$$
where $\hat x := \dfrac{U'(w - (1+\rho)m) - \lambda_m^*}{2\beta^*} > 0$. Therefore, it follows that
$$\begin{aligned} 0 &< \mathbb{E}\big[U'(w - X + I^*(X) - (1+\rho)m) - 2\beta^* I^*(X) - \lambda_m^*\big] \\ &= \mathbb{E}[U'(W_{I^*}(X))] - 2\beta^*\,\mathbb{E}[I^*(X)] - (1+\rho)\,\mathbb{E}[U'(W_{I^*}(X))] + 2\beta^*\,\mathbb{E}[I^*(X)] \\ &= -\rho\,\mathbb{E}[U'(W_{I^*}(X))], \end{aligned}$$
leading to a contradiction.

(C.2) Consequently, we must have $\lambda_m^* \ge U'(w - (1+\rho)m)$, in which case $I^*$ is coinsurance above a deductible (i.e., (3.5)). Similarly to Subcase (C.1), we can show that
$$U'(w - x + I^*(x) - (1+\rho)m) - 2\beta^* I^*(x) - \lambda_m^* \ \begin{cases} = 0, & x > \tilde d, \\ < 0, & x < \tilde d, \end{cases}$$
where $\tilde d := w - (1+\rho)m - (U')^{-1}(\lambda_m^*) \ge 0$.
This, together with (C.7), yields
$$\lambda_m^* = U'(w - \tilde d - (1+\rho)m) \tag{C.8}$$
and
$$-\rho\,\mathbb{E}[U'(W_{I^*}(X))] = \mathbb{E}\Big[\big(U'(w - X + I^*(X) - (1+\rho)m) - 2\beta^* I^*(X) - \lambda_m^*\big)\,\mathbb{I}_{\{X < \tilde d\}}\Big]. \tag{C.9}$$
Therefore, $\rho = 0$ if and only if $\tilde d = 0$. Moreover, if $\rho = 0$, the above analysis indicates that $I^*$ solves the equation (3.15). Otherwise, if $\rho > 0$, then it follows from (C.7) and (C.9) that
$$\mathbb{E}[U'(W_{I^*}(X))] = \frac{\mathbb{E}\big[U'(w - X - (1+\rho)m)\,\mathbb{I}_{\{X \le \tilde d\}}\big] + 2\beta^* m F_X(\tilde d)}{1 - (1+\rho)\,\mathbb{P}\{X > \tilde d\}},$$
which in turn implies $\tilde d > VaR_{\frac{1}{1+\rho}}(X)$. Plugging the above equation and (C.8) into (C.7) yields
$$2\beta^* = \frac{1}{m\rho}\Big(U'(w - \tilde d - (1+\rho)m) - (1+\rho)\,\mathbb{E}\big[U'(w - X \wedge \tilde d - (1+\rho)m)\big]\Big).$$
As a result, the optimal solution $I^*$ must be given by (3.17). The proof is complete.

Proof of Corollary 3.9:

We prove the case in which $\rho > 0$, noting that the proof for the case of $\rho = 0$ is similar and indeed simpler. It follows from (3.5) and (3.8) that $I_{\lambda_m^*,\beta_m^*}(x) = 0$ for $x \le \tilde d$ and $I'_{\lambda_m^*,\beta_m^*}(x) = f'_{\lambda_m^*,\beta_m^*}(x)$ increases in $x$ for $x > \tilde d$, where we use the fact that $x - I_{\lambda_m^*,\beta_m^*}(x)$ increases in $x$, and $\beta_m^* > 0$. Moreover, for $x > \tilde d$, taking the derivative of $I_{\lambda_m^*,\beta_m^*}(x)/x$ with respect to $x$ yields
$$\Big(\frac{I_{\lambda_m^*,\beta_m^*}(x)}{x}\Big)' = \Big(\frac{f_{\lambda_m^*,\beta_m^*}(x)}{x}\Big)' = \frac{f'_{\lambda_m^*,\beta_m^*}(x)\,x - f_{\lambda_m^*,\beta_m^*}(x)}{x^2} \ge \frac{\int_{\tilde d}^x \big(f'_{\lambda_m^*,\beta_m^*}(x) - f'_{\lambda_m^*,\beta_m^*}(y)\big)\,\mathrm{d}y}{x^2} \ge 0.$$
This completes the proof.

Proof of Theorem 4.1:

We first prove the result assuming that there exists $x_0 \in (0, M]$ such that
$$y_0 := e_{I_1^*}(x_0) = e_{I_2^*}(x_0) \quad \text{and} \quad \phi(x_0 - y_0) > \frac{\beta_1^*}{\beta_2^*}, \tag{C.10}$$
where $\phi$ is defined in (B.1). First, we show that $e_{I_1^*}(x)$ up-crosses $e_{I_2^*}(x)$ in a neighbourhood of $x_0$.
To this end, we first note
$$\tilde I_1^{*\prime}(x_1) - \tilde I_2^{*\prime}(x_1) = \frac{1}{1 + \frac{2\beta_1^*}{-U''(w_1 - x_1 + y_1)}} - \frac{1}{1 + \frac{2\beta_2^*}{-U''(w_2 - x_1 + y_1)}} > 0,$$
which in turn implies that there exists an $\epsilon > 0$ such that
$$\begin{cases} \tilde I_1^*(x) < \tilde I_2^*(x), & x \in (x_1 - \epsilon, x_1), \\ \tilde I_1^*(x) > \tilde I_2^*(x), & x \in (x_1, x_1 + \epsilon). \end{cases}$$
Moreover, there exists no $y > x_1$ such that $\tilde I_1^*(y) = \tilde I_2^*(y)$. Otherwise, if such $y$ existed, then the increasing property of $\phi(z)$ in $z$, together with the strictly increasing property of $x - \tilde I_i^*(x)$ in $x$, would imply
$$\phi(y - \tilde I_1^*(y)) = \phi(y - \tilde I_2^*(y)) > \phi(x_1 - \tilde I_1^*(x_1)) > \frac{\beta_1^*}{\beta_2^*},$$
which would in turn yield that $\tilde I_1^*$ up-crosses $\tilde I_2^*$ in a neighbourhood of $y$. This, however, contradicts the fact that $\tilde I_1^*(x) > \tilde I_2^*(x)$ for $x \in (x_1, x_1 + \epsilon)$.

Now define
$$x_0 := \sup\big\{x \in [0, x_1) : \tilde I_2^*(x) < \tilde I_1^*(x)\big\}.$$
If $x_0 = 0$, then $\tilde I_1^*$ up-crosses $\tilde I_2^*$, which contradicts Lemma B.4. Thus, we must have $0 < x_0 < x_1$ because $\tilde I_1^*(x) < \tilde I_2^*(x)$ for all $x \in (x_1 - \epsilon, x_1)$. Moreover, it follows readily that
$$\phi\big(x_0 - \tilde I_1^*(x_0)\big) \le \frac{\beta_1^*}{\beta_2^*}.$$
In the following, we show that there exists no point $x \in (0, x_0)$ such that $\tilde I_1^*(x) = \tilde I_2^*(x)$. Indeed, if such $x$ existed, then, noting that $\phi$ is strictly increasing, we would have
$$\phi\big(x - \tilde I_1^*(x)\big) < \frac{\beta_1^*}{\beta_2^*},$$
leading to $\tilde I_1^{*\prime}(x) - \tilde I_2^{*\prime}(x) < 0$. In other words, $\tilde I_2^*$ up-crosses $\tilde I_1^*$ at $x$. This contradicts the fact that $\tilde I_2^*$ up-crosses $\tilde I_1^*$ at $x_0$. Therefore, we can now conclude that $\tilde I_2^*$ up-crosses $\tilde I_1^*$ twice when (C.10) is satisfied.

Let us now consider the case in which (C.10) is not satisfied. We study two cases:

Case (A) If there exists no $x \in (0, M]$ such that $\tilde I_1^*(x) = \tilde I_2^*(x)$, then it is easy to show from $\mathbb{E}[\tilde I_i^*(X)] = 0$ that $\tilde I_1^*(X)$ and $\tilde I_2^*(X)$ have the same distribution. This contradicts Lemma B.4.

Case (B) Otherwise, any $x \in (0, M]$ satisfying $\tilde I_1^*(x) = \tilde I_2^*(x)$ must have
$$\phi(x - \tilde I_i^*(x)) \le \frac{\beta_1^*}{\beta_2^*}.$$
If the above inequality is always strict, then the previous analysis shows that $\tilde I_2^*$ up-crosses $\tilde I_1^*$, which contradicts Lemma B.4. Hence, there must exist $x_1 \in (0, M]$ such that
$$y_1 := \tilde I_1^*(x_1) = \tilde I_2^*(x_1) \quad \text{and} \quad \phi(x_1 - y_1) = \frac{\beta_1^*}{\beta_2^*}.$$
In this case, we further divide our analysis into three subcases.

(B.1) If $\tilde I_2^*$ up-crosses $\tilde I_1^*$ at $x_1$, then the above analysis would imply that no up-crossing occurs before $x_1$. A similar argument indicates that $\tilde I_1^*(X)$ and $\tilde I_2^*(X)$ would have the same distribution, which would not be possible.

(B.2) Otherwise, if $\tilde I_1^*$ up-crosses $\tilde I_2^*$ at $x_1$, then the previous analysis shows that $\tilde I_2^*$ up-crosses $\tilde I_1^*$ twice.

(B.3) Finally, if no up-crossing happens at $x_1$, then we can simply neglect this single point $x_1$ in the analysis. If there further exists $x \in (0, x_1)$ satisfying $\tilde I_1^*(x) = \tilde I_2^*(x)$, then we have
$$\phi(x - \tilde I_i^*(x)) < \frac{\beta_1^*}{\beta_2^*}.$$
In this case, the previous analysis indicates that $\tilde I_2^*$ up-crosses $\tilde I_1^*$ at $x$, which contradicts Lemma B.4. Otherwise, if there exists no $x \in (0, x_1)$ such that $\tilde I_1^*(x) = \tilde I_2^*(x)$, then it follows from $\mathbb{E}[\tilde I_i^*(X)] = 0$ that $\tilde I_1^*(X)$ and $\tilde I_2^*(X)$ have the same distribution. This again contradicts Lemma B.4.

In summary, we have shown that $\tilde I_2^*$ up-crosses $\tilde I_1^*$ twice. Because $\tilde I_1^*(X)$ and $\tilde I_2^*(X)$ have the same first two moments, we can easily see that the insurer's profit with the wealthier insured, $-\tilde I_2^*(X)$, has less downside risk than the counterpart with the less wealthy insured, $-\tilde I_1^*(X)$, when the insurance pricing is actuarially fair. The proof is complete.

Proof of Corollary 4.2:

(i) It follows from the proof of Theorem 4.1 that $\tilde I_2^*(0) < \tilde I_1^*(0)$, which is equivalent to $\mathbb{E}[I_1^*(X)] < \mathbb{E}[I_2^*(X)]$.
Suppose that $\tilde I_2^*$ up-crosses $\tilde I_1^*$ at points $x_0$ and $x_1$ with $0 < x_0 < x_1$. Then $\tilde I_1^*(x_j) = \tilde I_2^*(x_j)$ for $j = 0, 1$, which, together with (4.2), implies
$$\begin{aligned} &U'(w_1 - x_0 + \tilde I_1^*(x_0)) - U'(w_2 - x_0 + \tilde I_2^*(x_0)) - 2(\beta_1^* - \beta_2^*)\,\tilde I_1^*(x_0) \\ &\quad = U'(w_1 - x_1 + \tilde I_1^*(x_1)) - U'(w_2 - x_1 + \tilde I_2^*(x_1)) - 2(\beta_1^* - \beta_2^*)\,\tilde I_1^*(x_1). \end{aligned} \tag{C.11}$$
Denote
$$L(y) := U'(w_1 - y) - U'(w_2 - y), \quad y \in [0, w_1).$$
Then $L'(y) = -U''(w_1 - y) + U''(w_2 - y) > 0$ due to $w_1 < w_2$ and $U''' > 0$. Recalling that $x - \tilde I_i^*(x)$ and $\tilde I_i^*(x)$ are strictly increasing in $x$, we deduce from (C.11) that $\beta_2^* < \beta_1^*$.

(ii) Let us denote $\tilde w_i := w_i - \mathbb{E}[I_i^*(X)]$, $i = 1, 2$. The following analysis depends on the comparison between $\tilde w_1$ and $\tilde w_2$.

Case (A) If $\tilde w_1 = \tilde w_2$ and there exists $z \in [0, M]$ such that $I_1^*(z) = I_2^*(z)$, then it follows from (3.15) that $2(\beta_1^* - \beta_2^*)\,I_i^*(z) = 0$, $i = 1, 2$. Recalling that $\beta_2^* < \beta_1^*$, we have $I_i^*(z) = 0$, which implies $z = 0$. Because $\mathbb{E}[I_1^*(X)] < \mathbb{E}[I_2^*(X)]$, we conclude that $I_1^*(x) < I_2^*(x)$ $\forall x \in (0, M]$.

Case (B) Otherwise, if $\tilde w_1 \ne \tilde w_2$, we can show that either $I_1^*(x) < I_2^*(x)$ $\forall x > 0$ or $I_1^*$ up-crosses $I_2^*$. Indeed, if there exists $z \in (0, M]$ such that $I_1^*(z) = I_2^*(z)$, then we have
$$U'(\tilde w_1 - z + I_1^*(z)) - U'(\tilde w_1) - \big(U'(\tilde w_2 - z + I_2^*(z)) - U'(\tilde w_2)\big) = 2(\beta_1^* - \beta_2^*)\,I_1^*(z) > 0.$$
Noting that $U'(\tilde w - y) - U'(\tilde w)$ is strictly decreasing in $\tilde w$ for any $y > 0$ due to $U''' > 0$, we have $\tilde w_1 < \tilde w_2$. Furthermore, we have
$$I_i^{*\prime}(z) = \frac{1}{1 + \frac{2\beta_i^*}{-U''(\tilde w_i - z + I_i^*(z))}} = \frac{1}{1 + \frac{U'(\tilde w_i - z + I_i^*(z)) - U'(\tilde w_i)}{-U''(\tilde w_i - z + I_i^*(z)) \times I_i^*(z)}}, \quad i = 1, 2.$$
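The slope expression for $I_i^{*\prime}(z)$ comes from implicitly differentiating the condition $U'(\tilde w_i - x + I_i^*(x)) - U'(\tilde w_i) = 2\beta_i^* I_i^*(x)$ used in Case (B). The following is a minimal numerical sketch of that step, not part of the proof: it solves the condition by bisection and compares a finite-difference slope with the closed form $1/\big(1 + 2\beta/(-U''(\tilde w - x + I(x)))\big)$. The exponential utility, the parameter `A`, and all function names are assumptions chosen for illustration; exponential utility does not satisfy strict DAP, but the slope formula itself does not rely on DAP.

```python
import math

A = 0.5  # hypothetical absolute risk aversion of the illustrative utility


def u_prime(w):
    """U'(w) for the assumed utility U(w) = -exp(-A*w)/A."""
    return math.exp(-A * w)


def u_double_prime(w):
    """U''(w) for the same assumed utility."""
    return -A * math.exp(-A * w)


def indemnity(x, w_tilde, beta, tol=1e-13):
    """Solve U'(w_tilde - x + I) - U'(w_tilde) = 2*beta*I for I in [0, x].

    The left side minus the right side is strictly decreasing in I,
    non-negative at I = 0 and negative at I = x (for x > 0, beta > 0),
    so bisection applies.
    """
    def g(i):
        return u_prime(w_tilde - x + i) - u_prime(w_tilde) - 2.0 * beta * i

    lo, hi = 0.0, x
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def slope_formula(x, w_tilde, beta):
    """Closed-form slope 1 / (1 + 2*beta / (-U''(w_tilde - x + I(x))))."""
    i = indemnity(x, w_tilde, beta)
    return 1.0 / (1.0 + 2.0 * beta / (-u_double_prime(w_tilde - x + i)))


def slope_numeric(x, w_tilde, beta, h=1e-5):
    """Central finite-difference approximation of I'(x)."""
    up = indemnity(x + h, w_tilde, beta)
    down = indemnity(x - h, w_tilde, beta)
    return (up - down) / (2.0 * h)


if __name__ == "__main__":
    x, w_tilde, beta = 2.0, 10.0, 0.01  # illustrative values only
    print("closed form :", slope_formula(x, w_tilde, beta))
    print("finite diff :", slope_numeric(x, w_tilde, beta))
```

Under these assumptions the two slopes should agree up to discretisation error, and both lie strictly between 0 and 1, in line with coinsurance.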
Denoting
$$H(w) := \frac{U'(w - y) - U'(w)}{-U''(w - y)}, \quad y \in [0, w),$$
we obtain
$$H'(w) = \frac{U'(w - y) - U'(w)}{-U''(w - y)}\Big[P_U(w - y) + \frac{U''(w - y) - U''(w)}{U'(w - y) - U'(w)}\Big] = H(w)\big[P_U(w - y) - P_U(w - \theta y)\big] > 0,$$
where the second equality is due to the mean-value theorem with $\theta \in (0, 1)$ and the last inequality is due to the assumption of strict DAP. The strictly increasing property of $H(w)$, together with the fact that $\tilde w_1 < \tilde w_2$, yields
$$I_2^{*\prime}(z) < I_1^{*\prime}(z),$$
which, in turn, implies that $I_1^*$ up-crosses $I_2^*$ at point $z$.

Otherwise, if $I_1^*(x) \ne I_2^*(x)$ $\forall x \in (0, M)$, then it follows from $\mathbb{E}[I_1^*(X)] < \mathbb{E}[I_2^*(X)]$ that $I_1^*(x) < I_2^*(x)$ $\forall x > 0$. The proof is complete.

Proof of Theorem 4.3:

If there exists no point $x \in (0, M)$ such that $\tilde I_1^*(x) = \tilde I_2^*(x)$, then $\tilde I_1^*(X)$ and $\tilde I_2^*(X)$ are equal in distribution due to $\mathbb{E}[\tilde I_1^*(X)] = \mathbb{E}[\tilde I_2^*(X)] = 0$. This contradicts the fact that $\mathrm{var}[\tilde I_1^*(X)] = \nu_1 < \nu_2 = \mathrm{var}[\tilde I_2^*(X)]$. Therefore, we must have $\tilde I_1^*(z) = \tilde I_2^*(z)$ for some $z \in (0, M)$; and thus (4.4) implies
$$\mathbb{E}[U'(w - X + \tilde I_1^*(X))] - \mathbb{E}[U'(w - X + \tilde I_2^*(X))] = 2(\beta_2^* - \beta_1^*)\,\tilde I_1^*(z).$$
We divide the following proof into two cases by comparing $\mathbb{E}[U'(w - X + \tilde I_1^*(X))]$ with $\mathbb{E}[U'(w - X + \tilde I_2^*(X))]$.

Case (A) If $\mathbb{E}[U'(w - X + \tilde I_1^*(X))] \ne \mathbb{E}[U'(w - X + \tilde I_2^*(X))]$, then $\beta_1^* \ne \beta_2^*$. Since $\tilde I_1^*$ is a strictly increasing function, we have
$$\mathcal{X} := \{x \in [0, M] : \tilde I_1^*(x) = \tilde I_2^*(x)\} = \{z\}.$$
Recalling that $\mathrm{var}[\tilde I_1^*(X)] < \mathrm{var}[\tilde I_2^*(X)]$, we conclude from Lemma A.3 that $\tilde I_2^*$ up-crosses $\tilde I_1^*$.

Case (B) If $\mathbb{E}[U'(w - X + \tilde I_1^*(X))] = \mathbb{E}[U'(w - X + \tilde I_2^*(X))]$, we have
$$U'(w - x + \tilde I_1^*(x)) - 2\beta_1^*\,\tilde I_1^*(x) = U'(w - x + \tilde I_2^*(x)) - 2\beta_2^*\,\tilde I_2^*(x). \tag{C.12}$$
In this case, we further divide our analysis into two subcases based on the comparison between $\beta_1^*$ and $\beta_2^*$.

(B.1) If $\beta_1^* \ne \beta_2^*$, then it follows from (C.12) that $\tilde I_1^*(x) = 0$ $\forall x \in \mathcal{X}$. Similar to Case (A), we can show that $\tilde I_2^*$ up-crosses $\tilde I_1^*$.

(B.2) If $\beta_1^* = \beta_2^*$, then (C.12) can be rewritten as
$$U'(w - x + \tilde I_1^*(x)) + 2\beta_1^*\,(x - \tilde I_1^*(x)) = U'(w - x + \tilde I_2^*(x)) + 2\beta_1^*\,(x - \tilde I_2^*(x)).$$
Note that $U'(w - y) + 2\beta_1^* y$ is strictly increasing in $y$. Hence, the above equation yields $x - \tilde I_1^*(x) = x - \tilde I_2^*(x)$ for all $x \in [0, M]$, which contradicts the fact that $\mathrm{var}[\tilde I_1^*(X)]$ … , $x \in (z, M]$ (C.13) for some $z \in (0, M)$. Therefore, we have
$$\mathbb{E}[I_1^*(X)] = -\tilde I_1^*(0) < -\tilde I_2^*(0) = \mathbb{E}[I_2^*(X)].$$
Furthermore, $\tilde I_1^{*\prime}(z) - \tilde I_2^{*\prime}(z) \le 0$. Recalling that
$$\tilde I_i^{*\prime}(x) = \frac{1}{1 + \frac{2\beta_i^*}{-U''(w - x + \tilde I_i^*(x))}},$$
we have $\beta_1^* \ge \beta_2^*$. As $\beta_1^* = \beta_2^*$ does not hold, as shown in the proof of Theorem 4.3, we obtain $\beta_2^* < \beta_1^*$.

(ii) On the one hand, in view of (C.13), it follows from $\mathbb{E}[I_1^*(X)] < \mathbb{E}[I_2^*(X)]$ that $I_2^*(x) > I_1^*(x)$ $\forall x > z$. On the other hand, for any $x \in [0, z)$, noting that $\tilde I_2^*(x) < \tilde I_1^*(x)$, we deduce from $U''' > 0$ that
$$-U''(w - x + \tilde I_1^*(x)) \le -U''(w - x + \tilde I_2^*(x)).$$
Because $\beta_2^* < \beta_1^*$ and
$$\tilde I_i^{*\prime}(x) = \frac{1}{1 + \frac{2\beta_i^*}{-U''(w - x + \tilde I_i^*(x))}}, \quad i = 1, 2,$$
we have $\tilde I_1^{*\prime}(x) < \tilde I_2^{*\prime}(x)$, which is equivalent to $I_1^{*\prime}(x) < I_2^{*\prime}(x)$ $\forall x \in [0, z)$. Furthermore, as $I_1^*(0) = I_2^*(0) = 0$, it must hold that $I_1^*(x) < I_2^*(x)$ $\forall x > 0$. The proof is complete.

References

Armantier, O., Foncel, J. and Treich, N. (2018) Insurance and portfolio decisions: A wealth effect puzzle. Working paper.
Arrow, K.J. (1963) Uncertainty and the welfare economics of medical care. American Economic Review, 941-973.
Arrow, K.J. (1965) Aspects of the Theory of Risk-bearing. Helsinki: Yrjo Jahnssonin Saatio.
Arrow, K.J. (1971) Essays in the Theory of Risk Bearing.
Chicago.
Bernard, C., He, X.D., Yan, J.A. and Zhou, X.Y. (2015) Optimal insurance design under rank-dependent utility. Mathematical Finance, 154-186.
Borch, K. (1960) An attempt to determine the optimum amount of stop loss reinsurance. In: Transactions of the 16th International Congress of Actuaries, Vol. I, 597-610. Brussels, Belgium: Georges Thone.
Carlier, G. and Dana, R.-A. (2003) Pareto efficient insurance contracts when the insurer's cost function is discontinuous. Economic Theory, 871-893.
Carlier, G. and Dana, R.-A. (2005) Rearrangement inequalities in non-convex insurance models. Journal of Mathematical Economics, 483-503.
Chi, Y. (2012) Reinsurance arrangements minimizing the risk-adjusted value of an insurer's liability. ASTIN Bulletin: The Journal of the IAA, 529-557.
Chi, Y. (2019) On the optimality of a straight deductible under belief heterogeneity. ASTIN Bulletin: The Journal of the IAA, 242-263.
Chi, Y. and Wei, W. (2020) Optimal insurance with background risk: An analysis of general dependence structures. Finance and Stochastics, forthcoming.
Cummins, J.D. and Mahul, O. (2004) The demand for insurance with an upper limit on coverage. The Journal of Risk and Insurance, 253-264.
Dionne, G. and St-Michel, P. (1991) Workers' compensation and moral hazard. The Review of Economics and Statistics, 236-244.
Doherty, N.A., Laux, C. and Muermann, A. (2015) Insuring nonverifiable losses. Review of Finance, 283-316.
Doherty, N.A. and Schlesinger, H. (1990) Rational insurance purchasing: Consideration of contract non-performance. The Quarterly Journal of Economics, 243-253.
Eeckhoudt, L. and Kimball, M. (1992) Background risk, prudence, and the demand for insurance. Contributions to Insurance Economics, edited by G. Dionne, 239-254. New York: Springer.
Eeckhoudt, L., Mahul, O. and Moran, J. (2003) Fixed-reimbursement insurance: Basic properties and comparative statics. Journal of Risk and Insurance, 207-218.
Ehrlich, I. and Becker, G.S. (1972) Market insurance, self-insurance, and self-protection. Journal of Political Economy, 623-648.
Gollier, C. (1996) Optimum insurance of approximate losses. The Journal of Risk and Insurance, 369-380.
Gollier, C. (2001) The Economics of Risk and Time. London: The MIT Press.
Gollier, C. (2013) The economics of optimal insurance design. Handbook of Insurance, edited by G. Dionne. New York: Springer.
Gollier, C. and Schlesinger, H. (1996) Arrow's theorem on the optimality of deductibles: a stochastic dominance approach. Economic Theory, 359-363.
Gotoh, J.Y. and Konno, H. (2000) Third degree stochastic dominance and mean-risk analysis. Management Science, 289-301.
Huang, R.J. and Tzeng, L.Y. (2006) The design of an optimal insurance contract for irreplaceable commodities. The Geneva Risk and Insurance Review, 11-21.
Hofmann, D.M. (2015) Insurance - a global view. Second edition, Zurich Insurance Company Ltd.
Hölmstrom, B. (1979) Moral hazard and observability. The Bell Journal of Economics, 74-91.
Huberman, G., Mayers, D. and Smith, C.W. (1983) Optimal insurance policy indemnity schedules. The Bell Journal of Economics, 415-426.
Kaluszka, M. (2001) Optimal reinsurance under mean-variance premium principles. Insurance: Mathematics and Economics, 61-67.
Karlin, S. and Novikoff, A. (1963) Generalized convex inequalities. Pacific Journal of Mathematics, 1251-1279.
Kaye, P. (2005) Risk measurement in insurance: A guide to risk measurement, capital allocation and related decision support issues. Discussion paper program, Casualty Actuarial Society.
Kimball, M.S. (1990) Precautionary saving in the small and in the large. Econometrica, 53-73.
Komiya, H. (1988) Elementary proof for Sion's minimax theorem. Kodai Mathematical Journal, 5-7.
Markowitz, H. (1952) Portfolio selection. Journal of Finance, 77-91.
Menezes, C., Geiss, C. and Tressler, J. (1980) Increasing downside risk. The American Economic Review, 921-932.
Millo, G. (2016) The income elasticity of nonlife insurance: A reassessment. Journal of Risk and Insurance, 335-362.
Mossin, J. (1968) Aspects of rational insurance purchasing. Journal of Political Economy, 553-568.
Noussair, C.N., Trautmann, S.T. and Van de Kuilen, G. (2014) Higher order risk attitudes, demographics, and financial decisions. The Review of Economic Studies, 325-355.
Ohlin, J. (1969) On a class of measures of dispersion with application to optimal reinsurance. ASTIN Bulletin: The Journal of the IAA, 249-266.
Picard, P. (2000) On the design of optimal insurance policies under manipulation of audit cost. International Economic Review, 1049-1071.
Pratt, J.W. (1964) Risk aversion in the small and in the large. Econometrica, 122-136.
Raviv, A. (1979) The design of an optimal insurance policy. American Economic Review, 84-96.
Rothschild, M. and Stiglitz, J. (1976) Equilibrium in competitive insurance markets: An essay on the economics of imperfect information. The Quarterly Journal of Economics, 629-649.
Schlesinger, H. (1981) The optimal level of deductibility in insurance contracts. Journal of Risk and Insurance, 465-481.
Teh, T.L. (2017) Insurance design in the presence of safety nets. Journal of Public Economics, 47-58.
Vajda, S. (1962) Minimum variance reinsurance. ASTIN Bulletin: The Journal of the IAA, 257-260.
Viscusi, W.K. (1979) Insurance and individual incentives in adaptive contexts. Econometrica, 1195-1207.
Whitmore, G.A. (1970) Third-degree stochastic dominance. The American Economic Review, 457-459.
Xu, Z.Q., Zhou, X.Y. and Zhuang, S.C. (2019) Optimal insurance with rank-dependent utility and incentive compatibility. Mathematical Finance, 659-692.
Zhou, C. and Wu, C. (2008) Optimal insurance under the insurer's risk constraint. Insurance: Mathematics and Economics, 992-999.
Zhou, C., Wu, W. and Wu, C. (2010) Optimal insurance in the presence of insurer's loss limit. Insurance: Mathematics and Economics, 46(2)