Monotone additive statistics
Xiaosheng Mu*, Luciano Pomatto†, Philipp Strack‡, Omer Tamuz§

February 2, 2021
Abstract
We study statistics: mappings from distributions to real numbers. We characterize all statistics that are monotone with respect to first-order stochastic dominance, and additive for sums of independent random variables. We explore a number of applications, including a representation of stationary, monotone time preferences, generalizing Fishburn and Rubinstein (1982) to time lotteries.
How should a random quantity be summarized by a single number? In Bayesian statistics, point estimators capture an entire posterior distribution. In finance, risk measures quantify the risk in a distribution of returns. And in economics, certainty equivalents characterize an expected utility agent's preference for uncertain outcomes.

We use the term statistic to describe a map that assigns a number to each real-valued random variable, with the basic requirement that this number depends only on the distribution of the random variable. We study statistics that are monotone with respect to first-order stochastic dominance, and additive for sums of independent random variables. An example of a monotone additive statistic is the expectation. The median is monotone but not additive, while the variance is additive but not monotone.

Monotonicity is a well studied property of statistics (see, e.g., Bickel and Lehmann, 1975a,b), and holds, for example, for certainty equivalents of monotone preferences. Additivity is a stronger assumption. We focus on this property because of its conceptual simplicity and because it serves as a baseline assumption in many settings. In particular, we show below that additivity corresponds to a form of stationarity in the context of preferences over time lotteries.

[* Princeton University. Email: [email protected]. † Caltech. Email: [email protected]. ‡ Yale University. Email: [email protected]. Philipp Strack was supported by a Sloan fellowship. § Caltech. Email: [email protected]. Omer Tamuz was supported by a grant from the Simons Foundation.]

[Footnote: The term "descriptive statistic" usually refers to maps associating a number to observations or to empirical distributions. Because of its simplicity, we apply it here to general distributions. See e.g. Bickel and Lehmann (1975a).]

Beyond the expectation, an additional example of a monotone additive statistic is the map $K_a$ that, given $a \in \mathbb{R}$, assigns to each random variable $X$ the value

$$K_a(X) = \frac{1}{a} \log \mathbb{E}\left[e^{aX}\right]. \qquad (1)$$

In the language of statistics, the map is the (normalized) cumulant generating function evaluated at $a$. In economics, it corresponds to the certainty equivalent defined by a CARA preference over gambles. For bounded random variables, the essential minimum and maximum provide further examples of such statistics; as we explain later, they are the limits of $K_a(X)$ as $a$ approaches $\pm\infty$.

Our main result is that these examples, and their weighted averages, are the only monotone additive statistics. That is, we show that every monotone additive statistic $\Phi$ is of the form

$$\Phi(X) = \int K_a(X)\, \mathrm{d}\mu(a)$$

for some probability measure $\mu$. This result provides a simple representation of a natural family of statistics, which one may a priori have expected to be much richer.

Our first application is to time lotteries. The starting point for our analysis is the work by Fishburn and Rubinstein (1982), who study preferences over dated rewards: a monetary amount, together with the time at which it will be received. They show that exponential discounting of time arises from a set of axioms, of which the most substantial axiom is stationarity: preferences remain invariant when the dated rewards under consideration are shifted by the same amount of time.

We extend the setting of Fishburn and Rubinstein (1982) to that of time lotteries: a monetary amount, together with a random time at which it will be received. In this setting, we also introduce a stationarity axiom that requires preferences to be invariant with respect to random shifts in time.
As we argue in the main text, this stationarity axiom captures a basic requirement of dynamic consistency.

We show that stationarity, together with a monotonicity and a continuity axiom, implies that the preference admits the representation

$$u(x) \cdot e^{-r \int K_a(T)\, \mathrm{d}\mu(a)},$$

for each time lottery that delivers a monetary reward $x$ at a random time $T$. Over deterministic dated rewards, the representation coincides with the one of Fishburn and Rubinstein (1982). General time lotteries are reduced to deterministic ones by a monotone additive statistic that maps the random time $T$ to the value $\int K_a(T)\, \mathrm{d}\mu(a)$. For each parameter $a$, the term $K_a(T)$ is the certainty equivalent of $T$ under an expected discounted utility preference with discount factor $a$. The different certainty equivalents are thus averaged according to the measure $\mu$.

Our representation of these monotone and stationary time preferences has implications for the understanding of risk attitudes toward time. Risk preferences over time lotteries have been studied both theoretically and experimentally (Chesson and Viscusi, 2003; Onay and Öncüler, 2007; Ebert, 2020; DeJarnette et al., 2020). A basic paradox these papers highlight is that most subjects display risk aversion over the time dimension, even though the standard theory of expected utility with exponential discounting predicts that people are risk-seeking with respect to time lotteries. Our analysis shows that expected exponentially discounted utility is only one of many ways to extend exponential discounting (from dated rewards to time lotteries) while maintaining stationarity. In fact, we characterize a class of stationary preferences over time lotteries that exhibit risk aversion over time.

Our second application is to the domain of monetary gambles. In this domain, it is well known that expected utility agents whose preferences are invariant to background risk must have CARA preferences.
Our main characterization theorem implies that beyond expected utility, such agents have certainty equivalents that are weighted averages of CARA certainty equivalents. We similarly extend a result of Rabin and Weizsäcker (2009) from the expected utility domain to general monotone preferences. They show that among expected utility maximizers, only CARA agents do not violate stochastic dominance for combined risks. We show that a monotone preference has this property only if it is represented by a monotone additive statistic, i.e., it is represented by an average of CARA certainty equivalents.

Bickel and Lehmann (1975a,b) study location statistics using a similar axiomatic, non-parametric approach, and also consider the monotonicity property that we impose, but not additivity. In contrast, the mathematics literature has studied additive statistics, as homomorphisms from the convolution semigroup to the reals (see Ruzsa and Székely, 1988; Mattner, 1999, 2004), without imposing monotonicity.

In the finance and actuarial sciences literature, the CARA certainty equivalent $-K_{-a}(X)$ shows up and is often called the entropic risk measure of $X$ with parameter $a$ (see Föllmer and Schied, 2011). Goovaerts, Kaas, Laeven, and Tang (2004) prove a result that is similar to our Theorem 1, under the stronger assumption that $K_a(X) \ge K_a(Y)$ for all $a \in \bar{\mathbb{R}}$ implies $\Phi(X) \ge \Phi(Y)$. Our monotonicity property only demands this to hold when $X \ge_1 Y$.

In an earlier paper, Pomatto, Strack, and Tamuz (2020) show that on the larger domain of random variables that have all moments, the only monotone additive statistic is the expectation.

[Footnote: Note that $X \ge_1 Y$ implies but is not implied by $K_a(X) \ge K_a(Y)$ for all $a$.]

[Footnote: For $a \neq 0$, the monotone additive statistic $K_a(X)$ that we identify takes infinite value for some unbounded random variables that have all moments.]
Fritz, Mu, and Tamuz (2020) show that the expectation remains the unique monotone additive statistic on the even larger domain of $L^p$ random variables, for any $p \ge 1$. They additionally show that there are no monotone additive statistics on $L^p$ with $p < 1$, or on the domain of all random variables, where the expectation may not exist.

A strengthening of our additivity condition is the requirement of additivity for all pairs of random variables, rather than just the independent ones. This stronger assumption turns out to be restrictive: the only statistic that satisfies additivity for all random variables is the expectation (see de Finetti, 1970).

As is well known, directly averaging exponential discounting utilities leads to present bias (see Jackson and Yariv, 2020). This phenomenon gives rise to impossibility results regarding the aggregation of stationary individual preferences into a stationary social preference. Our contribution to this literature is to observe that beyond the expected utility framework, aggregating the certainty equivalents of exponentially discounted preferences can maintain stationarity.

Monotone additive statistics also relate to what we called additive divergences in a previous paper (Mu, Pomatto, Strack, and Tamuz, 2021). The domain of an additive divergence consists of Blackwell experiments. It satisfies monotonicity with respect to the Blackwell order and additivity for product experiments. Our characterization of additive divergences in that paper is reminiscent of the one we provide here for monotone additive statistics, with Rényi divergences playing the role of the certainty equivalents $K_a$ in the current work.

The remainder of the paper is organized as follows. In §2 we introduce monotone additive statistics, state our main result and provide an outline of its proof. In §3 we apply this result to time lotteries, and in §4 we apply it to monetary gambles. The appendix contains omitted proofs, as well as a study of monotone sub-additive statistics, i.e., statistics which satisfy $\Phi(X + Y) \le \Phi(X) + \Phi(Y)$ for independent $X$ and $Y$.

Monotone Additive Statistics

Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a nonatomic probability space. We denote by $L^\infty$ the collection of bounded real random variables on this space.
By a standard abuse of notation we will identify the constant $c \in \mathbb{R}$ with the constant random variable $X(\omega) = c$. Given $X \in L^\infty$, $\max[X]$ and $\min[X]$ denote its essential maximum and minimum.

We say that a map $\Phi : L^\infty \to \mathbb{R}$ is a statistic if (i) whenever $X, Y \in L^\infty$ have the same distribution, $\Phi(X) = \Phi(Y)$, and (ii) $\Phi(c) = c$ for every $c \in \mathbb{R}$; that is, $\Phi$ assigns $c$ to the constant random variable $c$. Condition (ii) may appear restrictive, but it amounts to a simple normalization when combined with the monotonicity and additivity assumptions we make below. We are interested in statistics that satisfy two properties: monotonicity with respect to first-order stochastic dominance, and additivity for sums of independent random variables.

A statistic $\Phi : L^\infty \to \mathbb{R}$ is additive if $\Phi(X + Y) = \Phi(X) + \Phi(Y)$ whenever $X$ and $Y$ are independent random variables. It is monotone if $X \ge Y$ almost surely implies $\Phi(X) \ge \Phi(Y)$. It is monotone with respect to first-order stochastic dominance if $X \ge_1 Y$ implies $\Phi(X) \ge \Phi(Y)$, where $X \ge_1 Y$ denotes first-order stochastic dominance.

Since we assume the statistic depends only on the distribution, monotonicity with respect to first-order stochastic dominance is equivalent to monotonicity. This equivalence is based on the well-known characterization that $X \ge_1 Y$ if and only if there are random variables $\tilde{X}, \tilde{Y}$ such that $X$ and $\tilde{X}$ are identically distributed, $Y$ and $\tilde{Y}$ are identically distributed, and $\tilde{X} \ge \tilde{Y}$ almost surely. Henceforth when we say $\Phi$ is monotone, we mean that it is monotone with respect to first-order stochastic dominance.

Let $\bar{\mathbb{R}} = \mathbb{R} \cup \{-\infty, \infty\}$ denote the two-point compactification of $\mathbb{R}$. Given $X \in L^\infty$ and $a \in \bar{\mathbb{R}} \setminus \{0, \pm\infty\}$, let

$$K_a(X) = \frac{1}{a} \log \mathbb{E}\left[e^{aX}\right]. \qquad (2)$$

This is the (normalized) cumulant generating function of $X$, evaluated at $a$.
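Though stated abstractly, these properties are easy to check numerically for finite-support distributions. The following sketch (our own illustration; the helper functions and example distributions are not from the paper) verifies the additivity of $K_a$ for independent sums, and that $K_a$ approaches the essential maximum and minimum for large positive and negative $a$:

```python
import math

def K(a, dist):
    """Normalized cumulant generating function K_a for a finite-support
    distribution given as {value: probability}; K_0 is the expectation."""
    if a == 0:
        return sum(x * p for x, p in dist.items())
    return math.log(sum(p * math.exp(a * x) for x, p in dist.items())) / a

def convolve(d1, d2):
    """Distribution of X + Y for independent X ~ d1, Y ~ d2."""
    out = {}
    for x, p in d1.items():
        for y, q in d2.items():
            out[x + y] = out.get(x + y, 0.0) + p * q
    return out

X = {0: 0.5, 1: 0.5}        # a fair coin
Y = {-1: 0.25, 2: 0.75}     # an independent gamble

# Additivity: K_a(X + Y) = K_a(X) + K_a(Y) for independent X, Y.
for a in (-3.0, -1.0, 0.0, 0.5, 2.0):
    assert abs(K(a, convolve(X, Y)) - (K(a, X) + K(a, Y))) < 1e-9

# K_a is increasing in a, and approaches max[X] and min[X] at the extremes.
assert K(-1.0, X) < K(0.0, X) < K(1.0, X)
assert abs(K(50.0, X) - 1.0) < 0.05    # close to max[X] = 1
assert abs(K(-50.0, X) - 0.0) < 0.05   # close to min[X] = 0
```

Additivity here is exact (up to floating-point error) because for independent variables the moment generating function of the sum factors.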
We additionally define $K_0(X)$, $K_\infty(X)$, and $K_{-\infty}(X)$ to be the expectation, essential maximum and essential minimum of $X$, respectively; this makes $a \mapsto K_a(X)$ a continuous function from $\bar{\mathbb{R}}$ to $\mathbb{R}$. It is easy to check that each $K_a$ is a monotone additive statistic. Our main result is that these statistics, together with their weighted averages, constitute all of the monotone additive statistics.

Theorem 1.
$\Phi : L^\infty \to \mathbb{R}$ is a monotone additive statistic if and only if there exists a Borel probability measure $\mu$ on $\bar{\mathbb{R}}$, such that for every $X \in L^\infty$

$$\Phi(X) = \int_{\bar{\mathbb{R}}} K_a(X)\, \mathrm{d}\mu(a). \qquad (3)$$

Moreover, the measure $\mu$ is unique.

Theorem 1 holds for other domains of random variables. Denote by $L^\infty_+$ the bounded non-negative random variables, by $L^\infty_{\mathbb{N}}$ the bounded non-negative integer-valued random variables, and by $L^M$ the random variables $X$ for which $K_a(X)$ is finite for all $a \in \mathbb{R}$. The collection $L^M$ contains, in addition to all the bounded random variables, those unbounded ones whose distribution has "sub-exponential" tails, such as the normal distribution.

[Footnote: Under monotonicity and additivity, any $\Phi$ that satisfies (i) and is not identically zero must have $\Phi(1) \neq 0$, and furthermore $\Phi(X)/\Phi(1)$ is a monotone additive statistic that satisfies (ii).]

[Footnote: An alternative, equivalent definition is to let the domain of $\Phi$ be the set of distributions of the random variables in $L^\infty$. In this domain, additivity would be defined with respect to convolution. We choose to have the domain consist of random variables, as this approach offers some notational advantages.]

Theorem 2.
Let $L$ be either $L^\infty_+$, $L^\infty_{\mathbb{N}}$ or $L^M$. Then $\Phi : L \to \mathbb{R}$ is a monotone additive statistic if and only if it admits a (unique) representation of the form (3). In the case of $L^M$, the measure $\mu$ has to be compactly supported on $\mathbb{R}$.

To prove Theorem 2 for the cases of $L = L^\infty_+$ and $L = L^\infty_{\mathbb{N}}$, we show that any monotone additive statistic defined on these smaller domains can be extended to one on $L^\infty$, and then invoke Theorem 1. The case of the larger domain $L^M$ turns out to be more difficult, and the proof requires some additional ideas that are explained in the appendix.

Our approach to the proof of Theorem 1 is via the catalytic stochastic order. Given
$X, Y \in L^\infty$, we say that $X$ dominates $Y$ in the catalytic stochastic order on $L^\infty$ if there exists a $Z \in L^\infty$, independent of $X$ and $Y$, such that $X + Z \ge_1 Y + Z$ (i.e., $X + Z$ stochastically dominates $Y + Z$).

The applicability of this order to our problem is immediate: if $\Phi$ is monotone and additive, then whenever $X$ dominates $Y$ in the catalytic stochastic order it holds that $\Phi(X) \ge \Phi(Y)$. To see this, note that domination in the catalytic order implies that

$$\Phi(X + Z) \ge \Phi(Y + Z)$$

for some $Z \in L^\infty$, since $\Phi$ is monotone. Additivity of $\Phi$ implies that $\Phi(X + Z) = \Phi(X) + \Phi(Z)$ and $\Phi(Y + Z) = \Phi(Y) + \Phi(Z)$, and so we have that $\Phi(X) \ge \Phi(Y)$.

Clearly, if $X \ge_1 Y$ then $X$ also dominates $Y$ in the catalytic order, as one can take $Z = 0$ (or in fact any $Z$). A priori, one may conjecture that this is also a necessary condition. As we show, this is far from true.

Figure 1 gives a simple example of $X, Y \in L^\infty$ that are not ranked with respect to first-order stochastic dominance, but are ranked with respect to the catalytic order: $X$ is Bernoulli and $Y$ is uniformly distributed on an interval. As the figure shows, their c.d.f.s are not ranked, and hence they are not ranked in terms of first-order stochastic dominance. However, if we let $Z$ assign probability one half to each of two suitably chosen points, then $X + Z \ge_1 Y + Z$. Intuitively, since the c.d.f. of $X + Z$ is the average of the two translations of the c.d.f. of $X$, and since the same holds for the c.d.f. of $Y$, the result of adding $Z$ is the disappearance of the small "kink" at which the ranking of the c.d.f.s is reversed. This is depicted in Figure 2.

Figure 1: The c.d.f.s of $X$ (blue) and $Y$ (orange).

Figure 2: The c.d.f.s of $X + Z$ (blue) and $Y + Z$ (orange).

[Footnote: We are indebted to the late Kim Border for helping us construct this example.]

[Footnote: Pomatto et al. (2020) give examples of random variables $X$ and $Y$ that are not ranked in stochastic dominance, but are ranked after adding an unbounded independent $Z$. In fact, they show that this is possible whenever $\mathbb{E}[X] > \mathbb{E}[Y]$. As we explain below, this result no longer holds when $Z$ is required to be bounded.]

Every monotone additive statistic provides an obstruction to dominance in the catalytic order. That is, if $\Phi(X) < \Phi(Y)$ for some monotone additive statistic $\Phi$, then it is impossible that $X + Z \ge_1 Y + Z$ for some independent $Z$, since monotonicity would imply that $\Phi(X + Z) \ge \Phi(Y + Z)$, and additivity would then imply that $\Phi(X) \ge \Phi(Y)$. This observation applies in particular to the monotone additive statistics $K_a$, so that $K_a(X) \ge K_a(Y)$ for all $a \in \bar{\mathbb{R}}$ is necessary for there to exist some $Z$ that makes $X + Z$ stochastically dominate $Y + Z$. The $K_a$ are, in a sense, the only obstructions. This constitutes the most important component of the proof of Theorem 1.

[Footnote: In fact, except for the trivial case where $X$ and $Y$ have the same distribution, it is necessary to have the strict inequality $K_a(X) > K_a(Y)$ for all $a \in \bar{\mathbb{R}} \setminus \{\pm\infty\}$. This is because $X + Z \ge_1 Y + Z$ implies the strict inequality $K_a(X + Z) > K_a(Y + Z)$ whenever $X + Z$ and $Y + Z$ have different distributions. Thus, a corollary of Theorem 3 below is that for distributions with different minima and maxima, the condition $K_a(X) > K_a(Y)$ for all $a \in \bar{\mathbb{R}}$ is necessary and sufficient for dominance in the catalytic order on $L^\infty$.]
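These obstructions can be checked numerically. The sketch below (our own toy example, not the one depicted in Figure 1) exhibits two distributions whose c.d.f.s cross, so that neither first-order dominates the other, while $K_a(X) > K_a(Y)$ holds at the endpoints $a = \pm\infty$ (via minima and maxima) and on a grid of finite values of $a$; Theorem 3 below then guarantees that some bounded independent $Z$ makes $X + Z$ stochastically dominate $Y + Z$:

```python
import math

# X takes values 0.5 and 2.5 with probability 1/2 each;
# Y is uniform on {0, 1, 2}.  (A toy example of ours.)
X = {0.5: 0.5, 2.5: 0.5}
Y = {0: 1/3, 1: 1/3, 2: 1/3}

def cdf(dist, t):
    """C.d.f. of a finite-support distribution at t."""
    return sum(p for x, p in dist.items() if x <= t)

def K(a, dist):
    """Normalized cumulant generating function; K_0 is the expectation."""
    if a == 0:
        return sum(x * p for x, p in dist.items())
    return math.log(sum(p * math.exp(a * x) for x, p in dist.items())) / a

# The c.d.f.s cross, so X and Y are not ranked by first-order dominance:
assert cdf(X, 0.5) > cdf(Y, 0.5)   # F_X(0.5) = 1/2 > 1/3 = F_Y(0.5)
assert cdf(X, 2.0) < cdf(Y, 2.0)   # F_X(2)   = 1/2 < 1   = F_Y(2)

# Yet the K_a obstructions all point the same way: min and max comparisons
# (the a -> -inf and a -> +inf limits) and K_a on a grid of finite a.
assert min(X) > min(Y) and max(X) > max(Y)
for a in [-10, -5, -1, -0.1, 0, 0.1, 1, 5, 10]:
    assert K(a, X) > K(a, Y)
```

A grid check of course does not prove the inequality for all $a$, but it illustrates how the family $\{K_a\}$ can rank two distributions that stochastic dominance cannot.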
Theorem 3.
Let $X, Y \in L^\infty$ satisfy $K_a(X) > K_a(Y)$ for all $a \in \bar{\mathbb{R}}$. Then there exists an independent $Z \in L^\infty$ such that $X + Z \ge_1 Y + Z$.

To prove Theorem 3 we explicitly construct $Z$ as a truncated Gaussian with appropriately chosen parameters. The idea behind the proof is the following. Denote by $F$ and $G$ the c.d.f.s of $X$ and $Y$, respectively, and suppose that they are supported on $[-N, N]$. Let

$$h(x) = \frac{1}{\sqrt{2\pi V}}\, e^{-x^2/(2V)}$$

be the density of a Gaussian $Z$ with variance $V$. Then the c.d.f.s of $X + Z$ and $Y + Z$ are given by the convolutions $F * h$ and $G * h$, and their difference is equal to

$$[G * h - F * h](y) = \int_{-N}^{N} [G(x) - F(x)] \cdot h(y - x)\, \mathrm{d}x = \frac{1}{\sqrt{2\pi V}}\, e^{-y^2/(2V)} \int_{-N}^{N} [G(x) - F(x)] \cdot \underbrace{e^{\frac{y}{V} x}}_{(*)} \cdot \underbrace{e^{-x^2/(2V)}}_{(**)}\, \mathrm{d}x.$$

If we denote $a = \frac{y}{V}$, then by integration by parts, the integral of $[G(x) - F(x)]$ against just $(*)$ is equal to

$$\frac{1}{a} \left( \mathbb{E}\left[e^{aX}\right] - \mathbb{E}\left[e^{aY}\right] \right),$$

which is positive by the assumption that $K_a(X) > K_a(Y)$ and is in fact bounded away from zero. The term $(**)$ can be made arbitrarily close to 1, uniformly on the integration domain $[-N, N]$, by making $V$ large. This implies that $[G * h - F * h](y) > 0$ for every $y$, and we further show that the inequality still holds if we modify $Z$ by truncating its tails, ensuring that it is in $L^\infty$.

Theorem 3 allows us to prove the following key lemma:

Lemma 1.
Let $\Phi : L^\infty \to \mathbb{R}$ be a monotone additive statistic. If $K_a(X) \ge K_a(Y)$ for all $a \in \bar{\mathbb{R}}$ then $\Phi(X) \ge \Phi(Y)$.

Proof. Suppose $K_a(X) \ge K_a(Y)$ for all $a \in \bar{\mathbb{R}}$. For any $\varepsilon > 0$, consider $\hat{X} = X + \varepsilon$. Then $K_a(\hat{X}) = K_a(X) + \varepsilon > K_a(Y)$ for all $a$, and by Theorem 3 there is an independent $Z \in L^\infty$ such that $\hat{X} + Z \ge_1 Y + Z$. Hence, by monotonicity of $\Phi$, $\Phi(\hat{X} + Z) \ge \Phi(Y + Z)$, and by additivity $\Phi(\hat{X}) \ge \Phi(Y)$. This means that $\Phi(X) + \varepsilon \ge \Phi(Y)$ for all $\varepsilon > 0$, and hence $\Phi(X) \ge \Phi(Y)$.

Once we have established Lemma 1, the remainder of the proof of Theorem 1 uses functional analysis techniques (in particular the Riesz Representation Theorem) to deduce the integral representation for monotone additive statistics.
0, and henceΦ( X ) ≥ Φ( Y ).Once we have established Lemma 1, the remainder of the proof of Theorem 1 usesfunctional analysis techniques (in particular the Riesz Representation Theorem) to deducethe integral representation for monotone additive statistics. K a ( X ) > K a ( Y ) for all a ∈ ¯ R is necessary and sufficient for dominance in the catalytic order on L ∞ . A similar result to Theorem 3 holds if we demand a weaker conclusion that X + Z second-order stochastically dominates Y + Z . See Proposition 5 in the appendix. Monotone Stationary Time Preferences
We model a time lottery by a pair $(x, T)$, which consists of a non-negative payoff $x \in \mathbb{R}_+$ and a bounded non-negative random time $T \in L^\infty_+$ at which this payoff realizes. Thus time is non-negative and continuous in this section. Our primitive is a weak order $\succsim$ on the domain $\mathbb{R}_+ \times L^\infty_+$. We denote by $\sim$ the indifference relation induced by $\succsim$. To avoid notational confusion, in the rest of this section $x$ and $y$ always denote monetary payoffs, $t$, $s$ and $d$ always denote deterministic times, and capitalized letters $T, S, D, R \in L^\infty_+$ always denote random times.

We impose the following four axioms on $\succsim$:

Axiom 3.1 (More is Better). If $x > y \ge 0$ then $(x, T) \succ (y, T)$ for all $T \in L^\infty_+$.

Axiom 3.2 (Earlier is Better). If $S \ge_1 T$ in first-order stochastic dominance, then $(x, T) \succsim (x, S)$ for all $x \ge 0$. Indifference obtains if $x = 0$, and strict preference obtains if $x > 0$ and $S > T$ are deterministic times.
Axiom 3.3 (Stationarity). If $(x, T) \succsim (y, S)$ then $(x, T + D) \succsim (y, S + D)$ for any $D \in L^\infty_+$ that is independent from $T$ and $S$.

Axiom 3.4 (Continuity). For any $(y, S)$, the sets $\{(x, t) : (x, t) \succsim (y, S)\}$ and $\{(x, t) : (x, t) \precsim (y, S)\}$ are closed in the product topology on $\mathbb{R}_+ \times \mathbb{R}_+$.

The first two axioms and the continuity axiom are standard conditions that directly generalize the axioms in Fishburn and Rubinstein (1982). Axioms 3.1 and 3.2 require the decision maker to prefer higher payoffs, and to prefer (stochastically) earlier times. The continuity assumption is similarly standard. Note that it does not require a choice of topology for $L^\infty_+$, the set of random times.

The most substantive condition is stationarity. In the absence of risk, it was shown by Halevy (2015) that stationarity can be understood as the implication of two more basic principles: that preferences are not affected by calendar time, and that the decision maker is dynamically consistent. We now argue that Axiom 3.3 extends the same logic to time lotteries.

First, suppose that the time $D$, which is a delay added to both $S$ and $T$, is deterministic. Reasoning as in Halevy (2015), we can consider an enlarged framework where the decision maker is endowed with a profile $(\succsim_t)$ of preferences over time lotteries, with $\succsim_t$ representing the preference the decision maker expresses at time $t$.

If preferences are not affected by calendar time, then the ranking $(x, T) \succsim (y, S)$ at time zero must imply the same ranking $(x, T + d) \succsim_d (y, S + d)$ at time $d$. Moreover, dynamic consistency requires that a choice between $(x, T + d)$ and $(y, S + d)$, when evaluated at time zero, must be the same choice the decision maker would in fact make at time $d$. Hence, $(x, T) \succsim (y, S)$ implies $(x, T + d) \succsim (y, S + d)$, as required by Axiom 3.3.

Suppose now that the delay $D$ is random and independent of $S$ and $T$.
By the above reasoning, dynamic consistency and time invariance imply the ranking $(x, T + d) \succsim (y, S + d)$ for each deterministic time $d$. We can then imagine that prior to making a choice between $(x, T + D)$ and $(y, S + D)$, the decision maker is informed of the actual realization $d$ of $D$. Regardless of what the value $d$ is, this information should not change the decision maker's preference of $(x, T + d)$ over $(y, S + d)$, since $D$ is independent of $T$ and $S$. So, dynamic consistency with respect to this piece of information requires the decision maker to prefer $(x, T + D)$ to $(y, S + D)$. While this latter form of dynamic consistency is suggestive of expected utility, we will in fact derive non-expected utility representations that also satisfy this consistency condition.

We say that a preference $\succsim$ on $\mathbb{R}_+ \times L^\infty_+$ is a monotone stationary preference if it satisfies Axioms 3.1, 3.2, 3.3 and 3.4. We say that $\succsim$ is represented by $f : \mathbb{R}_+ \times L^\infty_+ \to \mathbb{R}$ if $(x, T) \succsim (y, S)$ if and only if $f(x, T) \ge f(y, S)$. Our main result in this section is stated as follows:
Theorem 4.
A preference $\succsim$ on $\mathbb{R}_+ \times L^\infty_+$ is a monotone stationary preference if and only if there exists a monotone additive statistic $\Phi$, an $r > 0$, and a continuous and strictly increasing utility function $u : \mathbb{R}_+ \to \mathbb{R}_+$ with $u(0) = 0$, such that $\succsim$ is represented by

$$f(x, T) = u(x) \cdot e^{-r \Phi(T)}. \qquad (4)$$

The coefficient $r$ represents the discount factor the decision maker applies when evaluating riskless dated rewards. As in Fishburn and Rubinstein (1982), it can be chosen arbitrarily by a suitable normalization of $u$.

By Theorem 2, we can conclude that every monotone stationary preference has a representation of the form

$$f(x, T) = u(x) \cdot e^{-r \int K_a(T)\, \mathrm{d}\mu(a)}. \qquad (5)$$

The result can be interpreted as saying that the decision maker evaluates the pair $(x, T)$ in a multiplicatively separable way, by discounting the utility from $x$ by an appropriate factor that depends only on $T$. The discount factor can be expressed in terms of the certainty equivalent of $T$, which we denote by $\Phi(T)$. Furthermore, $\Phi$ is a monotone additive statistic, and so its form is pinned down by Theorem 2. The form (5) implies that the random time $T$ is evaluated as the average of certainty equivalents of different exponential discounters.

Regardless of the particular statistic $\Phi$ that enters the representation, the preference $\succsim$ when restricted to deterministic dated rewards is represented by exponentially discounted utility. However, Theorem 4 demonstrates there are many ways to extend exponentially discounted utility to the larger domain of time lotteries, while maintaining stationarity. We recover expected exponentially discounted utility if $\Phi(T) = -\frac{1}{r} \log \mathbb{E}\left[e^{-rT}\right]$, since in that case

$$f(x, T) = u(x) \cdot \mathbb{E}\left[e^{-rT}\right].$$

As is well known, such preferences are risk-seeking over time. But any monotone additive statistic $\Phi$ gives rise to a stationary time preference via the utility representation in Theorem 4, and such a preference need not be either EU or risk-seeking.
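The contrast between these extensions can be checked numerically. The sketch below (our own illustration; the linear utility $u(x) = x$, the rate $r = 0.1$, and the example lottery are arbitrary assumptions) evaluates a 50-50 time lottery against its deterministic mean under $\Phi = K_{-r}$, which gives expected discounted utility, and under $\Phi = K_r$:

```python
import math

r = 0.1
u = lambda x: x                      # illustrative utility (an assumption)

def K(a, times):
    """K_a of a uniform lottery over the list `times`; K_0 is the mean."""
    if a == 0:
        return sum(times) / len(times)
    return math.log(sum(math.exp(a * t) for t in times) / len(times)) / a

def f(x, times, a):
    """The representation u(x) * exp(-r * Phi(T)) with Phi = K_a."""
    return u(x) * math.exp(-r * K(a, times))

lottery = [0, 10]                    # reward arrives at t=0 or t=10, 50-50
mean = [5]                           # deterministic time E[T] = 5

# Phi = K_{-r}: expected discounted utility, risk-seeking over time.
assert f(100, lottery, -r) > f(100, mean, -r)

# Phi = K_{+r}: a stationary but risk-averse preference over time.
assert f(100, lottery, r) < f(100, mean, r)
```

With $\Phi = K_{-r}$ the lottery is preferred to its mean, by Jensen's inequality applied to the convex map $t \mapsto e^{-rt}$, while $\Phi = K_r$ reverses the ranking; both extensions agree on deterministic dated rewards.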
As an example, if $\Phi(T) = K_r(T) = \frac{1}{r} \log \mathbb{E}\left[e^{rT}\right]$ then we get a new representation

$$f(x, T) = \frac{u(x)}{\mathbb{E}\left[e^{rT}\right]},$$

which is in fact risk-averse over time. In a later subsection we characterize the precise conditions on the measure $\mu$ such that the resulting time preference is risk-seeking, or risk-averse.

The idea behind the proof of Theorem 4 is as follows. For fixed $x$, the continuity axiom ensures that there is a certainty equivalent function $\Phi_x$ such that $(x, T) \sim (x, \Phi_x(T))$ for all $T \in L^\infty_+$. The monotonicity of $\Phi_x$ is a simple consequence of the first axiom. To see that $\Phi_x$ is additive, we apply stationarity twice. First, stationarity implies that $(x, \Phi_x(T) + \Phi_x(S)) \sim (x, T + \Phi_x(S))$, with the constant $\Phi_x(S)$ playing the role of $D$. Likewise, stationarity also implies that $(x, \Phi_x(S) + T) \sim (x, S + T)$, where now $T$ plays the role of $D$. Put together, these imply that $(x, \Phi_x(T) + \Phi_x(S)) \sim (x, \Phi_x(T + S))$, and so $\Phi_x(T + S) = \Phi_x(T) + \Phi_x(S)$. A third application of the stationarity axiom yields that $\Phi_x = \Phi_y$ for every $x, y > 0$, which allows us to write $\Phi$ instead of $\Phi_x$. Finally, the representation $u(x) e^{-r\Phi(T)}$ follows by applying the original result of Fishburn and Rubinstein (1982).

This proof, and the representation in Theorem 4, can be extended to a discrete-time setting. However, one difficulty that arises is that a discrete time lottery need not have a certainty equivalent that is an integer time. Because of this, we need additional work to reduce each time lottery to a deterministic dated reward in order to apply the result of Fishburn and Rubinstein (1982). See Appendix D.2 for details.

As we mentioned in previous discussion, our representation $f(x, T) = u(x) \cdot e^{-r\Phi(T)}$ is in general non-expected utility. In this subsection we study the extent to which this representation violates the behavioral assumptions of EU. We begin by investigating the betweenness axiom of Dekel (1986), which relaxes expected utility theory by requiring that indifference curves are straight lines (but need not be parallel).

We use the notation $X_\lambda Y$ to denote a random variable that is equal to $X$ with probability $\lambda \in [0, 1]$ and equal to $Y$ with probability $1 - \lambda$. Equivalently, if the distribution of $X$ is $\mu$ and the distribution of $Y$ is $\nu$, then the distribution of $X_\lambda Y$ is $\lambda\mu + (1 - \lambda)\nu$.

Axiom 3.5 (Betweenness). $(x, T) \sim (x, S)$ implies $(x, T_\lambda S) \sim (x, S)$ for all $\lambda \in (0, 1)$.

The next result characterizes monotone stationary preferences that have this property.
Proposition 1.
A monotone stationary preference with representation $f(x, T) = u(x) e^{-r\Phi(T)}$ satisfies Axiom 3.5 if and only if

1. $\Phi(T) = K_a(T)$ for some $a \in \bar{\mathbb{R}}$, or
2. $\Phi(T) = \beta \min[T] + (1 - \beta) \max[T]$ for some $\beta \in (0, 1)$, or
3. $\Phi(T) = \frac{-a_2}{a_1 - a_2} K_{a_1}(T) + \frac{a_1}{a_1 - a_2} K_{a_2}(T)$ for some $a_1 \in (-\infty, 0)$ and $a_2 \in (0, \infty)$.

In fact, our proof shows that Proposition 1 holds under a weaker form of betweenness: $(x, T) \sim (x, t)$ implies $(x, T_\lambda t) \sim (x, t)$. That is, it suffices to require betweenness when mixing with constants.

Next, we study the classic independence axiom underlying expected utility theory.

Axiom 3.6 (Independence). $(x, T) \sim (x, S)$ implies $(x, T_\lambda R) \sim (x, S_\lambda R)$.

The space of time lotteries is not a mixture space, so we only impose independence for random times associated with the same monetary reward. We do not impose continuity beyond Axiom 3.4.

The following result characterizes monotone stationary preferences that additionally satisfy independence:
Proposition 2.
A monotone stationary preference with representation $f(x, T) = u(x) e^{-r\Phi(T)}$ satisfies Axiom 3.6 if and only if $\Phi(T) = K_a(T)$ for some $a \in \bar{\mathbb{R}}$.

This proposition implies that such a preference has one of the following representations:

$$u(x) \cdot e^{-r \min[T]}, \quad u(x) \cdot \mathbb{E}\left[e^{-rT}\right], \quad u(x) \cdot e^{-r \mathbb{E}[T]}, \quad \frac{u(x)}{\mathbb{E}\left[e^{rT}\right]}, \quad u(x) \cdot e^{-r \max[T]}.$$

The first and last representations correspond to the most extreme forms of risk-seeking and risk-averse time preferences, respectively. Because these extreme preferences do not satisfy "mixture continuity", it is not possible to deduce Proposition 2 directly from the von Neumann-Morgenstern theorem. Our proof instead invokes Proposition 1, and uses the independence axiom to further pin down the form of $\Phi$.

[Footnote: Of course, there are many such random variables, but for our purposes this will not be important.]

Risk attitudes toward time

We have shown that monotone stationary preferences over time lotteries admit a representation of the form

$$u(x) \cdot e^{-r \int K_a(T)\, \mathrm{d}\mu(a)}$$

for some probability measure $\mu$ on $\bar{\mathbb{R}}$. In this section we study which measures $\mu$ give rise to risk-averse or risk-seeking behavior toward time. For example, we have seen that when $\mu$ is a point mass on $a$, the preference will be risk-averse or risk-seeking depending on whether $a$ is positive or negative.

Formally, we say that a preference $\succsim$ over time lotteries exhibits risk aversion if $(x, \mathbb{E}[T]) \succsim (x, T)$ for every $x \in \mathbb{R}_+$ and $T \in L^\infty_+$. If the reverse preference always holds, then $\succsim$ is risk-seeking. The following result generalizes our previous observations regarding point mass measures $\mu$:

Proposition 3.
A monotone stationary preference with representation $f(x, T) = u(x) e^{-r\Phi(T)}$ is risk-averse (respectively risk-loving) over time if and only if

$$\Phi(T) = \int_{\bar{\mathbb{R}}} K_a(T)\, \mathrm{d}\mu(a)$$

for a Borel probability measure $\mu$ supported on $[0, \infty]$ (respectively $[-\infty, 0]$).

Thus, risk aversion over time occurs if and only if the decision maker aggregates the certainty equivalents of exponentially discounting EU agents with discount factors greater than or equal to 1. Likewise, risk seeking occurs if and only if the relevant discount factors in the aggregation are all less than or equal to 1.

More generally, we can compare the risk attitudes of two different monotone stationary preferences. Consider two preferences represented by $u(x) e^{-r\Phi_\mu(T)}$ and $u(x) e^{-r\Phi_\nu(T)}$, where $\Phi_\mu$ and $\Phi_\nu$ are two different monotone additive statistics with corresponding measures $\mu$ and $\nu$. We say that the preference represented by $\Phi_\mu$ is more risk-averse than the preference represented by $\Phi_\nu$ if $\Phi_\mu(T) \ge \Phi_\nu(T)$ for every $T \in L^\infty_+$. In words, we require the former preference to assign a worse certainty equivalent (i.e., a later time) to every random time $T$.

[Footnote: As a corollary of the analysis in Proposition 3, we know that a statistic $\Phi$ is additive and monotone with respect to second-order (or any higher-order) stochastic dominance if and only if $\Phi(X) = \int_{\bar{\mathbb{R}}} K_a(X)\, \mathrm{d}\mu(a)$ for a probability measure $\mu$ supported on $[-\infty, 0]$. Such a statistic satisfies $\Phi(X) \le \mathbb{E}[X]$ for all $X$ (which is risk-seeking in time) only if the measure $\mu$ associated with $\Phi$ is supported on $[-\infty, 0]$; since $\Phi(X) \le \Phi(\mathbb{E}[X]) = \mathbb{E}[X]$ is a necessary condition for monotonicity with respect to second-order stochastic dominance, $\mu$ being supported on $[-\infty, 0]$ is also necessary. Conversely, note that for any $a \le 0$, $-e^{aX}$ is increasing and has alternating derivatives of all orders. So whenever $X$ dominates $Y$ in second-order (or any higher-order) stochastic dominance, it holds that $\mathbb{E}\left[-e^{aX}\right] \ge \mathbb{E}\left[-e^{aY}\right]$. From this we obtain $K_a(X) = \frac{1}{a} \log \mathbb{E}\left[e^{aX}\right] \ge K_a(Y)$ for any $a \le 0$, and thus $\Phi(X) = \int K_a(X)\, \mathrm{d}\mu(a)$ is larger than $\Phi(Y)$ whenever $\mu$ is supported on $[-\infty, 0]$.]

Given $\mu$ and $\nu$, when is the first preference more risk-averse than the second? That is, when is it the case that $\Phi_\mu(T) \ge \Phi_\nu(T)$ for all $T$? Since $K_a(T)$ increases in $a$, first-order stochastic dominance $\mu \ge_1 \nu$ is clearly sufficient, but, as we show, it is not necessary. We provide an exact characterization in the following result.
Proposition 4.
For any two probability measures µ and ν on R̄, the inequality

∫_R̄ K_a(T) dµ(a) ≥ ∫_R̄ K_a(T) dν(a)

holds for every T ∈ L∞_+ if and only if the following two conditions hold:

(i) For every b > 0, ∫_{[b,∞]} ((a − b)/a) dµ(a) ≥ ∫_{[b,∞]} ((a − b)/a) dν(a).

(ii) For every b < 0, ∫_{[−∞,b]} ((a − b)/a) dµ(a) ≤ ∫_{[−∞,b]} ((a − b)/a) dν(a).

This result can be seen as a generalization of the previous Proposition 3, since a preference exhibits risk aversion (respectively, risk seeking) if and only if it is more (respectively, less) risk-averse than the risk-neutral preference represented by u(x)e^{−r E[T]}. In the appendix, we explain how to deduce Proposition 3 from Proposition 4.

An example is µ = (1/4)δ_1 + (3/4)δ_3, whereas ν = δ_2. Condition (ii) in Proposition 4 is trivially satisfied, whereas condition (i) reduces to (1 − b)_+ + (3 − b)_+ ≥ 2(2 − b)_+, which holds because the function (a − b)_+ = max{a − b, 0} is convex in a.

In this section we consider bounded monetary gambles, whose collection we denote by L∞, and study preferences over these gambles. As above, we assume that agents' preferences over gambles depend only on their distributions.

Consider an expected utility agent who evaluates a gamble X according to a certainty equivalent Ψ(X) = u^{−1}(E[u(X)]) for some increasing utility function u. As is well known, the assumption that the agent's preferences are not affected by independent background risk implies that the agent has CARA preferences. Formally, if

Ψ(X) ≤ Ψ(Y) implies Ψ(X + Z) ≤ Ψ(Y + Z) for all independent Z,   (6)

then Ψ(X) = K_a(X) for some a ∈ R.

A natural question is: how does this result extend beyond expected utility theory? That is, which preferences on L∞ are monotone with respect to first-order stochastic dominance and have a certainty equivalent Φ that satisfies (6)? The answer is that such a certainty equivalent Φ must be a monotone additive statistic. To see this, note that
Φ(X) = Φ(Φ(X)), since the certainty equivalent of the constant Φ(X) is itself. Thus by (6), we have

Φ(X + Y) = Φ(Φ(X) + Y),

with Y playing the role of Z there. Likewise, since Φ(Y) = Φ(Φ(Y)), (6) gives

Φ(Y + Φ(X)) = Φ(Φ(Y) + Φ(X)),

where now the constant Φ(X) takes the role of Z. Combining the above two equalities yields

Φ(X + Y) = Φ(Φ(Y) + Φ(X)) = Φ(X) + Φ(Y),

so Φ is additive. Given this, Theorem 1 implies that any monotone preference that is represented by a certainty equivalent and is invariant to background risk must have a representation of the form

Φ(X) = ∫_R̄ K_a(X) dµ(a)

for some measure µ on R̄. That is, the certainty equivalent Φ is a weighted average of the certainty equivalents of CARA agents.

Rabin and Weizsäcker (2009) show that for any non-CARA expected utility decision maker, one can construct two pairs of bounded gambles X_1, Y_1 and X_2, Y_2, such that X_1 is chosen over Y_1 and X_2 is chosen over Y_2, but the independent sum X_1 + X_2 is first-order stochastically dominated by Y_1 + Y_2. This result suggests that for "most" EU agents, choosing between risky prospects in isolation can lead to stochastically dominated combined choices. In this section we study the extent to which their insight generalizes to non-EU preferences.

Accordingly, our primitive here is a weak order ≽ on L∞, the space of bounded gambles. As is standard, we write ≻ for the strict part of ≽, and ∼ for the induced indifference relation. We consider the following axioms on ≽:

Axiom 4.1 (Rabin and Weizsäcker). Suppose X_1, X_2 are independent and Y_1, Y_2 are independent. If X_1 ≻ Y_1 and X_2 ≻ Y_2, then X_1 + X_2 is not first-order stochastically dominated by Y_1 + Y_2.

Axiom 4.2 (Responsiveness). X + ε ≻ X for any X and any constant ε > 0.

In Rabin and Weizsäcker (2009) the constructed gambles have binary support. Since we seek to analyze all non-EU preferences, we allow for general bounded gambles.

Axiom 4.3 (Archimedeanity). If c + ε ≻ X ≻ c − ε for some constant c and all ε > 0, then X ∼ c.

Theorem 5.
A preference ≽ on L∞ satisfies Axioms 4.1, 4.2 and 4.3 if and only if it can be represented by a monotone additive statistic Φ (i.e., X ≽ Y if and only if Φ(X) ≥ Φ(Y)).

We make a technical remark that for this result to hold, the responsiveness axiom cannot be dropped in general. An example is the preference where X ≽ Y if and only if max{E[X], 0} ≥ max{E[Y], 0}. This preference satisfies the Rabin and Weizsäcker axiom: X_1 ≻ Y_1 and X_2 ≻ Y_2 imply E[X_1] > E[Y_1] and E[X_2] > E[Y_2], so E[X_1 + X_2] > E[Y_1 + Y_2] and X_1 + X_2 cannot be stochastically dominated by Y_1 + Y_2. Archimedeanity is also satisfied, but responsiveness fails.

Archimedeanity (which plays the role of continuity) cannot be dropped either, since it helps rule out lexicographic preferences. An example is the preference where X ≽ Y if and only if max[X] > max[Y], or max[X] = max[Y] and min[X] ≥ min[Y]. This preference satisfies the Rabin and Weizsäcker axiom as well as responsiveness, but Archimedeanity fails.

References
C. D. Aliprantis and K. Border. Infinite Dimensional Analysis: A Hitchhiker's Guide. Springer, 2006.
P. Bickel and E. Lehmann. Descriptive statistics for nonparametric models I. Introduction. Annals of Statistics, 3(5):1038–1044, 1975a.
P. Bickel and E. Lehmann. Descriptive statistics for nonparametric models II. Location. Annals of Statistics, 3(5):1045–1069, 1975b.
S. Cerreia-Vioglio, D. Dillenberger, and P. Ortoleva. Cautious expected utility and the certainty effect. Econometrica, 83(2):693–728, 2015.
H. W. Chesson and W. K. Viscusi. Commonalities in time and ambiguity aversion for long-term risks. Theory and Decision, 54(1):57–71, 2003.
J. H. Curtiss. A note on the theory of moment generating functions. The Annals of Mathematical Statistics, 13(4):430–433, 1942.
B. de Finetti. Theory of Probability. Wiley, 1970.
P. DeJarnette, D. Dillenberger, D. Gottlieb, and P. Ortoleva. Time lotteries and stochastic impatience. Econometrica, 88(2):619–656, 2020.
E. Dekel. An axiomatic characterization of preferences under uncertainty: Weakening the independence axiom. Journal of Economic Theory, 40:304–318, 1986.
S. Ebert. Decision making when things are only a matter of time. Operations Research, 2020.
P. C. Fishburn and A. Rubinstein. Time preference. International Economic Review, 23:677–694, 1982.
H. Föllmer and A. Schied. Stochastic Finance: An Introduction in Discrete Time. Walter de Gruyter, 2011.
T. Fritz, X. Mu, and O. Tamuz. Monotone homomorphisms on convolution semigroups, 2020. Working Paper.
M. J. Goovaerts, R. Kaas, R. J. Laeven, and Q. Tang. A comonotonic image of independence for additive risk measures. Insurance: Mathematics and Economics, 35(3):581–594, 2004.
Y. Halevy. Time consistency: Stationarity and time invariance. Econometrica, 83(1):335–352, 2015.
M. O. Jackson and L. Yariv. The non-existence of representative agents, 2020. Working Paper.
L. Kantorovich. On the moment problem for a finite interval. In Dokl. Akad. Nauk SSSR, volume 14, pages 531–537, 1937.
L. Mattner. What are cumulants? Documenta Mathematica, 4:601–622, 1999.
L. Mattner. Cumulants are universal homomorphisms into Hausdorff groups. Probability Theory and Related Fields, 130(2):151–166, 2004.
X. Mu, L. Pomatto, P. Strack, and O. Tamuz. From Blackwell dominance in large samples to Rényi divergences and back again. Econometrica, 89(1):475–506, 2021.
S. Onay and A. Öncüler. Intertemporal choice under timing risk: An experimental approach. Journal of Risk and Uncertainty, 34(2):99–121, 2007.
L. Pomatto, P. Strack, and O. Tamuz. Stochastic dominance under independent noise. Journal of Political Economy, 128(5):1877–1900, 2020.
M. Rabin and G. Weizsäcker. Narrow bracketing and dominated choices. American Economic Review, 99(4):1508–43, 2009.
I. Ruzsa and G. J. Székely. Algebraic Probability Theory. John Wiley & Sons Inc, 1988.

Appendix
In the proofs we often use the notation K_X(a) = K_a(X), so that K_X is a map R̄ → R. The following lemma is standard.

Lemma 2.
Let
X, Y ∈ L∞. Then:

1. K_X : R̄ → R is well defined, non-decreasing, and continuous.

2. If K_X = K_Y, then X and Y have the same distribution.

Proof. See Curtiss (1942).
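The monotonicity in Lemma 2, and the additivity K_a(X + Y) = K_a(X) + K_a(Y) for independent X and Y that is used throughout the appendix, can be checked numerically. The sketch below is our own; the two-point distributions are arbitrary examples:

```python
import math
from itertools import product

def K(a, dist):
    """K_a(X) = (1/a) log E[e^{aX}] for a != 0, and K_0(X) = E[X].
    dist maps outcomes of X to their probabilities."""
    if a == 0:
        return sum(x * p for x, p in dist.items())
    return math.log(sum(p * math.exp(a * x) for x, p in dist.items())) / a

def convolve(d1, d2):
    """Distribution of X + Y for independent X ~ d1 and Y ~ d2."""
    out = {}
    for (x, p), (y, q) in product(d1.items(), d2.items()):
        out[x + y] = out.get(x + y, 0.0) + p * q
    return out

X = {0.0: 0.5, 1.0: 0.5}
Y = {-1.0: 0.3, 2.0: 0.7}

# K_X(a) is non-decreasing in a (Lemma 2, part 1)
grid = [-10 + 0.1 * i for i in range(201)]
vals = [K(a, X) for a in grid]
assert all(v1 <= v2 + 1e-12 for v1, v2 in zip(vals, vals[1:]))

# additivity over independent sums: K_a(X + Y) = K_a(X) + K_a(Y)
for a in (-3.0, -1.0, 0.0, 0.5, 2.0):
    assert abs(K(a, convolve(X, Y)) - (K(a, X) + K(a, Y))) < 1e-9
```

As a increases from −∞ to ∞, K_a(X) interpolates from min[X] to max[X], passing through E[X] at a = 0.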
A Proof of Theorem 3
First, we can add the same constant b to both X and Y so that min[Y + b] = −N and max[X + b] = N for some N > 0. Since translating both X and Y leaves the existence of an appropriate Z unchanged (and also does not affect K_X > K_Y), we henceforth assume without loss of generality that min[Y] = −N and max[X] = N. Since K_X > K_Y, we know that min[X] > −N and max[Y] < N.

Denote the c.d.f.s of X and Y by F and G, respectively. Let σ(x) = G(x) − F(x). Note that σ is supported on [−N, N] and bounded in absolute value by 1. Moreover, by choosing ε > 0 small enough, we can ensure min[X] > −N + ε and max[Y] < N − ε. So σ(x) is positive on [−N, −N + ε] and on [N − ε, N]. In fact, there exists δ > 0 such that σ(x) ≥ δ whenever x ∈ [−N + ε/4, −N + 3ε/4] or x ∈ [N − 3ε/4, N − ε/4]. We also fix a large constant A such that e^{εA/4} ≥ 8N/(εδ).
Define

M_σ(a) = ∫_{−N}^{N} σ(x)e^{ax} dx.

Note that for a ≠ 0,

M_σ(a) = (1/a)(E[e^{aX}] − E[e^{aY}]),

which follows from integration by parts, and that M_σ(0) = E[X] − E[Y]. Therefore, since K_X > K_Y, we have that M_σ is strictly positive everywhere. Since M_σ(a) is clearly continuous in a, it is in fact bounded away from zero on any compact interval.

We will use these properties of σ to construct a truncated Gaussian density h such that

[σ ∗ h](y) = ∫_{−N}^{N} σ(x)h(y − x) dx ≥ 0 for all y ∈ R.

If we let Z be a random variable independent from X and Y, whose distribution has density function h, then σ ∗ h = (G − F) ∗ h is the difference between the c.d.f.s of Y + Z and X + Z. Thus [σ ∗ h](y) ≥ 0 for all y would imply X + Z ≥₁ Y + Z (first-order stochastic dominance).

To do this, we write h(x) = e^{−x²/2V} for all |x| ≤ T, where V is the variance and T is the truncation point, both to be chosen. We will show that [σ ∗ h](y) ≥ 0 for all y whenever V is sufficiently large and T ≥ AV + N, for the constants N and A defined above.

First consider the case where y ∈ [−AV, AV]. In this region, |y − x| ≤ T is automatically satisfied when x ∈ [−N, N]. So we can compute the convolution σ ∗ h as follows:

∫ σ(x)h(y − x) dx = e^{−y²/2V} · ∫_{−N}^{N} σ(x) · e^{(y/V)x} · e^{−x²/2V} dx.   (7)

Note that y/V in the exponent belongs to the compact interval [−A, A]. So, for our fixed choice of A, the integral M_σ(y/V) = ∫_{−N}^{N} σ(x) · e^{(y/V)x} dx is uniformly bounded away from zero as y varies over the current region. Thus,

∫_{−N}^{N} σ(x) · e^{(y/V)x} · e^{−x²/2V} dx = M_σ(y/V) − ∫_{−N}^{N} σ(x) · e^{(y/V)x} · (1 − e^{−x²/2V}) dx ≥ M_σ(y/V) − 2N · e^{AN} · (1 − e^{−N²/2V}),   (8)

which is positive when V is sufficiently large. So the right-hand side of (7) is positive.

Next consider the case where y ∈ (AV, T + N − ε]; the case where −y is in this range can be treated symmetrically. Here the convolution can be written as

[σ ∗ h](y) = ∫_{max{−N, y−T}}^{N} σ(x) · e^{−(y−x)²/2V} dx.
We break the range of integration into two sub-intervals: I₁ = [max{−N, y − T}, N − ε] and I₂ = [N − ε, N]. On I₁ we have σ(x) = G(x) − F(x) ≥ −1, so

∫_{I₁} σ(x) · e^{−(y−x)²/2V} dx ≥ −2N · e^{−(y−N+ε)²/2V}.

On I₂ we have σ(x) ≥ 0, and furthermore σ(x) ≥ δ when x ∈ [N − 3ε/4, N − ε/4]. Thus

∫_{I₂} σ(x) · e^{−(y−x)²/2V} dx ≥ (ε/2) · δ · e^{−(y−N+3ε/4)²/2V} ≥ 4N · e^{−(y−N+3ε/4)²/2V − εA/4},

by our choice of A. Observe that when y > AV and V is large, the exponent −(y−N+3ε/4)²/2V − εA/4 is larger than −(y−N+ε)²/2V − log 2. Summing the above two inequalities then yields the desired result that [σ ∗ h](y) ≥ 0.

If y ∈ (T + N − ε, T + N], then the range of integration in computing [σ ∗ h](y) is from x = y − T to x = N, where σ(x) is non-negative. So the convolution is non-negative. And if y > T + N, then clearly the convolution is zero. These arguments apply symmetrically to −y ∈ (T + N − ε, T + N] and −y > T + N. We therefore conclude that [σ ∗ h](y) ≥ 0 for all y, completing the proof.

In general we need a normalizing factor to ensure h integrates to one, but this multiplicative constant does not affect the argument.

A.1 The catalytic order for second-order stochastic dominance
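Before extending the argument to second-order stochastic dominance, the Theorem 3 construction can be illustrated numerically. The example below is entirely our own: the binary distributions, their probabilities, and the use of an untruncated Gaussian Z (the proof truncates only to keep Z bounded). Here X does not dominate Y, yet K_a(X) > K_a(Y) on a grid of a, and adding an independent Gaussian with moderate variance restores dominance of the c.d.f.s:

```python
import math

def H(t, sigma):   # CDF of a N(0, sigma^2) random variable at t
    return 0.5 * math.erfc(-t / (sigma * math.sqrt(2.0)))

def S(t, sigma):   # survival function, computed without cancellation
    return 0.5 * math.erfc(t / (sigma * math.sqrt(2.0)))

def K(a, dist):    # K_a for a finite distribution given as (outcome, prob) pairs
    if a == 0:
        return sum(x * p for x, p in dist)
    return math.log(sum(p * math.exp(a * x) for x, p in dist)) / a

# Our example: X in {-0.5, 1.5} w.p. (0.55, 0.45); Y in {-1, 1} w.p. (0.5, 0.5)
X = [(-0.5, 0.55), (1.5, 0.45)]
Y = [(-1.0, 0.5), (1.0, 0.5)]

# K_a(X) > K_a(Y) on a grid of a ...
assert all(K(a, X) > K(a, Y) for a in [i / 10 for i in range(-300, 301)])
# ... but the c.d.f.s cross at 0, so X does not dominate Y without the catalyst
FX0 = sum(p for x, p in X if x <= 0)
FY0 = sum(p for x, p in Y if x <= 0)
assert FX0 > FY0

sigma = 3.0   # a moderately large variance suffices in this example
for i in range(-600, 601):
    t = i / 10
    if t <= 0:   # left tail: compare c.d.f.s directly (both small here)
        assert sum(p * H(t - x, sigma) for x, p in X) <= \
               sum(p * H(t - x, sigma) for x, p in Y) + 1e-12
    else:        # right tail: compare survival functions to avoid cancellation
        assert sum(p * S(t - x, sigma) for x, p in X) >= \
               sum(p * S(t - x, sigma) for x, p in Y) - 1e-12
```

The loop verifies F_{X+Z}(t) ≤ F_{Y+Z}(t) on a grid, i.e., X + Z first-order dominates Y + Z after smoothing.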
In this section we point out that the above proof of Theorem 3 also yields an analogous characterization of the catalytic stochastic order for second-order stochastic dominance. Formally, we have:
Proposition 5.
Let
X, Y ∈ L∞ satisfy K_a(X) > K_a(Y) for all a ∈ [−∞, 0]. Then there exists an independent Z ∈ L∞ such that X + Z ≥₂ Y + Z (second-order stochastic dominance).

Proof. As is well known, X dominates Y in second-order stochastic dominance if and only if their c.d.f.s satisfy

∫_{−∞}^{z} (G(y) − F(y)) dy ≥ 0 for all z ∈ R.

Thus, if we let Z be an independent random variable with density h, then X + Z ≥₂ Y + Z if and only if

∫_{−∞}^{z} [σ ∗ h](y) dy ≥ 0 for all z ∈ R.

Here, as in the proof of Theorem 3, σ denotes the difference G − F and is supported on [−N, N]. Since K_{−∞}(X) > K_{−∞}(Y), we have min[X] > min[Y]. So we can choose ε, δ > 0 such that σ(x) ≥ 0 for x ∈ [−N, −N + ε] and σ(x) ≥ δ for x ∈ [−N + ε/4, −N + 3ε/4]. We again fix a constant A such that e^{εA/4} ≥ 8N/(εδ).

Now let h(x) = e^{−x²/2V} for |x| ≤ T, where V is a large variance and T = AV + N. Then, as in the proof of Theorem 3, we have

[σ ∗ h](y) ≥ 0 for all y ≤ −AV.

This simply uses the fact that σ is positive near the minimum of its support. Moreover, by assumption K_a(X) > K_a(Y) for a ≤ 0. So by continuity there exists a small γ > 0 such that K_a(X) > K_a(Y) for all a ≤ γ. It follows that

M_σ(a) = ∫_{−N}^{N} σ(x)e^{ax} dx = (1/a)(E[e^{aX}] − E[e^{aY}]) > 0 for all nonzero a ≤ γ,

and likewise M_σ(0) = E[X] − E[Y] > 0.
By continuity, we can find η > 0 such that

M_σ(a) ≥ η for all a ∈ [−A, γ].

Thus, when y ∈ [−AV, γV], we can follow the calculation in (7) and (8) to obtain

[σ ∗ h](y) = e^{−y²/2V} · ∫_{−N}^{N} σ(x) · e^{(y/V)x} · e^{−x²/2V} dx ≥ e^{−y²/2V} · (η − 2N · e^{AN} · (1 − e^{−N²/2V})) ≥ e^{−y²/2V} · η/2,

where the last step holds when V is sufficiently large.

Therefore, [σ ∗ h](y) ≥ 0 for all y ≤ γV, and clearly ∫_{−∞}^{z} [σ ∗ h](y) dy ≥ 0 for all z ≤ γV. Below we consider z > γV. The idea here is that ∫_{−∞}^{γV} [σ ∗ h](y) dy is sufficiently positive to compensate for the possible negative contribution from ∫_{γV}^{z} [σ ∗ h](y) dy. Specifically, using the above lower bound for [σ ∗ h](y), we have

∫_{−∞}^{γV} [σ ∗ h](y) dy ≥ ∫_{−γV/2}^{γV/2} [σ ∗ h](y) dy ≥ (ηγV/2) · e^{−γ²V/8}.

On the other hand, when y > γV we can bound the magnitude of [σ ∗ h](y) as follows:

|[σ ∗ h](y)| ≤ ∫_{−N}^{N} |σ(x)h(y − x)| dx ≤ ∫_{−N}^{N} e^{−(y−x)²/2V} dx ≤ 2N · e^{−(γV−N)²/2V}.

Since σ is supported on [−N, N] and h is supported on [−AV − N, AV + N], we know that σ ∗ h is supported on [−AV − 2N, AV + 2N]. Thus for z > γV,

∫_{γV}^{z} [σ ∗ h](y) dy ≥ −∫_{γV}^{AV+2N} |[σ ∗ h](y)| dy ≥ −(AV + 2N − γV) · 2N · e^{−(γV−N)²/2V}.

It is easy to see that for sufficiently large V,

(ηγV/2) · e^{−γ²V/8} > (AV + 2N − γV) · 2N · e^{−(γV−N)²/2V}.

Hence the above estimates imply that ∫_{−∞}^{z} [σ ∗ h](y) dy ≥ 0 for all z > γV. So X + Z ≥₂ Y + Z, as we desired to show.

B Proof of Theorem 1
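As a preview of the uniqueness argument in Section B.2, the limit used in (9), namely (1/n)K_a(X_{n,b}) → (a − b)/a for a ≥ b and → 0 for a < b, can be checked numerically. The sketch below is our own; the log-sum-exp stabilization (an implementation detail, not from the paper) avoids overflow for large n:

```python
import math

def scaled_K(a, n, b):
    """(1/n) K_a(X_{n,b}), where P[X_{n,b} = n] = e^{-bn}, P[X_{n,b} = 0] = 1 - e^{-bn}."""
    if a == 0:
        return math.exp(-b * n)              # (1/n) E[X_{n,b}] = e^{-bn}
    t = (a - b) * n                          # log of the term e^{(a-b)n}
    u = math.log1p(-math.exp(-b * n))        # log of the term 1 - e^{-bn}
    m = max(t, u)                            # stabilized evaluation of log(e^t + e^u)
    return (m + math.log(math.exp(t - m) + math.exp(u - m))) / (a * n)

b = 1.0
n = 400
for a in (-2.0, -0.5, 0.0, 0.5, 0.99, 1.5, 3.0):
    limit = (a - b) / a if a >= b else 0.0
    assert abs(scaled_K(a, n, b) - limit) < 0.02
```

The weight function a ↦ (a − b)/a on [b, ∞] is exactly the integrand appearing in conditions (i)–(ii) of Proposition 4.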
B.1 Integral representation
Recall that for fixed X, K_X(a) = K_a(X) can be seen as a function of a. Let L denote the set of functions {K_X : X ∈ L∞}. If Φ is a monotone additive statistic and K_X = K_Y, then X and Y have the same distribution and Φ(X) = Φ(Y). Thus there exists some functional F : L → R such that Φ(X) = F(K_X). It follows from the additivity of Φ and the additivity of K_a that F is additive: F(K_X + K_Y) = F(K_X) + F(K_Y). Moreover, F is monotone in the sense that F(K_X) ≥ F(K_Y) whenever K_X ≥ K_Y (i.e., K_X(a) ≥ K_Y(a) for all a ∈ R̄); this follows from Lemma 1, which in turn is proved using Theorem 3 (see the main text).

Next we show that the monotone additive functional F on L can be extended to a positive linear functional on the entire space of continuous functions C(R̄). We first equip L with the sup-norm of C(R̄) and establish a technical claim.

Lemma 3. F : L → R is 1-Lipschitz: |F(K_X) − F(K_Y)| ≤ ‖K_X − K_Y‖_∞.

Proof.
Let ‖K_X − K_Y‖_∞ = ε. Then K_{X+ε} = K_X + ε ≥ K_Y. Hence, by Lemma 1, F(K_Y) ≤ F(K_{X+ε}), and so

F(K_Y) − F(K_X) ≤ F(K_{X+ε}) − F(K_X) = F(K_ε) = Φ(ε) = ε.

Symmetrically, we have F(K_X) − F(K_Y) ≤ ε, as desired.
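The 1-Lipschitz property can also be observed numerically once Φ is realized as a mixture of K_a's; the mixture weights and random two-point gambles below are our own examples. |Φ(X) − Φ(Y)| never exceeds the sup-distance between K_X and K_Y:

```python
import math, random

def K(a, dist):   # dist is a list of (outcome, probability) pairs
    if a == 0:
        return sum(x * p for x, p in dist)
    return math.log(sum(p * math.exp(a * x) for x, p in dist)) / a

# Phi = a fixed finite mixture of K_a's (any probability weights work)
WEIGHTS = [(-1.0, 0.25), (0.0, 0.25), (2.0, 0.5)]

def Phi(dist):
    return sum(w * K(a, dist) for a, w in WEIGHTS)

# a grid proxy for the sup over a in R-bar; it contains the mixture's support,
# so the Lipschitz bound is guaranteed to hold exactly
grid = [i / 10 for i in range(-50, 51)]

random.seed(1)
for _ in range(200):
    X = [(random.uniform(-2, 2), 0.5), (random.uniform(-2, 2), 0.5)]
    Y = [(random.uniform(-2, 2), 0.5), (random.uniform(-2, 2), 0.5)]
    sup_norm = max(abs(K(a, X) - K(a, Y)) for a in grid)
    assert abs(Phi(X) - Phi(Y)) <= sup_norm + 1e-9
```

The bound holds because |Φ(X) − Φ(Y)| is a weighted average of |K_a(X) − K_a(Y)| over the mixing measure.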
Any monotone additive functional F on L can be extended to a positive linear functional on C(R̄).

Proof. First consider the rational cone spanned by L:

Cone_Q(L) = {qL : q ∈ Q_+, L ∈ L}.

Define G : Cone_Q(L) → R as G(qL) = qF(L), which is an extension of F. The functional G is well defined: if (m/n)K₁ = (r/n)K₂ for K₁, K₂ ∈ L and n, m, r ∈ N, then, using the fact that L is closed under addition, we obtain mF(K₁) = F(mK₁) = F(rK₂) = rF(K₂), hence (m/n)F(K₁) = (r/n)F(K₂). G is also additive, because

G((m/n)K₁) + G((r/n)K₂) = (m/n)F(K₁) + (r/n)F(K₂) = (1/n)F(mK₁ + rK₂) = G((m/n)K₁ + (r/n)K₂).

In the same way we can show G is positively homogeneous over Q_+ and monotone. Moreover, G is Lipschitz: Lemma 3 implies

|G((m/n)K₁) − G((r/n)K₂)| = (1/n)|F(mK₁) − F(rK₂)| ≤ (1/n)‖mK₁ − rK₂‖_∞ = ‖(m/n)K₁ − (r/n)K₂‖_∞.

Hence G can be extended to a Lipschitz functional H defined on the closure of Cone_Q(L) with respect to the sup norm. In particular, H is defined on the convex cone spanned by L:

Cone(L) = {λ₁K₁ + ··· + λ_kK_k : k ∈ N and, for each 1 ≤ i ≤ k, λ_i ∈ R_+, K_i ∈ L}.

It is immediate to verify that the properties of additivity, positive homogeneity (now over R_+), and monotonicity extend, by continuity, from G to H.

Consider the vector subspace V = Cone(L) − Cone(L) ⊂ C(R̄) and define I : V → R as

I(g₁ − g₂) = H(g₁) − H(g₂)

for all g₁, g₂ ∈ Cone(L). The functional I is well defined and linear (because H is additive and positively homogeneous). Moreover, by monotonicity of H, I(f) ≥ 0 for every non-negative f ∈ V. The result then follows from the next theorem of Kantorovich (1937), a generalization of the Hahn-Banach Theorem. It applies not only to C(R̄) but to any Riesz space (see Theorem 8.32 in Aliprantis and Border, 2006).

Theorem.
If V is a vector subspace of C(R̄) with the property that for every f ∈ C(R̄) there exists a function g ∈ V such that g ≥ f, then every positive linear functional on V extends to a positive linear functional on C(R̄).

The "majorization" condition g ≥ f is satisfied because every function in C(R̄) is bounded and V contains all of the constant functions. The integral representation in Theorem 1 now follows from Lemma 4 by the Riesz-Markov-Kakutani Representation Theorem.
We complete the proof of Theorem 1 by showing that the measure µ in the representation is unique. The following result shows that uniqueness holds even on the smaller domain L∞_N of non-negative integer-valued random variables.

Lemma 5.
Suppose µ and ν are two Borel probability measures on R̄ such that

∫_R̄ K_a(X) dµ(a) = ∫_R̄ K_a(X) dν(a)

for all X ∈ L∞_N. Then µ = ν.

Proof. We first show µ({∞}) = ν({∞}). For any ε > 0, consider the Bernoulli random variable X_ε that takes value 1 with probability ε. It is easy to see that as ε decreases to zero, K_a(X_ε) also decreases to zero for each a < ∞, whereas K_∞(X_ε) = max[X_ε] = 1. Since K_a(X_ε) is uniformly bounded in [0, 1], the Dominated Convergence Theorem yields

lim_{ε→0} ∫_R̄ K_a(X_ε) dµ(a) = µ({∞}).

A similar identity holds for the measure ν, and µ({∞}) = ν({∞}) follows from the assumption that ∫_R̄ K_a(X_ε) dµ(a) = ∫_R̄ K_a(X_ε) dν(a). We can symmetrically apply the above argument to the Bernoulli random variable that takes value 1 with probability 1 − ε. Thus µ({−∞}) = ν({−∞}) holds as well.

Next, for each n ∈ N_+ and real number b > 0, let X_{n,b} ∈ L∞_N satisfy

P[X_{n,b} = n] = e^{−bn},   P[X_{n,b} = 0] = 1 − e^{−bn}.

Then K_a(X_{n,b}) = (1/a) log[(1 − e^{−bn}) + e^{(a−b)n}], and so

lim_{n→∞} (1/n)K_a(X_{n,b}) = lim_{n→∞} (1/an) log[1 − e^{−bn} + e^{(a−b)n}] = 0 if a < b, and (a − b)/a if a ≥ b.

This holds also for a = 0, ±∞.

Note that (1/n)K_a(X_{n,b}) is uniformly bounded in [0, 1] for all values of n, b, a, since K_a(X_{n,b}) is bounded between min[X_{n,b}] = 0 and max[X_{n,b}] = n. Thus, by the Dominated Convergence Theorem,

lim_{n→∞} ∫_R̄ (1/n)K_a(X_{n,b}) dµ(a) = ∫_{[b,∞]} ((a − b)/a) dµ(a),   (9)

and similarly for ν. It follows that for all b > 0,

∫_{[b,∞]} ((a − b)/a) dµ(a) = ∫_{[b,∞]} ((a − b)/a) dν(a).

As µ({∞}) = ν({∞}), we in fact have

∫_{[b,∞)} ((a − b)/a) dµ(a) = ∫_{[b,∞)} ((a − b)/a) dν(a).

This common integral is denoted by f(b). We now define a measure µ̂ on (0, ∞) by the condition dµ̂(a)/dµ(a) = 1/a; note that µ̂ is a positive measure, but need not be a probability measure. Then

f(b) = ∫_{[b,∞)} ((a − b)/a) dµ(a) = ∫_{[b,∞)} (a − b) dµ̂(a) = ∫_b^∞ µ̂([x, ∞)) dx,

where the last step uses Tonelli's Theorem. Hence µ̂([b, ∞)) is the negative of the left derivative of f(b) (this uses the fact that µ̂([b, ∞)) is left continuous in b). In the same way, if we define ν̂ by dν̂(a)/dν(a) = 1/a, then ν̂([b, ∞)) is also the negative of the left derivative of f(b). Therefore µ̂ and ν̂ are the same measure on (0, ∞), which implies that µ and ν coincide on (0, ∞).

By a symmetric argument (with n − X_{n,b} in place of X_{n,b}), we deduce that µ and ν also coincide on (−∞, 0). Since both are probability measures, µ and ν must have the same mass at 0, if any. So µ = ν.

C Proof of Theorem 2
C.1 Proof for L∞_+ and L∞_N

It suffices to show that a monotone additive statistic defined on L∞_+ or L∞_N can be extended to a monotone additive statistic defined on L∞. First suppose Φ is defined on L∞_+, the collection of non-negative random variables. Then for any bounded random variable X, we can define

Ψ(X) = min[X] + Φ(X − min[X]),

where we note that X − min[X] is a non-negative random variable.

Clearly Ψ is a statistic that depends only on the distribution of X (as Φ does), and Ψ(c) = c + Φ(0) = c for constants c. When X is non-negative, the additivity of Φ gives Φ(X) = Φ(min[X]) + Φ(X − min[X]) = min[X] + Φ(X − min[X]), so Ψ is an extension of Φ. Moreover, Ψ is additive because, for independent X and Y, min[X + Y] = min[X] + min[Y], and Φ(X + Y − min[X + Y]) = Φ(X − min[X]) + Φ(Y − min[Y]) by the additivity of Φ. Finally, to show Ψ is monotone, suppose X and Y are bounded random variables such that X first-order stochastically dominates Y. Then we can choose a sufficiently large n such that X + n and Y + n are both non-negative, with X + n dominating Y + n. Since Φ is monotone for non-negative random variables, Φ(X + n) ≥ Φ(Y + n). Thus Ψ(X + n) ≥ Ψ(Y + n) by the fact that Ψ extends Φ, and Ψ(X) ≥ Ψ(Y) by the additivity of Ψ. This proves that Ψ is a monotone additive statistic on L∞ that extends Φ.

In what follows, we consider the other case, where Φ is initially defined on L∞_N, the collection of non-negative integer-valued random variables. Given what has been shown above, we just need to extend Φ to a monotone additive statistic on L∞_+. In this proof and later, we denote by X^{∗n} the random variable that is the sum of n i.i.d. copies of X. We also denote by ⌊X⌋ the random variable that equals X rounded down to the nearest (non-negative) integer. Note that ⌊X⌋ + 1 ≥ X ≥ ⌊X⌋. We thus also have

⌊X + Y⌋ ≥ ⌊X⌋ + ⌊Y⌋,   (10)

⌊X + 1⌋ + ⌊Y + 1⌋ ≥ ⌊X + Y + 1⌋.   (11)

Given a monotone additive statistic Φ : L∞_N → R, define Ψ : L∞_+ → R by

Ψ(X) = lim_{n→∞} (1/n)Φ(⌊X^{∗n} + 1⌋) = lim_{n→∞} (1/n)Φ(⌊X^{∗n}⌋).

The first limit exists because b_n = Φ(⌊X^{∗n} + 1⌋) is a non-negative sequence which is sub-additive by (11) and by the monotonicity and additivity of Φ, and thus lim_{n→∞} b_n/n = inf_n b_n/n is well known to exist. That the two limits above coincide follows from the additivity of Φ.

Ψ is a statistic because Ψ(c) = lim_{n→∞} (1/n)Φ(⌊nc⌋) = lim_{n→∞} (1/n)⌊nc⌋ = c for every constant c ≥ 0. It is also immediate to see that for integer-valued random variables X,

Ψ(X) = lim_{n→∞} (1/n)Φ(⌊X^{∗n}⌋) = lim_{n→∞} (1/n)Φ(X^{∗n}) = lim_{n→∞} (1/n) · nΦ(X) = Φ(X).

So Ψ extends Φ. Moreover, if X first-order stochastically dominates Y, then ⌊X^{∗n}⌋ dominates ⌊Y^{∗n}⌋ for each n. This implies Ψ(X) ≥ Ψ(Y) by the above definition, so Ψ is monotone. Finally, to check that Ψ is additive, suppose X and Y are independent random variables. Then using (11), we have that for each n,

⌊(X + Y)^{∗n} + 1⌋ ≤ ⌊X^{∗n} + 1⌋ + ⌊Y^{∗n} + 1⌋.

Together with the monotonicity and additivity of Φ, this implies

Ψ(X + Y) = lim_{n→∞} (1/n)Φ(⌊(X + Y)^{∗n} + 1⌋) ≤ lim_{n→∞} (1/n)Φ(⌊X^{∗n} + 1⌋) + lim_{n→∞} (1/n)Φ(⌊Y^{∗n} + 1⌋) = Ψ(X) + Ψ(Y).

Symmetrically, we can use the other expression for Ψ(X + Y) and (10) to show that Ψ(X + Y) ≥ Ψ(X) + Ψ(Y). Hence equality holds, and Ψ is a monotone additive statistic that extends Φ. This completes the proof.

C.2 Proof for L_M

We break the proof into several steps below:
C.2.1 Step 1: The catalytic order on L_M

We first establish a generalization of Theorem 3 to unbounded random variables. For two random variables X and Y with c.d.f.s F and G, respectively, we say that X dominates Y in both tails if there exists a positive number N with the property that

G(x) > F(x) for all |x| ≥ N.

That is, we require the stochastic dominance condition between X and Y to hold in the tails. In particular, X needs to be unbounded from above, and Y unbounded from below.

Lemma 6.
Suppose
X, Y ∈ L_M satisfy K_a(X) > K_a(Y) for every a ∈ R. Suppose further that X dominates Y in both tails. Then there exists an independent random variable Z ∈ L_M such that X + Z ≥₁ Y + Z.

Proof. We will show that Z can be taken to have a normal distribution, which does belong to L_M. Following the proof of Theorem 3, we let σ(x) = G(x) − F(x), and seek to show that [σ ∗ h](y) ≥ 0 for all y when h is a Gaussian density with sufficiently large variance.

By assumption, σ(x) is strictly positive for |x| ≥ N. Thus there exists δ > 0 such that ∫_{N+1}^{N+2} σ(x) dx > δ, as well as ∫_{−N−2}^{−N−1} σ(x) dx > δ. We fix A > 0 such that e^A ≥ 4N/δ.

Similar to (7), we have for h(x) = e^{−x²/2V} that

e^{y²/2V} ∫ σ(x)h(y − x) dx = ∫_{−∞}^{∞} σ(x) · e^{(y/V)x} · e^{−x²/2V} dx.   (12)

The variance V is to be determined below.

We first show that the right-hand side is positive if V ≥ (N + 2)² and y/V ≥ A. Indeed, since σ(x) > 0 for |x| ≥ N, the contribution of the regions where |x| ≥ N is non-negative, and the integral is bounded from below by

∫_{−N}^{N} σ(x) · e^{(y/V)x} · e^{−x²/2V} dx + ∫_{N+1}^{N+2} σ(x) · e^{(y/V)x} · e^{−x²/2V} dx
≥ −2N · e^{(y/V)·N} + δ · e^{(y/V)·(N+1)} · e^{−(N+2)²/2V}
= e^{(y/V)·N} · (−2N + δ · e^{y/V} · e^{−(N+2)²/2V}) > 0,

where the last inequality uses e^{y/V} ≥ e^A ≥ 4N/δ and e^{−(N+2)²/2V} ≥ e^{−1/2} > 1/2. By a symmetric argument, we can show that the right-hand side of (12) is also positive when y/V ≤ −A.

It remains to consider the case where y/V ∈ [−A, A]. Here we rewrite the integral on the right-hand side of (12) as

∫_{−∞}^{∞} σ(x) · e^{(y/V)x} · e^{−x²/2V} dx = M_σ(y/V) − ∫_{−∞}^{∞} σ(x) · e^{(y/V)x} · (1 − e^{−x²/2V}) dx,

where M_σ(a) = ∫_{−∞}^{∞} σ(x) · e^{ax} dx = (1/a)E[e^{aX}] − (1/a)E[e^{aY}] is by assumption strictly positive for all a. By continuity, there exists some ε > 0 such that M_σ(a) > ε for all |a| ≤ A. So it only remains to show that when V is sufficiently large,

∫_{−∞}^{∞} σ(x) · e^{ax} · (1 − e^{−x²/2V}) dx < ε for all |a| ≤ A.   (13)

To estimate this integral, note that M_σ(A) = ∫_{−∞}^{∞} σ(x) · e^{Ax} dx is finite. Since σ(x) > 0 for |x| sufficiently large, we deduce from the Monotone Convergence Theorem that ∫_{−∞}^{T} σ(x) · e^{Ax} dx converges to M_σ(A) as T → ∞. In other words, ∫_T^∞ σ(x) · e^{Ax} dx → 0. We can thus choose T > N such that

∫_T^∞ σ(x) · e^{Ax} dx < ε/4 and, similarly, ∫_{−∞}^{−T} σ(x) · e^{−Ax} dx < ε/4.

Since 0 ≤ 1 − e^{−x²/2V} ≤ 1 and e^{ax} ≤ e^{A|x|} when |a| ≤ A, we deduce that

∫_{|x|≥T} σ(x) · e^{ax} · (1 − e^{−x²/2V}) dx < ε/2 for all |a| ≤ A.

Moreover, for this fixed T, we have e^{−T²/2V} → 1 when V is large, and thus

∫_{|x|≤T} σ(x) · e^{ax} · (1 − e^{−x²/2V}) dx ≤ 2Te^{AT}(1 − e^{−T²/2V}) < ε/2 for all |a| ≤ A.

These estimates together imply that (13) holds for sufficiently large V. This completes the proof.

C.2.2 Step 2: A perturbation argument
With Lemma 6, we know that if Φ is a monotone additive statistic defined on L_M, then K_a(X) ≥ K_a(Y) for all a ∈ R implies Φ(X) ≥ Φ(Y), under the additional assumption that X dominates Y in both tails (the proof is the same as for Lemma 1). Below we deduce the same result without this extra assumption. To make the argument simpler, assume X and Y are unbounded both from above and from below; otherwise, we can add to them an independent Gaussian random variable without changing either the assumption or the conclusion. In doing so, we can further assume that X and Y admit probability density functions.

The idea is that even if the right tail of X is not uniformly heavier than that of Y, we can add to X a positive random variable with sufficiently heavy tail, such that the resulting sum has a heavier tail than Y. We first construct a heavy right-tailed random variable as follows:

Lemma 7.
For any Y ∈ L_M that is unbounded from above and admits a density, there exists Z ∈ L_M such that Z ≥ 0 and P[Z > x]/P[Y > x] → ∞ as x → ∞.

Proof. For this result, it is without loss of generality to assume Y ≥ 0, since we can replace Y by |Y| and only strengthen the conclusion. Let g(x) be the probability density function of Y. We consider a random variable Z whose p.d.f. is given by c·x·g(x) for all x ≥ 0, where c > 0 is a normalizing constant chosen so that ∫_{x≥0} c·x·g(x) dx = 1. Since the likelihood ratio between Z = x and Y = x is cx, it is easy to see that the ratio of tail probabilities also diverges.

Thus it only remains to check Z ∈ L_M. This is because

E[e^{aZ}] = c ∫_{x≥0} x·g(x)e^{ax} dx,

which is simply c times the derivative of E[e^{aY}] with respect to a. It is well known that the moment generating function is smooth whenever it is finite. So this derivative is finite, and Z ∈ L_M.

In the same way, we can construct heavy left-tailed distributions:

Lemma 8.
For any X ∈ L_M that is unbounded from below and admits a density, there exists W ∈ L_M such that W ≤ 0 and P[W ≤ x]/P[X ≤ x] → ∞ as x → −∞.

The following result constructs perturbed versions of any two random variables X and Y that satisfy dominance in both tails. For any random variable Z ∈ L_M and every ε > 0, let Z_ε be the random variable that equals Z with probability ε, and equals zero with probability 1 − ε. Note that Z_ε also belongs to L_M.

Lemma 9.
Let X, Y ∈ L_M be two random variables that are unbounded on both sides and admit densities, and let Z ≥ 0 and W ≤ 0 be constructed from the above two lemmata. Then for every ε > 0, X + Z_ε dominates Y + W_ε in both tails.

Proof. For the right tail, we need P[X + Z_ε > x] > P[Y + W_ε > x] for all x ≥ N. Note that W_ε ≤ 0, so P[Y + W_ε > x] ≤ P[Y > x]. On the other hand,

P[X + Z_ε > x] ≥ P[X ≥ 0] · P[Z_ε > x] = P[X ≥ 0] · ε · P[Z > x].

Since by assumption X is unbounded from above, the term P[X ≥ 0] · ε is a strictly positive constant that does not depend on x. Thus for sufficiently large x, we indeed have

P[X ≥ 0] · ε · P[Z > x] > P[Y > x]

by the construction of Z. Thus we do have dominance in the right tail. The left tail works similarly.

C.2.3 Step 3: Monotonicity with respect to K_a

The next result generalizes the key Lemma 1 to our current setting:
Lemma 10.
Let Φ : L_M → R be a monotone additive statistic. If K_a(X) ≥ K_a(Y) for all a ∈ R, then Φ(X) ≥ Φ(Y).

Proof. As discussed, we can without loss assume X, Y are unbounded on both sides and admit densities. Let Z and W be constructed as above. Then for each ε > 0, X + Z_ε dominates Y + W_ε in both tails, and

K_a(X + Z_ε) > K_a(X) ≥ K_a(Y) > K_a(Y + W_ε)

for every a ∈ R, where the strict inequalities use Z ≥ 0 and W ≤ 0. Hence X + Z_ε and Y + W_ε satisfy the assumptions in Lemma 6, so we can find an independent random variable V ∈ L_M (depending on ε), such that

X + Z_ε + V ≥ Y + W_ε + V.

Monotonicity and additivity of Φ then imply Φ(X) + Φ(Z_ε) ≥ Φ(Y) + Φ(W_ε), after cancelling out Φ(V). The desired result follows from the lemma below, which shows that our perturbations only slightly affect the statistic value.

Lemma 11.
For any Z ∈ L_M with Z ≥ 0, it holds that Φ(Z_ε) → 0 as ε → 0. Similarly, Φ(W_ε) → 0 for any W ∈ L_M with W ≤ 0.

Proof. We focus on the case of Z_ε. Suppose for contradiction that Φ(Z_ε) does not converge to zero. Note that as ε decreases, Z_ε decreases in first-order stochastic dominance. So there exists δ > 0 with Φ(Z_ε) ≥ δ for every ε > 0.

Let µ_ε be the image measure of Z_ε. We now choose a sequence ε_n that decreases to zero very fast, and consider the measures ν_n = µ_{ε_n}^{*n}, the n-th convolution power of µ_{ε_n}. Thus the sum of n i.i.d. copies of Z_{ε_n} is a random variable whose image measure is ν_n. We denote this sum by U_n.

For each n we choose ε_n sufficiently small to satisfy two properties: (i) ε_n ≤ 2^{−n}, and (ii) it holds that E[e^{nU_n}] − 1 ≤ 2^{−n}. This latter inequality can be achieved because E[e^{nU_n}] = (E[e^{nZ_{ε_n}}])^n, and as ε_n → 0,

E[e^{nZ_{ε_n}}] = 1 − ε_n + ε_n E[e^{nZ}] → 1

since Z ∈ L_M.

For these choices of ε_n and corresponding U_n, let H_n(x) denote the c.d.f. of U_n, and define H(x) = inf_n H_n(x) for each x ∈ R. Since H_n(x) = 0 for x < 0, the same is true for H(x). Also note that each H_n(x) is a non-decreasing and right-continuous function in x, and so is H(x).

We claim that lim_{x→∞} H(x) = 1. Indeed, recall that U_n is the n-fold sum of Z_{ε_n}, which has mass 1 − ε_n at zero. So U_n has mass at least (1 − ε_n)^n ≥ (1 − 2^{−n})^n ≥ 1 − n2^{−n} at zero. In other words, H_n(0) ≥ 1 − n2^{−n}. By considering the finitely many c.d.f.s H_1(x), H_2(x), . . . , H_{n−1}(x), we can find N such that H_i(x) ≥ 1 − n2^{−n} for every i < n and x ≥ N. Together with H_i(x) ≥ H_i(0) ≥ 1 − i2^{−i} ≥ 1 − n2^{−n} for i ≥ n, we conclude that H_i(x) ≥ 1 − n2^{−n} whenever x ≥ N, and so H(x) ≥ 1 − n2^{−n}. Since n is arbitrary, the claim follows. The fact that H_n(x) ≥ 1 − n2^{−n} also shows that in the definition H(x) = inf_n H_n(x), the “inf” is actually achieved as a minimum (since whenever the inf is less than 1, only finitely many H_n(x) matter).

These properties of H(x) imply that it is the c.d.f. of some non-negative random variable U. We next show U ∈ L_M, i.e., E[e^{aU}] < ∞ for every a ∈ R. Since U ≥ 0, we only need to consider a ≥ 0. To do this, we take advantage of the following identity based on integration by parts:

E[e^{aU_n}] − 1 = −∫_{x≥0} (e^{ax} − 1) d(1 − H_n(x)) = a ∫_{x≥0} e^{ax}(1 − H_n(x)) dx.

Now recall that we chose U_n so that E[e^{nU_n}] − 1 ≤ 2^{−n}. So E[e^{aU_n}] − 1 ≤ 2^{−n} for every positive integer n ≥ a. It follows that the sum Σ_{n=1}^∞ (E[e^{aU_n}] − 1) is finite for every a ≥ 0, i.e.,

a ∫_{x≥0} e^{ax} Σ_{n=1}^∞ (1 − H_n(x)) dx < ∞,

where we have switched the order of summation and integration by the Monotone Convergence Theorem. Since H(x) = min_n H_n(x), it holds that 1 − H(x) ≤ Σ_{n=1}^∞ (1 − H_n(x)) for every x. And thus

E[e^{aU}] − 1 = a ∫_{x≥0} e^{ax}(1 − H(x)) dx < ∞

also holds. This proves U ∈ L_M.

We are finally in a position to deduce a contradiction. Since by construction the c.d.f. of U is no larger than the c.d.f. of each U_n, we have U ≥ U_n and Φ(U) ≥ Φ(U_n) by monotonicity of Φ. But Φ(U_n) = nΦ(Z_{ε_n}) ≥ nδ by additivity, so this leads to Φ(U) being infinite. This contradiction proves the desired result.

C.2.4 Step 4: Functional analysis
To complete the proof of the case of L_M in Theorem 2, we also need to modify the functional analysis step in our earlier proof of Theorem 1. One difficulty is that for an unbounded random variable X, K_a(X) tends to ∞ as a → ∞. Thus we can no longer think of K_X(a) = K_a(X) as a real-valued continuous function on ¯R.

We remedy this as follows. Note first that if Φ is a monotone additive statistic defined on L_M, then it is also monotone and additive when restricted to the smaller domain of bounded random variables. Thus Theorem 1 gives a probability measure µ on R ∪ {±∞} such that

Φ(X) = ∫_{¯R} K_a(X) dµ(a)

for all X ∈ L^∞. In what follows, µ is fixed. We just need to show that this representation also holds for X ∈ L_M.

As a first step, we show µ does not put any mass on ±∞. Indeed, if µ({∞}) = ε > 0, then for every bounded X ≥ 0 the above integral gives Φ(X) ≥ ε · max[X]. Take any Y ∈ L_M such that Y ≥ 0 and Y is unbounded from above. Then monotonicity of Φ gives Φ(Y) ≥ Φ(min{Y, n}) ≥ ε · n for each n. This contradicts Φ(Y) being finite. Similarly we can rule out any mass at −∞.

The next lemma gives a way to extend the representation to certain unbounded random variables.

Lemma 12.
Suppose Z ∈ L_M is bounded from below by 1 and unbounded from above, while Y ∈ L_M is bounded from below and satisfies lim_{a→∞} K_a(Y)/K_a(Z) = 0. Then

Φ(Y) = ∫_{(−∞,∞)} K_a(Y) dµ(a).

Proof. Given the assumptions, K_a(Z) ≥ 1 for all a ∈ R, with lim_{a→∞} K_a(Z) = ∞. Let L^Z_M be the collection of random variables X ∈ L_M such that X is bounded from below, and lim_{a→∞} K_a(X)/K_a(Z) exists and is finite. L^Z_M includes all bounded X (in which case lim_{a→∞} K_a(X)/K_a(Z) = 0), as well as Y and Z itself. L^Z_M is also closed under adding independent random variables.

Now, for each X ∈ L^Z_M, we can define

K_{X|Z}(a) = K_a(X)/K_a(Z),

which reduces to K_X(a) when Z is the constant 1 (in that case L^Z_M is precisely L^∞). This function K_{X|Z}(a) extends by continuity to a = −∞, where its value is min[X]/min[Z], as well as to a = ∞ by construction. Thus K_{X|Z}(·) is a continuous function on ¯R.

Since Φ induces an additive statistic when restricted to L^Z_M, and K_{X|Z} + K_{Y|Z} = K_{X+Y|Z}, we have an additive functional F defined on L = {K_{X|Z} : X ∈ L^Z_M}, given by

F(K_{X|Z}) = Φ(X)/Φ(Z).

F is well-defined because Z ≥ 1 implies Φ(Z) ≥ 1, and F(1) = 1. By Lemma 10, F is also monotone in the sense that K_{X|Z}(a) ≥ K_{Y|Z}(a) for each a ∈ R implies F(K_{X|Z}) ≥ F(K_{Y|Z}).

Likewise we can show F is 1-Lipschitz. Note that K_{X|Z}(a) ≤ K_{Y|Z}(a) + m/n is equivalent to K_a(X) ≤ K_a(Y) + (m/n)K_a(Z), and equivalent to K_a(X^{*n}) ≤ K_a(Y^{*n} + Z^{*m}), where we write X^{*n} for the sum of n i.i.d. copies of X. If this holds for all a, then by Lemma 10 we also have Φ(X^{*n}) ≤ Φ(Y^{*n} + Z^{*m}), and thus Φ(X) ≤ Φ(Y) + (m/n)Φ(Z) by additivity. Since m/n is an arbitrary positive rational number, we conclude that for any real number ε > 0, K_{X|Z}(a) ≤ K_{Y|Z}(a) + ε for all a implies Φ(X) ≤ Φ(Y) + εΦ(Z). Thus the functional F is 1-Lipschitz.

Given these properties, we can exactly follow the proof of Theorem 1 to extend the functional F to a positive linear functional on the space of all continuous functions over ¯R (the majorization condition is again satisfied because the constant function n belongs to L for every positive integer n, since K_{Z|Z} = 1). Therefore, by the Riesz-Markov-Kakutani Representation Theorem, we obtain a probability measure µ_Z on ¯R such that

Φ(X)/Φ(Z) = ∫_{¯R} [K_a(X)/K_a(Z)] dµ_Z(a)

holds for all X ∈ L^Z_M.

In particular, for any X bounded from below such that lim_{a→∞} K_a(X)/K_a(Z) = 0, it holds that

Φ(X) = ∫_{[−∞,∞)} K_a(X) · [Φ(Z)/K_a(Z)] dµ_Z(a),

where we are able to exclude ∞ from the range of integration (this is important for the change of measure argument below).

If we define the measure µ̂_Z by dµ̂_Z/dµ_Z(a) = Φ(Z)/K_a(Z) ≤ Φ(Z), then since K_a(X) is finite for a < ∞, we have

Φ(X) = ∫_{[−∞,∞)} K_a(X) dµ̂_Z(a).

This in particular holds for all bounded X, so plugging in X = 1 gives that µ̂_Z is a probability measure. But now we have two probability measures µ and µ̂_Z on ¯R that lead to the same integral representation for bounded random variables, so Lemma 5 implies that µ̂_Z coincides with µ and is supported on the standard real line. Plugging in X = Y in the above display then yields the desired result.

The preceding lemma is useful because, as it turns out, for any X ∈ L_M bounded from below and unbounded from above, there exists Z ∈ L_M bounded from below by 1 such that lim_{a→∞} K_a(X)/K_a(Z) = 0 (which automatically implies Z is unbounded from above). This is the idea behind the following result:

Lemma 13.
For every X ∈ L_M that is bounded from below,

Φ(X) = ∫_{(−∞,∞)} K_a(X) dµ(a).

Proof.
It suffices to consider X that is unbounded from above. Moreover, without loss we can assume X ≥ 0, since adding a constant to X shifts Φ(X) and every K_a(X) by the same amount. Given the previous lemma, we just need to construct Z ≥ 1 with lim_{a→∞} K_a(X)/K_a(Z) = 0.

Note that E[e^{aX}] strictly increases in a for a ≥ 0. This means we can uniquely define a sequence a_1 < a_2 < · · · by the equation E[e^{a_n X}] = e^n. This sequence diverges as n → ∞. We then choose any increasing sequence b_n such that b_n > n and a_n b_n > 2n².

Consider the random variable Z that is equal to b_n with probability e^{−a_n b_n/2} for each n, and equal to 1 with the remaining probability. To see that Z ∈ L_M, we have

E[e^{aZ}] ≤ e^a + Σ_{n=1}^∞ e^{−a_n b_n/2} · e^{ab_n} = e^a + Σ_{n=1}^∞ e^{(a − a_n/2)·b_n}.

For any fixed a, the term a_n/2 is eventually greater than a + 1. This, together with the fact that b_n > n, implies the above sum converges.

Moreover, for any a ∈ [a_n, a_{n+1}), we have

E[e^{aZ}] ≥ E[e^{a_n Z}] ≥ P[Z = b_n] · e^{a_n b_n} = e^{a_n b_n/2} > e^{n²},

whereas E[e^{aX}] ≤ E[e^{a_{n+1} X}] = e^{n+1}. Thus

K_a(X)/K_a(Z) = log E[e^{aX}] / log E[e^{aZ}] ≤ (n + 1)/n²,

which converges to zero as a (and thus n) approaches infinity.

C.2.5 Step 5: Completing the proof
By a symmetric argument, the representation Φ(X) = ∫_{(−∞,∞)} K_a(X) dµ(a) also holds for all X bounded from above. In the remainder of the proof, we will use an approximation argument to generalize this to all X ∈ L_M. We first show a technical lemma that facilitates the argument:

Lemma 14. The measure µ is supported on a compact interval of the standard real line.

Proof. Suppose not, and without loss assume the support of µ is unbounded from above. We will construct a non-negative Y ∈ L_M such that Φ(Y) = ∞ according to the integral representation. Indeed, by assumption we can find a sequence 2 < a_1 < a_2 < · · · such that a_n → ∞ and µ([a_n, ∞)) ≥ 1/(2n) for all large n. Let Y be the random variable that equals 2n with probability e^{−a_n·n} for each n, and equals 0 with the remaining probability. Then, similar to the above, we can show Y ∈ L_M. Moreover, E[e^{a_n Y}] ≥ e^{a_n·n}, implying that K_{a_n}(Y) ≥ n. Since K_a(Y) is increasing in a, we deduce that for each n,

∫_{[a_n,∞)} K_a(Y) dµ(a) ≥ K_{a_n}(Y) · µ([a_n, ∞)) ≥ n · 1/(2n) = 1/2.

The fact that this holds for a_n → ∞ contradicts the result that Φ(Y) = ∫_{(−∞,∞)} K_a(Y) dµ(a) is finite.

Thus we can take N sufficiently large so that µ is supported on [−N, N]. To finish the proof, consider any X ∈ L_M that may be unbounded on both sides. For each positive integer n, let X_n = min{X, n} denote the truncation of X at n. Since X ≥ X_n, we have

Φ(X) ≥ Φ(X_n) = ∫_{[−N,N]} K_a(X_n) dµ(a).

Observe that for each a ∈ [−N, N], K_a(X_n) converges to K_a(X) as n → ∞. Moreover, the fact that K_a(X_n) increases both in n and in a implies that for all a ∈ [−N, N] and all n,

|K_a(X_n)| ≤ max{|K_a(X_1)|, |K_a(X)|} ≤ max{|K_{−N}(X_1)|, |K_N(X_1)|, |K_{−N}(X)|, |K_N(X)|}.

As K_a(X_n) is uniformly bounded, we can apply the Dominated Convergence Theorem to deduce

Φ(X) ≥ lim_{n→∞} ∫_{[−N,N]} K_a(X_n) dµ(a) = ∫_{[−N,N]} K_a(X) dµ(a).
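As a numerical aside (not part of the proof), the two ingredients of this step, additivity of K_a over independent sums and the monotone convergence K_a(min{X, n}) → K_a(X), are easy to check for a finitely supported distribution. The sketch below is purely illustrative; the Poisson-like distribution and the choice a = 1.5 are assumptions made for the example, not taken from the paper.

```python
import math

def K(a, pmf):
    """K_a(X) = (1/a) * log E[e^{aX}] for a finitely supported pmf {value: prob}."""
    if a == 0:
        return sum(x * p for x, p in pmf.items())  # K_0 is the expectation
    return math.log(sum(p * math.exp(a * x) for x, p in pmf.items())) / a

def convolve(p, q):
    """Distribution of X + Y for independent X ~ p and Y ~ q."""
    out = {}
    for x, px in p.items():
        for y, qy in q.items():
            out[x + y] = out.get(x + y, 0.0) + px * qy
    return out

def truncate(pmf, n):
    """Distribution of min{X, n}: all mass above n collapses onto n."""
    out = {}
    for x, p in pmf.items():
        out[min(x, n)] = out.get(min(x, n), 0.0) + p
    return out

# Illustrative X: a Poisson(2)-like pmf restricted to {0,...,60} and renormalized.
raw = {k: 2.0**k / math.factorial(k) for k in range(61)}
Z = sum(raw.values())
X = {k: v / Z for k, v in raw.items()}

a = 1.5
full = K(a, X)
approx = [K(a, truncate(X, n)) for n in (5, 10, 20, 40)]  # increases toward K_a(X)
additivity_gap = K(a, convolve(X, X)) - 2 * K(a, X)       # ≈ 0 by additivity
print(approx, full, additivity_gap)
```

The truncated values increase toward the untruncated K_a(X), mirroring the Dominated Convergence step above, while the convolution check illustrates K_a(X + X′) = K_a(X) + K_a(X′) for independent summands.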
On the other hand, if we truncate the left tail and consider X_{−n} = max{X, −n}, then a symmetric argument shows

Φ(X) ≤ lim_{n→∞} ∫_{[−N,N]} K_a(X_{−n}) dµ(a) = ∫_{[−N,N]} K_a(X) dµ(a).

Therefore for all X ∈ L_M it holds that

Φ(X) = ∫_{[−N,N]} K_a(X) dµ(a).

This completes the entire proof of the case of L_M in Theorem 2.

D Additional Proofs
D.1 Proof of Theorem 4
In the first step, we fix any reward x > 0. Then by monotonicity in time and continuity, for each pair (x, T) there exists a (unique) deterministic time Φ_x(T) such that

(x, Φ_x(T)) ∼ (x, T).

Clearly, when T is a deterministic time, Φ_x(T) is simply T itself. Note also that if S first-order stochastically dominates T, then

(x, Φ_x(T)) ∼ (x, T) ⪰ (x, S) ∼ (x, Φ_x(S)),

so that Φ_x(S) ≥ Φ_x(T). We next show that for any T and S that are independent, Φ_x(T + S) = Φ_x(T) + Φ_x(S). Indeed, by stationarity, (x, Φ_x(T)) ∼ (x, T) implies (x, Φ_x(T) + S) ∼ (x, T + S), and (x, Φ_x(S)) ∼ (x, S) implies (x, Φ_x(T) + Φ_x(S)) ∼ (x, Φ_x(T) + S). Taken together, we have

(x, Φ_x(T) + Φ_x(S)) ∼ (x, T + S).

Since Φ_x(T) + Φ_x(S) is a deterministic time, the definition of Φ_x gives Φ_x(T) + Φ_x(S) = Φ_x(T + S) as desired.

In the second step, note that our preference ⪰ induces a preference on R_+ × R_+, consisting of deterministic dated rewards. By Theorem 2 in Fishburn and Rubinstein (1982), for any given r > 0 there exists a continuous and strictly increasing utility function u with u(0) = 0 such that for deterministic times t, s ≥ 0,

(x, t) ⪰ (y, s) if and only if u(x) · e^{−rt} ≥ u(y) · e^{−rs}.

By definition, (x, T) ∼ (x, Φ_x(T)) for any random time T. Thus we obtain that the decision maker’s preference is represented by

(x, T) ⪰ (y, S) if and only if u(x) · e^{−rΦ_x(T)} ≥ u(y) · e^{−rΦ_y(S)}.

While Φ_0(T) was not defined before, it will not matter because u(0) = 0.

It remains to show that for all x, y > 0, Φ_x and Φ_y are the same statistic. For this we choose deterministic times t and s such that (x, t) ∼ (y, s), i.e., u(x) · e^{−rt} = u(y) · e^{−rs}. For any random time T, stationarity implies (x, t + T) ∼ (y, s + T), so that

u(x) · e^{−rΦ_x(t+T)} = u(y) · e^{−rΦ_y(s+T)}.

Using the additivity of Φ_x and Φ_y, we can divide the above two equalities and obtain Φ_x(T) = Φ_y(T) as desired. Since this holds for all T and all x, y > 0, we can write

Φ_x(T) = Φ(T)

for a single monotone additive statistic Φ. This completes the proof.

D.2 Time lotteries in discrete time

In this section we consider the domain R_+ × L^∞_N of discrete time lotteries. The original Axioms 3.1, 3.2 and 3.3 for continuous time directly carry over to discrete time, except that in some of their statements we now restrict to integer-valued random times. However, it turns out that we need a strengthening of the continuity Axiom 3.4:

Axiom D.1 (Strong Continuity). Consider any sequence of discrete time lotteries {(x_n, T_n)} such that x_n → x, the distributions of T_n weakly converge to that of T, and {max[T_n]} is uniformly bounded. Then for any discrete time lottery (y, S), (x_n, T_n) ⪰ (y, S) for every n implies (x, T) ⪰ (y, S), and (x_n, T_n) ⪯ (y, S) for every n implies (x, T) ⪯ (y, S).

A feature of the above continuity axiom is that it rules out extreme risk aversion (or risk-seeking) over time. Thus, in the following analogue of Theorem 4, the monotone additive statistic Φ is generated by a measure µ supported on R rather than the extended real line ¯R. We call such Φ strongly monotone.

Proposition 6.
A preference ⪰ on R_+ × L^∞_N satisfies Axioms 3.1, 3.2, 3.3 and D.1 if and only if there exists a strongly monotone additive statistic Φ, an r > 0, and a continuous and strictly increasing utility function u : R_+ → R_+ with u(0) = 0, such that ⪰ is represented by

f(x, T) = u(x) · e^{−rΦ(T)}.

Proof.
We first check that the representation satisfies the strong continuity Axiom D.1 (the other axioms are straightforward to check). Indeed, suppose

Φ(T) = ∫_R K_a(T) dµ(a)

for some probability measure µ supported on R. Then whenever T_n → T (in terms of their distributions) and max[T_n] is uniformly bounded, we can deduce from the Dominated Convergence Theorem that Φ(T_n) → Φ(T). This implies u(x_n) · e^{−rΦ(T_n)} → u(x) · e^{−rΦ(T)}, and thus strong continuity holds.

Turning to the opposite direction, we assume the preference ⪰ satisfies the axioms. We first prove the following stronger version of stationarity:

(x, T) ⪰ (y, S) if and only if (x, T + D) ⪰ (y, S + D)

whenever D is independent from T and S. The “only if” direction is assumed, so we focus on the “if.” It suffices to show that the strict preference (x, T) ≻ (y, S) also implies the strict preference (x, T + D) ≻ (y, S + D). Since (0, T) ⪯ (y, S), by strong continuity there exists x̃ ∈ [0, x) such that (x̃, T) ∼ (y, S). Thus by the assumed version of stationarity, (x̃, T + D) ∼ (y, S + D). Monotonicity in money then yields (x, T + D) ≻ (x̃, T + D) ∼ (y, S + D). This gives the desired result.

Next, as in the proof of Theorem 4, we fix x > 0 and seek to define Φ_x(T) for every T. However, since Φ_x(T) will not be an integer in general, we cannot define it using the indifference relation induced by ⪰. We instead proceed as follows. For each T ∈ L^∞_N, define

B_x(n, T) = max{m ∈ N : (x, m) ⪰ (x, T^{*n})}.

Note that for fixed T, B_x(n, T) is a non-negative super-additive sequence in n. This is because if (x, m_1) ⪰ (x, T^{*n_1}) and (x, m_2) ⪰ (x, T^{*n_2}), then applying stationarity twice yields

(x, m_1 + m_2) ⪰ (x, T^{*n_1} + m_2) ⪰ (x, T^{*n_1} + T^{*n_2}) = (x, T^{*(n_1+n_2)}).

Note also that by monotonicity in time, B_x(n, T) ≤ max[T^{*n}] = n·max[T]. So we have a well-defined finite limit

Φ_x(T) = lim_{n→∞} B_x(n, T)/n.

It is easy to see that Φ_x is a monotone statistic. It is also super-additive because for each n, (x, m_1) ⪰ (x, T^{*n}) and (x, m_2) ⪰ (x, S^{*n}) imply (x, m_1 + m_2) ⪰ (x, (T + S)^{*n}) by two applications of stationarity. Moreover, using B_x(n, T) = min{m ∈ N : (x, m) ≺ (x, T^{*n})} − 1, we can also show Φ_x is sub-additive. Thus Φ_x is a monotone additive statistic.

We next show that (x, T) ⪰ (x, S) if and only if Φ_x(S) ≥ Φ_x(T). Suppose Φ_x(S) > Φ_x(T) holds strictly; then by definition we have B_x(n, S) > B_x(n, T) for sufficiently large n. Thus, for this n, the integer m = B_x(n, S) satisfies

(x, T^{*n}) ≻ (x, m) ⪰ (x, S^{*n}).

This implies (x, T) ≻ (x, S), because by repeated application of stationarity (x, S) ⪰ (x, T) would imply (x, S^{*n}) ⪰ (x, T^{*n}).

It remains to show that Φ_x(S) = Φ_x(T) implies (x, T) ∼ (x, S). By symmetry it suffices to show (x, T) ⪰ (x, S). Let S_ε be equal to S with probability 1 − ε, and equal to max[S] + 1 with probability ε. Then Φ_x(S_ε) > Φ_x(S) = Φ_x(T), so that (x, T) ≻ (x, S_ε) for every ε > 0. By strong continuity, we thus obtain (x, T) ⪰ (x, S) as desired.

We now further show Φ_x is strongly monotone. (We do not use the terminology “strictly monotone” because it suggests a weaker requirement that Φ(X) > Φ(Y) whenever X is strictly larger than Y in first-order stochastic dominance. That would correspond to µ being not entirely supported on {±∞}, whereas here we require µ to have no mass at ±∞.) Suppose for contradiction that the measure µ associated with Φ_x puts mass at least 1/N on a = ∞, for some large positive integer N. Consider the random variable T_ε which equals N with probability ε and equals 0 otherwise. Note that Φ_x(T_ε) ≥ (1/N)·max[T_ε] = 1. Thus for any ε > 0, (x, T_ε) ⪯ (x, 1) by what we showed above. But since T_ε converges in distribution to the deterministic time 0, strong continuity yields (x, 0) ⪯ (x, 1), contradicting monotonicity in time. Similarly, if µ puts mass at least 1/N on a = −∞, we reach a contradiction by considering the time lotteries N − T_ε versus the deterministic time N − 1.

So far we have shown that for each x > 0 there is a strongly monotone additive statistic Φ_x, such that (x, T) ⪰ (x, S) if and only if Φ_x(T) ≤ Φ_x(S). What remains to be done is to relate the preferences for different rewards x. This is however another new difficulty relative to the proof of Theorem 4. The issue is that we cannot directly reduce the time lottery (x, T) to the deterministic reward (x, Φ_x(T)) by indifference, since the latter need not be in discrete time.

To address this problem, we introduce an auxiliary preference ⪰* defined on the set of deterministic dated rewards R_+ × R_+ in continuous time. Specifically, consider any (x, t) and (y, s), where x, y > 0, and t and s need not be integers. By the fact that Φ_x, Φ_y satisfy strong continuity, we can find integer-valued bounded random times T, S such that Φ_x(T) = t and Φ_y(S) = s. We then define (x, t) ⪰* (y, s) if and only if (x, T) ⪰ (y, S). Since we have shown that (x, T_1) ∼ (x, T_2) whenever Φ_x(T_1) = Φ_x(T_2), this definition of ⪰* does not depend on the specific choice of T and S. In addition, it is easy to see that ⪰* is complete and transitive. We can further include the zero reward by defining (x, t) ≻* (0, s) for any x > 0, and (0, t) ∼* (0, s).

Below we show that the preference ⪰* satisfies the axioms in Fishburn and Rubinstein (1982). We introduce a key technical lemma that we prove at the end of this section:

Lemma 15.
Let Φ and Ψ be two strongly monotone additive statistics defined on L^∞_N. Then for any real number d > 0, there exist two random variables D, D′ ∈ L^∞_N such that

Φ(D) − Φ(D′) = d = Ψ(D) − Ψ(D′).

We use this lemma to prove the stationarity property of ⪰*, namely (x, t) ⪰* (y, s) if and only if (x, t + d) ⪰* (y, s + d). Let T, T′, S, S′ ∈ L^∞_N satisfy Φ_x(T) = t, Φ_x(T′) = t + d, Φ_y(S) = s, Φ_y(S′) = s + d. Also let D, D′ ∈ L^∞_N be given by Lemma 15, such that

Φ_x(D) − Φ_x(D′) = d = Φ_y(D) − Φ_y(D′).

Suppose (x, t) ⪰* (y, s); then by definition (x, T) ⪰ (y, S). This implies, by stationarity of ⪰, that

(x, T + D) ⪰ (y, S + D).

Now observe that

Φ_x(T + D) = Φ_x(T) + Φ_x(D) = t + d + Φ_x(D′) = Φ_x(T′ + D′).

Thus (x, T + D) ∼ (x, T′ + D′), and likewise (y, S + D) ∼ (y, S′ + D′). It follows that

(x, T′ + D′) ⪰ (y, S′ + D′).
By stationarity of ⪰ again, we conclude that (x, T′) ⪰ (y, S′), and thus (x, t + d) ⪰* (y, s + d). Moreover, if we have the strict preference (x, t) ≻* (y, s) to begin with, then the above steps and the conclusion (x, t + d) ≻* (y, s + d) are also strict. This proves the stationarity of ⪰*.

We now use stationarity to show ⪰* is monotone in money. Suppose x > y > 0. Then (x, 0) ≻ (y, 0) when viewed as discrete time lotteries, and by definition (x, 0) ≻* (y, 0). By stationarity, (x, t) ≻* (y, t) for every t ≥ 0. ⪰* is also monotone in time: suppose Φ_x(S) = s > t = Φ_x(T). Then (x, T) ≻ (x, S) by the fact that Φ_x represents the preference ⪰ restricted to the reward x. So by definition (x, t) ≻* (x, s) also holds.

It remains to check that ⪰* is continuous in the sense that if (x_n, t_n) → (x, t) and (x_n, t_n) ⪰* (y, s) for every n, then (x, t) ⪰* (y, s) (and that the same holds for the preferences reversed). To show this, note that t_n < ⌊t⌋ + 1 for every large n. By strong monotonicity (thus continuity) of Φ_x, we can find a binary integer random variable T_n supported on 0 and ⌊t⌋ + 1 such that Φ_x(T_n) = t_n. Passing to a sub-sequence if necessary, we can assume T_n has a limit T. Since (x_n, t_n) ⪰* (y, s), we know by definition that (x_n, T_n) ⪰ (y, S) for any S with Φ_y(S) = s. Thus by strong continuity of ⪰, we deduce (x, T) ⪰ (y, S). Since Φ_x(T) = lim Φ_x(T_n) = lim t_n = t, we have (x, t) ⪰* (y, s) as desired.

Hence we can apply Theorem 2 in Fishburn and Rubinstein (1982) to deduce that

(x, t) ⪰* (y, s) if and only if u(x) · e^{−rt} ≥ u(y) · e^{−rs},

for some r > 0 and some continuous and strictly increasing function u : R_+ → R_+ with u(0) = 0. Since by definition (x, T) ⪰ (y, S) if and only if (x, Φ_x(T)) ⪰* (y, Φ_y(S)), we obtain

(x, T) ⪰ (y, S) if and only if u(x) · e^{−rΦ_x(T)} ≥ u(y) · e^{−rΦ_y(S)}.

Once we have this representation, for any x, y > 0, choose T, S ∈ L^∞_N such that u(x) · e^{−rΦ_x(T)} = u(y) · e^{−rΦ_y(S)}, so that (x, T) ∼ (y, S). Then for any independent D, we also have (x, T + D) ∼ (y, S + D), so that u(x) · e^{−rΦ_x(T+D)} = u(y) · e^{−rΦ_y(S+D)}. Dividing the two equalities thus yields Φ_x(D) = Φ_y(D) for every D. We can therefore write Φ_x(T) = Φ(T) for a single strongly monotone additive statistic Φ, which completes the proof.

Proof of Lemma 15.
Suppose for the sake of contradiction that the result is not true.

We claim there cannot exist X_1, Y_1, X_2, Y_2 ∈ L^∞_N such that Φ(Y_1) − Φ(X_1) = d > Ψ(Y_1) − Ψ(X_1) and Φ(Y_2) − Φ(X_2) = d < Ψ(Y_2) − Ψ(X_2). Indeed, given such random variables, we may add a large constant to X_1, Y_1 so that X_1 > X_2 and Y_1 > Y_2, without affecting the assumption. Then as λ varies in [0, 1], Ψ(X_1 λ X_2) increases continuously in λ (where X_1 λ X_2 ∈ L^∞_N is the (λ, 1 − λ)-mixture between X_1 and X_2, equal to X_1 with probability λ and to X_2 with probability 1 − λ). Likewise Φ(X_1 λ X_2) increases continuously in λ. So for any λ ∈ [0, 1], there exists a unique h(λ) ∈ [0, 1] such that Φ(Y_1 h(λ) Y_2) − Φ(X_1 λ X_2) = d. This function h(λ) is strictly increasing and continuous, and satisfies h(0) = 0, h(1) = 1. Note that Ψ(Y_1 h(λ) Y_2) − Ψ(X_1 λ X_2) is larger than d when λ = 0, but smaller than d when λ = 1. Thus by continuity, there exists λ such that Ψ(Y_1 h(λ) Y_2) − Ψ(X_1 λ X_2) = d = Φ(Y_1 h(λ) Y_2) − Φ(X_1 λ X_2).

Hence, for the lemma to fail, the only possibility is that Ψ(Y) − Ψ(X) is always larger (or always smaller) than d whenever Φ(Y) − Φ(X) = d. Below we assume Ψ(Y) − Ψ(X) > d, but the opposite case can be handled symmetrically. (This proof would be a little simpler if there existed D such that Φ_x(D) = Φ_y(D) = d, which would be a stronger statement than Lemma 15. But such an integer-valued D might not exist when d is not an integer and Φ_x is larger than Φ_y in the sense of Proposition 4.)

Choose any positive integer k > d, and let Y* be the mixture that equals k with probability λ_0 and 0 with probability 1 − λ_0, where λ_0 is chosen such that Φ(Y*) = d. Then Ψ(Y*) > d by assumption, and we can write Ψ(Y*) = d + η for some η > 0. This Y* and η will be fixed in the subsequent analysis.

Now take any positive integer m > k. We can define a continuum of random variables X_ε ∈ L^∞_N for ε ∈ [0, 1] as follows. For ε ∈ [0, λ_0] we define X_ε to be equal to k with probability ε and equal to 0 with probability 1 − ε. And for ε ∈ [λ_0, 1], we define X_ε to be equal to m with probability (ε − λ_0)/(1 − λ_0), equal to k with probability λ_0(1 − ε)/(1 − λ_0), and equal to 0 with probability 1 − ε. The important thing here is that as ε increases, X_ε increases in first-order stochastic dominance in a continuous way. Thus Φ(X_ε) and Ψ(X_ε) increase continuously with ε. In addition, note that X_0 = 0, X_{λ_0} = Y* and X_1 = m.

Let n ≤ m/d be any positive integer. Then we can define ε_0, ε_1, · · · , ε_n by the equations Φ(X_{ε_j}) = j · d for every 0 ≤ j ≤ n. It is easy to see

0 = ε_0 < λ_0 = ε_1 < · · · < ε_n ≤ 1.

For 1 ≤ j ≤ n, we have Φ(X_{ε_j}) − Φ(X_{ε_{j−1}}) = d. So by assumption Ψ(X_{ε_j}) − Ψ(X_{ε_{j−1}}) > d. Moreover, when j = 1 we in fact have Ψ(X_{ε_1}) − Ψ(X_{ε_0}) = Ψ(Y*) − Ψ(0) = d + η. Summing across j, we thus obtain

m = Ψ(X_1) − Ψ(X_0) ≥ Ψ(X_{ε_n}) − Ψ(X_{ε_0}) = Σ_{j=1}^n (Ψ(X_{ε_j}) − Ψ(X_{ε_{j−1}})) ≥ nd + η.

But we now have a contradiction, because the inequality m ≥ nd + η cannot hold for all sufficiently large integers m and n that satisfy m ≥ nd. To see this, observe that when d is a rational number, we can choose m, n so that m = nd. In that case the inequality m ≥ nd + η clearly fails. If instead d is an irrational number, then it is well known that the fractional part of nd can be arbitrarily close to one (as implied by the “equidistribution” property). Again we can find large integers m and n such that nd + η > m ≥ nd. Thus a contradiction obtains either way, completing the proof.

D.3 Proof of Proposition 1

Clearly, a preference with the representation f(x, T) = u(x) · e^{−rΦ(T)} of Theorem 4 satisfies betweenness if and only if

Φ(T) = Φ(S) implies Φ(T λ S) = Φ(S) for all λ ∈ (0, 1).
In this case, we say that Φ satisfies betweenness. Thus, to prove the current proposition it suffices to show that any Φ satisfying betweenness has one of the following forms:

1. Φ(T) = K_a(T) for some a ∈ ¯R.

2. Φ(T) = β min[T] + (1 − β) max[T] for some β ∈ (0, 1).

3. Φ(T) = [−a_1/(a_2 − a_1)]·K_{a_1}(T) + [a_2/(a_2 − a_1)]·K_{a_2}(T) = (log E[e^{a_2 T}] − log E[e^{a_1 T}])/(a_2 − a_1) for some a_1 < 0 < a_2.

We first show the “if” direction. Specifically, when Φ(T) = K_a(T) for some fixed a ∈ R with a ≠ 0, then Φ(T) = Φ(S) implies E[e^{aT}] = E[e^{aS}]. It follows that

E[e^{a(T λ S)}] = λE[e^{aT}] + (1 − λ)E[e^{aS}] = E[e^{aS}],

and so Φ(T λ S) = Φ(S). It is straightforward to check that the same is true when a = 0 or ±∞. So every K_a(T) satisfies betweenness.

We next show Φ(T) = β min[T] + (1 − β) max[T] also satisfies betweenness for any β ∈ (0, 1). Suppose

β min[T] + (1 − β) max[T] = β min[S] + (1 − β) max[S].

Then either min[T] ≤ min[S] and max[T] ≥ max[S], or the other way around. In the former case T λ S has the same minimum and maximum as T, whereas in the latter case it has the same minimum and maximum as S. Either way, Φ(T λ S) = Φ(T) = Φ(S) holds.

We then consider Φ(T) = (log E[e^{a_2 T}] − log E[e^{a_1 T}])/(a_2 − a_1) for some a_1 < 0 < a_2. If Φ(T) = Φ(S), then log E[e^{a_2 T}] − log E[e^{a_1 T}] = log E[e^{a_2 S}] − log E[e^{a_1 S}], which is equivalent to

E[e^{a_1 T}]/E[e^{a_1 S}] = E[e^{a_2 T}]/E[e^{a_2 S}].

Since E[e^{a(T λ S)}] = λE[e^{aT}] + (1 − λ)E[e^{aS}] for every a, it is not difficult to see that the above ratio equality continues to hold with T replaced by T λ S. Hence Φ(T λ S) = Φ(S), and betweenness is satisfied.

Turning to the “only if” direction, we will characterize any monotone additive statistic Φ that satisfies the weak form of betweenness, i.e., Φ(T) = c implies Φ(T λ c) = c when c is a constant. The following lemma is key to the argument:

Lemma 16.
Suppose Φ(T) = ∫_{¯R} K_a(T) dµ(a) has the property that Φ(T) = c implies the inequality Φ(T λ c) ≤ c. Then the measure µ restricted to [0, ∞] is either the zero measure, or it is supported on a single point.

Proof. It suffices to show that if µ puts any mass on (0, ∞], then that mass is supported on a single point and µ({0}) = 0. For this let N > 0 be the supremum of the support of µ; that is, N = min{x : µ((x, ∞]) = 0}. We allow N = ∞ when the support of µ is unbounded from above, or when µ has a non-zero mass at ∞. For any positive real number b < N, consider the same random variable X_{n,b} as in the proof of Lemma 5, given by

P[X_{n,b} = n] = e^{−bn}, P[X_{n,b} = 0] = 1 − e^{−bn}.

As shown in the proof of Lemma 5, (1/n)K_a(X_{n,b}) is uniformly bounded in [0, 1], and

lim_{n→∞} (1/n)K_a(X_{n,b}) = (a − b)^+/a.

Thus if we let c_n = Φ(X_{n,b}), then by the Dominated Convergence Theorem,

lim_{n→∞} c_n/n = lim_{n→∞} (1/n)Φ(X_{n,b}) = lim_{n→∞} ∫_{¯R} (1/n)K_a(X_{n,b}) dµ(a) = ∫_{(b,∞]} [(a − b)/a] dµ(a).

Denote γ = ∫_{(b,∞]} [(a − b)/a] dµ(a). This number γ is strictly positive because b < N implies µ((b, ∞]) > 0. We can also assume γ < 1, since otherwise µ must be the point mass at ∞.

Now, as Φ(X_{n,b}) = c_n, we know by assumption that Φ(Y_{n,b}) ≤ c_n for each n, where Y_{n,b} is the mixture between X_{n,b} and the constant c_n (in what follows λ is fixed as n varies):

P[Y_{n,b} = n] = λe^{−bn}, P[Y_{n,b} = 0] = λ(1 − e^{−bn}), P[Y_{n,b} = c_n] = 1 − λ.

Using lim_{n→∞} c_n/n = γ, we have

lim_{n→∞} (1/n)K_a(Y_{n,b}) = lim_{n→∞} (1/(an)) log[λ(1 − e^{−bn} + e^{(a−b)n}) + (1 − λ)e^{a·c_n}]
= 0 if a < 0; (1 − λ)γ if a = 0; γ if 0 < a < b/(1 − γ); (a − b)/a if a ≥ b/(1 − γ).

Note that the cutoff point a = b/(1 − γ) is where a − b = aγ. When a is smaller than this, the dominant term in the bracketed sum above is (1 − λ)e^{a·c_n}; whereas for larger a, the dominant term becomes λe^{(a−b)·n}.

Crucially, lim_{n→∞} (1/n)K_a(Y_{n,b}) ≥ (a − b)^+/a holds for every a, with strict inequality for a ∈ [0, b/(1 − γ)). Thus, again by the Dominated Convergence Theorem,

lim_{n→∞} c_n/n ≥ lim_{n→∞} (1/n)Φ(Y_{n,b}) = lim_{n→∞} ∫_{¯R} (1/n)K_a(Y_{n,b}) dµ(a) ≥ ∫_{(b,∞]} [(a − b)/a] dµ(a).

But we know that the far left is equal to the far right. So both inequalities hold with equality, and in particular lim_{n→∞} (1/n)K_a(Y_{n,b}) = (a − b)^+/a holds µ-almost surely.

As discussed, lim_{n→∞} (1/n)K_a(Y_{n,b}) > (a − b)^+/a for any a ∈ [0, b/(1 − γ)). So we can conclude that µ([0, b/(1 − γ))) = 0. This must hold for any b ∈ (0, N) and corresponding γ. Letting b get arbitrarily close to N thus yields µ([0, N)) = 0 (since b/(1 − γ) > b). It follows that when restricted to [0, ∞] the measure µ is concentrated at the single point N, as we desired to show.

From this lemma, we know that if Φ satisfies the weak form of betweenness, then its associated measure µ can only be supported on one point in all of [0, ∞]. By a symmetric argument, µ also has at most one point of support in all of [−∞, 0]. Thus either µ = δ_a for some a ∈ ¯R, or µ is supported on two points {a_1, a_2} with a_1 < 0 < a_2.
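As a numerical aside (not needed for the argument), the two-point statistic Φ(T) = (log E[e^{a_2 T}] − log E[e^{a_1 T}])/(a_2 − a_1) can be checked to satisfy betweenness directly. The sketch below is illustrative only: the values a_1 = −1, a_2 = 2, the lottery T, and the two-point family used for the bisection are arbitrary choices, not taken from the paper.

```python
import math

def mgf(pmf, a):
    return sum(p * math.exp(a * x) for x, p in pmf.items())

def phi(pmf, a1=-1.0, a2=2.0):
    # Two-point statistic (log E[e^{a2 T}] - log E[e^{a1 T}]) / (a2 - a1)
    return (math.log(mgf(pmf, a2)) - math.log(mgf(pmf, a1))) / (a2 - a1)

T = {0: 0.3, 1: 0.4, 3: 0.3}   # arbitrary illustrative lottery
target = phi(T)

# Bisection: find p so that the two-point lottery S = {0: 1-p, 5: p} has phi(S) = phi(T).
lo, hi = 0.0, 1.0
for _ in range(100):
    p = (lo + hi) / 2
    if phi({0: 1 - p, 5: p}) < target:
        lo = p
    else:
        hi = p
S = {0: 1 - p, 5: p}

def mixture(p_pmf, q_pmf, lam):
    """(lam, 1-lam)-mixture: equals the first lottery with probability lam."""
    out = {}
    for x, v in p_pmf.items():
        out[x] = out.get(x, 0.0) + lam * v
    for x, v in q_pmf.items():
        out[x] = out.get(x, 0.0) + (1 - lam) * v
    return out

errors = [abs(phi(mixture(T, S, lam)) - target) for lam in (0.25, 0.5, 0.75)]
print(errors)  # all close to zero: betweenness holds for this statistic
```

The check works because the mixture's moment generating functions are the corresponding convex combinations, so the ratio equality E[e^{a_1 T}]/E[e^{a_1 S}] = E[e^{a_2 T}]/E[e^{a_2 S}] is preserved under mixing, exactly as in the "if" direction above.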
We study the latter case below. So suppose $\Phi(T) = \beta K_{a_1}(T) + (1-\beta) K_{a_2}(T)$ for some $\beta \in (0,1)$. If $a_1 = -\infty$ while $a_2 < \infty$, then $\Phi(T) = \beta \min[T] + (1-\beta) K_{a_2}(T)$. Take any non-constant $T$ and let $c$ denote $\Phi(T)$. Note that since $K_{a_2}(T) > \min[T]$, $c = \beta \min[T] + (1-\beta) K_{a_2}(T)$ lies strictly between $\min[T]$ and $K_{a_2}(T)$. Consider the mixture $T_\lambda c$; then $\min[T_\lambda c] = \min[T]$, whereas
\[
K_{a_2}(T_\lambda c) = \frac{1}{a_2} \log\left(\lambda \mathbb{E}\left[\mathrm{e}^{a_2 T}\right] + (1-\lambda)\mathrm{e}^{a_2 c}\right) < \frac{1}{a_2} \log \mathbb{E}\left[\mathrm{e}^{a_2 T}\right] = K_{a_2}(T),
\]
where the inequality uses $c < K_{a_2}(T) = \frac{1}{a_2} \log \mathbb{E}[\mathrm{e}^{a_2 T}]$ and $a_2 > 0$. We thus deduce that
\[
\Phi(T_\lambda c) = \beta \min[T_\lambda c] + (1-\beta) K_{a_2}(T_\lambda c) < \beta \min[T] + (1-\beta) K_{a_2}(T) = c,
\]
contradicting the betweenness assumption. A symmetric argument rules out the possibility that $a_1 > -\infty$ while $a_2 = \infty$.

It remains to consider $a_1 \in (-\infty, 0)$ and $a_2 \in (0, \infty)$. Here we just need to show that $\beta = \frac{-a_1}{a_2 - a_1}$. Let us again take an arbitrary non-constant $T$, and let
\[
c = \Phi(T) = \frac{\beta}{a_1} \log \mathbb{E}\left[\mathrm{e}^{a_1 T}\right] + \frac{1-\beta}{a_2} \log \mathbb{E}\left[\mathrm{e}^{a_2 T}\right].
\]
For an arbitrary $\lambda \in [0,1]$, betweenness requires
\[
c = \Phi(T_\lambda c) = \frac{\beta}{a_1} \log \mathbb{E}\left[\lambda \mathrm{e}^{a_1 T} + (1-\lambda)\mathrm{e}^{a_1 c}\right] + \frac{1-\beta}{a_2} \log \mathbb{E}\left[\lambda \mathrm{e}^{a_2 T} + (1-\lambda)\mathrm{e}^{a_2 c}\right]. \tag{14}
\]
Since (14) holds for every $\lambda$, we can differentiate it with respect to $\lambda$ to obtain
\[
0 = \frac{\beta\left(\mathbb{E}\left[\mathrm{e}^{a_1 T}\right] - \mathrm{e}^{a_1 c}\right)}{a_1\, \mathbb{E}\left[\lambda \mathrm{e}^{a_1 T} + (1-\lambda)\mathrm{e}^{a_1 c}\right]} + \frac{(1-\beta)\left(\mathbb{E}\left[\mathrm{e}^{a_2 T}\right] - \mathrm{e}^{a_2 c}\right)}{a_2\, \mathbb{E}\left[\lambda \mathrm{e}^{a_2 T} + (1-\lambda)\mathrm{e}^{a_2 c}\right]}.
\]
Plugging in $\lambda = 0$ and $\lambda = 1$ gives, respectively,
\[
\frac{\beta\left(\mathbb{E}\left[\mathrm{e}^{a_1 T}\right] - \mathrm{e}^{a_1 c}\right)}{a_1 \mathrm{e}^{a_1 c}} = -\frac{(1-\beta)\left(\mathbb{E}\left[\mathrm{e}^{a_2 T}\right] - \mathrm{e}^{a_2 c}\right)}{a_2 \mathrm{e}^{a_2 c}}, \tag{15}
\]
\[
\frac{\beta\left(\mathbb{E}\left[\mathrm{e}^{a_1 T}\right] - \mathrm{e}^{a_1 c}\right)}{a_1 \mathbb{E}\left[\mathrm{e}^{a_1 T}\right]} = -\frac{(1-\beta)\left(\mathbb{E}\left[\mathrm{e}^{a_2 T}\right] - \mathrm{e}^{a_2 c}\right)}{a_2 \mathbb{E}\left[\mathrm{e}^{a_2 T}\right]}. \tag{16}
\]
Since $c = \beta K_{a_1}(T) + (1-\beta) K_{a_2}(T)$, the fact that $K_{a_2}(T) > K_{a_1}(T)$ implies $c$ is strictly between $K_{a_1}(T)$ and $K_{a_2}(T)$. Thus, using $a_1 < 0 < a_2$ we deduce $\mathrm{e}^{a_1 c} < \mathbb{E}[\mathrm{e}^{a_1 T}]$ and $\mathrm{e}^{a_2 c} < \mathbb{E}[\mathrm{e}^{a_2 T}]$.

We can therefore divide (15) by (16) to obtain
\[
\frac{\mathbb{E}\left[\mathrm{e}^{a_1 T}\right]}{\mathrm{e}^{a_1 c}} = \frac{\mathbb{E}\left[\mathrm{e}^{a_2 T}\right]}{\mathrm{e}^{a_2 c}}.
\]
Plugging this back into (15), we conclude $\frac{\beta}{a_1} = -\frac{1-\beta}{a_2}$, so $\beta = \frac{-a_1}{a_2 - a_1}$ as we desire to show.

D.4 Proof of Proposition 2
The "if" direction is straightforward, so we focus on the "only if." Note that for the representation given by Theorem 4, the independence axiom requires $\Phi(T_\lambda R) = \Phi(S_\lambda R)$ whenever $\Phi(T) = \Phi(S)$. This is stronger than the betweenness axiom, so we know from Proposition 1 that $\succeq$ must be represented by $f(x, T) = u(x) \cdot \mathrm{e}^{-r\Phi(T)}$ where $\Phi$ takes one of the following three forms:

(i) $\Phi(T) = K_a(T)$ for some $a \in \bar{\mathbb{R}}$, or

(ii) $\Phi(T) = \beta \min[T] + (1-\beta) \max[T]$ for some $\beta \in (0,1)$, or

(iii) $\Phi(T) = \frac{-a_1}{a_2 - a_1} K_{a_1}(T) + \frac{a_2}{a_2 - a_1} K_{a_2}(T) = \frac{\log \mathbb{E}[\mathrm{e}^{a_2 T}] - \log \mathbb{E}[\mathrm{e}^{a_1 T}]}{a_2 - a_1}$ for some $a_1 < 0 < a_2$.

We just need to show that forms (ii) and (iii) violate the independence axiom. Suppose $\Phi$ takes form (ii). Let $S = 1 - \beta$ be a constant, and let $T$ be distributed uniformly on $\{0, 1\}$. Then $\Phi(T) = \Phi(S) = 1 - \beta$, but for any $\lambda \in (0,1)$,
\[
\Phi(T_\lambda 1) = 1 - \beta < \beta(1-\beta) + 1 - \beta = \Phi(S_\lambda 1).
\]
This contradicts the independence axiom.

Next suppose $\Phi$ takes form (iii). Denote $\beta = \frac{-a_1}{a_2 - a_1} \in (0,1)$, so that $\Phi(T) = \beta K_{a_1}(T) + (1-\beta) K_{a_2}(T)$. We choose $S$ and $T$ such that $\Phi(T) > \Phi(S)$ but $K_{a_1}(T) < K_{a_1}(S)$. For example, let $S = 1$, and let $T$ be supported on $\{0, k\}$, with $\mathbb{P}[T = k] = 1/k$. Then
\[
K_a(T) = \frac{1}{a} \log\left(1 - \frac{1}{k} + \frac{\mathrm{e}^{ak}}{k}\right).
\]
For $k$ tending to infinity, $K_a(T)$ tends to zero if $a < 0$, and to infinity if $a > 0$. Hence, for $k$ large enough, $S$ and $T$ will have the desired property.

Now, let $R = n$ be a constant. Then
\[
K_a(S_\lambda n) = \frac{1}{a} \log\left(\lambda \mathbb{E}\left[\mathrm{e}^{aS}\right] + (1-\lambda)\mathrm{e}^{an}\right), \qquad K_a(T_\lambda n) = \frac{1}{a} \log\left(\lambda \mathbb{E}\left[\mathrm{e}^{aT}\right] + (1-\lambda)\mathrm{e}^{an}\right),
\]
and so
\[
K_a(S_\lambda n) - K_a(T_\lambda n) = \frac{1}{a} \log \frac{\lambda \mathbb{E}\left[\mathrm{e}^{aS}\right] + (1-\lambda)\mathrm{e}^{an}}{\lambda \mathbb{E}\left[\mathrm{e}^{aT}\right] + (1-\lambda)\mathrm{e}^{an}}.
\]
It easily follows that for $a > 0$,
\[
\lim_{n \to \infty} K_a(S_\lambda n) - K_a(T_\lambda n) = 0,
\]
whereas for $a < 0$,
\[
\lim_{n \to \infty} K_a(S_\lambda n) - K_a(T_\lambda n) = K_a(S) - K_a(T).
\]
Thus, as $n$ tends to infinity,
\[
\lim_n \Phi(S_\lambda n) - \Phi(T_\lambda n) = \lim_n \beta\left[K_{a_1}(S_\lambda n) - K_{a_1}(T_\lambda n)\right] + (1-\beta)\left[K_{a_2}(S_\lambda n) - K_{a_2}(T_\lambda n)\right] = \beta\left[K_{a_1}(S) - K_{a_1}(T)\right] > 0.
\]
Therefore, for $n$ large enough, we have found $S$ and $T$ such that $\Phi(T) > \Phi(S)$ but $\Phi(T_\lambda n) < \Phi(S_\lambda n)$. If we let $c = \Phi(T) - \Phi(S) > 0$ and $S' = S + c$, then $\Phi(T) = \Phi(S')$ and $\Phi(T_\lambda n) < \Phi(S'_\lambda n)$, where the latter follows from monotonicity of $\Phi$. This contradicts the independence axiom and completes the proof of Proposition 2.

D.5 Proof of Proposition 3

We show that the result follows from Proposition 4, which we prove in the next section. Indeed, to characterize risk-averse preferences, it is equivalent to characterize those measures $\mu$ that are "more risk-averse than" the measure $\nu$ that is a point mass at zero (since this $\nu$ corresponds to $\Phi_\nu(T) = \mathbb{E}[T]$). When $\nu = \delta_0$, condition (i) in Proposition 4 is trivially satisfied because $\int_{[b,\infty]} \frac{a-b}{a}\,d\nu(a) = 0$ whereas $\int_{[b,\infty]} \frac{a-b}{a}\,d\mu(a) \geq 0$. On the other hand, condition (ii) requires $\int_{[-\infty,b]} \frac{a-b}{a}\,d\mu(a) \leq 0$ for every $b < 0$. Since the integrand $\frac{a-b}{a}$ is strictly positive for $a \in [-\infty, b)$, this implies $\mu([-\infty, b)) = 0$, which further implies $\mu([-\infty, 0)) = 0$ because $b < 0$ is arbitrary. Hence $\mu$ is supported on $[0, \infty]$ as we desire to show.

Symmetrically, a measure $\nu$ is risk-seeking if and only if the measure $\mu = \delta_0$ is more risk-averse than $\nu$. In this case condition (ii) in Proposition 4 is trivial, whereas condition (i) reduces to $\nu$ being supported on $[-\infty, 0]$.

D.6 Proof of Proposition 4
We first show that conditions (i) and (ii) are necessary for $\int_{\bar{\mathbb{R}}} K_a(T)\,d\mu(a) \geq \int_{\bar{\mathbb{R}}} K_a(T)\,d\nu(a)$ to hold for every $T$. This part of the argument closely follows the proof of Lemma 5 above. Specifically, by considering the same random variables $X_{n,b}$ as defined there, we have the key equation (9). Since the limit on the left-hand side is larger for $\mu$ than for $\nu$, we conclude that for every $b > 0$ the integral $\int_{[b,\infty]} \frac{a-b}{a}\,d\mu(a)$ on the right-hand side must be larger than the corresponding integral for $\nu$. Thus condition (i) holds, and an analogous argument shows condition (ii) also holds.

To complete the proof, it remains to show that when conditions (i) and (ii) are satisfied,
\[
\int_{\bar{\mathbb{R}}} K_a(T)\,d\mu(a) \geq \int_{\bar{\mathbb{R}}} K_a(T)\,d\nu(a)
\]
holds for every $T$. Since $\mu$ and $\nu$ are both probability measures, we can subtract $\mathbb{E}[T]$ from both sides and arrive at the equivalent inequality
\[
\int_{\bar{\mathbb{R}} \setminus \{0\}} \left(K_a(T) - \mathbb{E}[T]\right) d\mu(a) \geq \int_{\bar{\mathbb{R}} \setminus \{0\}} \left(K_a(T) - \mathbb{E}[T]\right) d\nu(a). \tag{17}
\]
Note that we can exclude $a = 0$ from the range of integration because $K_a(T) = \mathbb{E}[T]$ there. Below we show that condition (i) implies
\[
\int_{(0,\infty]} \left(K_a(T) - \mathbb{E}[T]\right) d\mu(a) \geq \int_{(0,\infty]} \left(K_a(T) - \mathbb{E}[T]\right) d\nu(a). \tag{18}
\]
Similarly, condition (ii) gives the same inequality when the range of integration is instead $[-\infty, 0)$; together the two inequalities yield (17).

Let $K_T(a) = a \cdot K_a(T) = \log \mathbb{E}\left[\mathrm{e}^{aT}\right]$ be the cumulant generating function of $T$. It is well known that $K_T(a)$ is convex in $a$, with $K_T'(0) = \mathbb{E}[T]$ and $\lim_{a \to \infty} K_T(a)/a = \max[T]$. Then the integral on the left-hand side of (18) can be calculated as follows:
\[
\int_{(0,\infty]} \left(K_a(T) - \mathbb{E}[T]\right) d\mu(a) = \int_{(0,\infty)} \left(K_a(T) - \mathbb{E}[T]\right) d\mu(a) + \left(\max[T] - \mathbb{E}[T]\right) \cdot \mu(\{\infty\}) = \int_{(0,\infty)} \left(K_T(a) - a\,\mathbb{E}[T]\right) \frac{d\mu(a)}{a} + \left(\max[T] - \mathbb{E}[T]\right) \cdot \mu(\{\infty\}).
\]
Note that since the function $g(a) = K_T(a) - a\,\mathbb{E}[T]$ satisfies $g(0) = g'(0) = 0$, it can be written as
\[
g(a) = \int_0^a g'(t)\,dt = \int_0^a \int_0^t g''(b)\,db\,dt = \int_0^a g''(b) \cdot (a-b)\,db.
\]
Plugging this back into the previous identity, we obtain
\[
\begin{aligned}
\int_{(0,\infty]} \left(K_a(T) - \mathbb{E}[T]\right) d\mu(a) &= \int_{(0,\infty)} \int_0^a K_T''(b) \cdot (a-b)\,db\,\frac{d\mu(a)}{a} + \left(\max[T] - \mathbb{E}[T]\right) \cdot \mu(\{\infty\}) \\
&= \int_0^\infty K_T''(b) \int_{[b,\infty)} (a-b)\,\frac{d\mu(a)}{a}\,db + \left(K_T'(\infty) - K_T'(0)\right) \cdot \mu(\{\infty\}) \\
&= \int_0^\infty K_T''(b) \int_{[b,\infty)} \frac{a-b}{a}\,d\mu(a)\,db + \int_0^\infty K_T''(b) \cdot \mu(\{\infty\})\,db \\
&= \int_0^\infty K_T''(b) \int_{[b,\infty]} \frac{a-b}{a}\,d\mu(a)\,db,
\end{aligned}
\]
where the last step uses $\frac{a-b}{a} = 1$ when $a = \infty > b$.

The above identity also holds when $\mu$ is replaced by $\nu$. It is then immediate to see that (18) follows from condition (i) and the fact that $K_T''(b) \geq 0$ for every $b$, by convexity. This completes the proof.

D.7 Proof of Theorem 5
When $\succeq$ is represented by a monotone additive statistic $\Phi$, we can easily check that the axioms are satisfied. For the Rabin and Weizsäcker axiom, note that $X_1 \succ Y_1$ and $X_2 \succ Y_2$ imply $\Phi(X_1) > \Phi(Y_1)$ and $\Phi(X_2) > \Phi(Y_2)$. By additivity of $\Phi$, we thus have $\Phi(X_1 + X_2) > \Phi(Y_1 + Y_2)$. It follows from monotonicity of $\Phi$ that $X_1 + X_2$ cannot be first-order stochastically dominated by $Y_1 + Y_2$. The archimedeanity and responsiveness axioms are straightforward.

Turning to the "only if" direction, we suppose $\succeq$ satisfies the axioms. We first show that for any gamble $X$ and any $\varepsilon > 0$,
\[
\max[X] + \varepsilon \succ X \succ \min[X] - \varepsilon.
\]
To see why, suppose for contradiction that $X$ is weakly preferred to $\max[X] + \varepsilon$ (the other case can be handled similarly). By responsiveness, $\max[X] + \varepsilon \succ \max[X] + \varepsilon/2$, so $X \succ \max[X] + \varepsilon/2$; responsiveness also gives $\varepsilon/2 \succ 0$. But $X + \varepsilon/2$ is first-order stochastically dominated by $(\max[X] + \varepsilon/2) + 0$, contradicting the Rabin and Weizsäcker axiom.

Given these upper and lower bounds for $X$, we can define $\Phi(X) = \sup\{c \in \mathbb{R} : c \preceq X\}$, which is well-defined and finite. By definition of the supremum and responsiveness, for any $\varepsilon > 0$ we have $\Phi(X) - \varepsilon \prec X \prec \Phi(X) + \varepsilon$. Thus by archimedeanity, $\Phi(X) \sim X$; that is, $\Phi(X)$ is the (unique) certainty equivalent of $X$. Clearly, $\Phi$ is a statistic.

It remains to show that $\Phi$ is monotone and additive. For this we first show $\Phi(X + c) = \Phi(X) + c$ for any constant $c$. Suppose not, and $\Phi(X + c) = \Phi(X) + c'$ for some $c' > c$ (the case of $c' < c$ is similar). Let $\varepsilon \in (0, \frac{c' - c}{2})$ be a small positive number. Then by responsiveness, $X_1 = X + c + \varepsilon$ is strictly preferred to $X + c$ and thus preferred to the constant $Y_1 = \Phi(X) + c'$. On the other hand, $X_2 = \Phi(X) + \varepsilon$ is strictly preferred to $\Phi(X)$ and thus preferred to $Y_2 = X$. But $X_1 + X_2 = X + \Phi(X) + c + 2\varepsilon$ is stochastically dominated by $Y_1 + Y_2 = X + \Phi(X) + c'$, contradicting the Rabin and Weizsäcker axiom.

Using this result, and the archimedeanity axiom, we next show the following continuity property: whenever $X \succ Y$, there exists $\varepsilon > 0$ such that $X \succ Y + \varepsilon$ also holds. Indeed, suppose for contradiction that $Y + \varepsilon \succeq X$ for every $\varepsilon > 0$. Then by responsiveness, we in fact have the strict preference $Y + \varepsilon \succ X$. Thus $Y + \varepsilon \succ X \succ Y \succ Y - \varepsilon$. Since $Y \pm \varepsilon \sim \Phi(Y) \pm \varepsilon$, we deduce $\Phi(Y) + \varepsilon \succ X \succ \Phi(Y) - \varepsilon$ for every $\varepsilon > 0$. This implies $X \sim \Phi(Y) \sim Y$ by archimedeanity, which is a contradiction.

We now show $X \sim Y$ implies $X + Z \sim Y + Z$ for any independent $Z$. Suppose for contradiction that $X + Z \succ Y + Z$. Then we can find $\varepsilon > 0$ such that $X + Z \succ Y + Z + 2\varepsilon$. By responsiveness, it also holds that $Y + \varepsilon \succ Y \sim X$. But the sum $X + Z + Y + \varepsilon$ is stochastically dominated by $Y + Z + 2\varepsilon + X$, contradicting the Rabin and Weizsäcker axiom.

Therefore, from $X \sim \Phi(X)$ and $Y \sim \Phi(Y)$ we can apply the preceding result twice to obtain $X + Y \sim \Phi(X) + Y \sim \Phi(X) + \Phi(Y)$, so that $\Phi(X + Y) = \Phi(X) + \Phi(Y)$. Finally, we show $\Phi(\cdot)$ is monotone with respect to first-order stochastic dominance. Consider any $Y \geq X$, and suppose for contradiction that $X \succ Y$. Then there exists $\varepsilon > 0$ such that $X \succ Y + 2\varepsilon$. This leads to a contradiction since $X \succ Y + 2\varepsilon$ and $\varepsilon \succ 0$, but $X + \varepsilon$ is stochastically dominated by $Y + 2\varepsilon + 0$.

This completes the proof that the certainty equivalent $\Phi(X)$ is a monotone additive statistic. Hence the theorem.

E Sub- and Super-additive Statistics
In cases where the additivity assumption may seem too strong, we can weaken it to sub- or super-additivity, as we describe in this section. Say a statistic $\Phi$ is sub-additive if $\Phi(X + Y) \leq \Phi(X) + \Phi(Y)$ whenever $X, Y$ are independent bounded random variables, and super-additive if the reverse inequality holds. Say $\Phi$ is homogeneous if the equality $\Phi(X_1 + X_2) = \Phi(X_1) + \Phi(X_2)$ holds when $X_1$ and $X_2$ are independent and furthermore identically distributed. These properties are all implied by additivity.

The following result characterizes homogeneous and sub-additive (or super-additive) statistics on $L^\infty$:

Theorem 6.
$\Phi : L^\infty \to \mathbb{R}$ is monotone, homogeneous and sub-additive if and only if there exists a nonempty closed convex set $C$ of Borel probability measures on $\bar{\mathbb{R}}$, such that for every $X \in L^\infty$ it holds that
\[
\Phi(X) = \max_{\mu \in C} \int_{\bar{\mathbb{R}}} K_a(X)\,d\mu(a).
\]
Likewise, $\Phi$ is monotone, homogeneous and super-additive if and only if
\[
\Phi(X) = \min_{\mu \in C} \int_{\bar{\mathbb{R}}} K_a(X)\,d\mu(a).
\]

We use a few examples to illustrate that homogeneity and sub-additivity (or super-additivity) are both important for such representations. An example of a monotone statistic that is super-additive but not homogeneous is
\[
\Phi(X) = \log\left(\frac{1}{2}\left[\mathrm{e}^{\min[X]} + \mathrm{e}^{\max[X]}\right]\right).
\]
The super-additivity condition $\Phi(X + Y) \geq \Phi(X) + \Phi(Y)$ is equivalent to
\[
2\left[\mathrm{e}^{\min[X] + \min[Y]} + \mathrm{e}^{\max[X] + \max[Y]}\right] \geq \left[\mathrm{e}^{\min[X]} + \mathrm{e}^{\max[X]}\right] \cdot \left[\mathrm{e}^{\min[Y]} + \mathrm{e}^{\max[Y]}\right],
\]
which reduces to $\left(\mathrm{e}^{\max[X]} - \mathrm{e}^{\min[X]}\right) \cdot \left(\mathrm{e}^{\max[Y]} - \mathrm{e}^{\min[Y]}\right) \geq 0$. The same argument shows that $\min[X]$ and $\max[X]$ can be substituted with any pair of monotone additive statistics $\Psi_1(X)$ and $\Psi_2(X)$ satisfying $\Psi_1 \leq \Psi_2$ (see Proposition 4). The resulting $\Phi$ would also be monotone and super-additive but not homogeneous.

Note that if $\Phi(X)$ is monotone and super-additive, then $-\Phi(-X)$ is monotone and sub-additive. In this way we also have an example of a monotone statistic that is sub-additive but not homogeneous.

As for an example of a monotone statistic that is homogeneous but not sub-additive or super-additive, we consider
\[
\Phi(X) = \begin{cases} \min[X] & \text{if } \min[X] + \max[X] \leq 0 \\ \max[X] & \text{otherwise.} \end{cases}
\]
The representation in the super-additive case is reminiscent of the "cautious expected utility" representation of Cerreia-Vioglio et al. (2015), which evaluates a gamble $X$ according to its minimum certainty equivalent across a family of utility functions. The difference is that our agent potentially takes the minimum across averages of certainty equivalents. In fact, since CARA certainty equivalents are increasing in the level of risk seeking, taking the minimum across these certainty equivalents (and not their averages) in our setting would reduce to a single CARA certainty equivalent.

This statistic $\Phi$ is homogeneous because $\min[X]$ and $\max[X]$ are homogeneous. To see it is monotone, we will show $\Phi(Y) \geq \Phi(X)$ whenever $Y \geq X$. If $\Phi(X) = \min[X]$ then $\Phi(Y) \geq \min[Y] \geq \min[X]$ holds. Otherwise $\Phi(X) = \max[X]$ and $\min[X] + \max[X] > 0$. Since $\max[Y] \geq \max[X]$ and $\min[Y] \geq \min[X]$, we also have $\min[Y] + \max[Y] > 0$. Hence $\Phi(Y) = \max[Y] \geq \Phi(X)$ also holds.

In addition, this statistic $\Phi$ is neither sub-additive nor super-additive. To see this, note that if $X, Y$ are non-constant independent random variables, and if $\min[X] + \max[X] \leq 0 < \min[Y] + \max[Y]$, then whether $\Phi(X + Y)$ equals $\min[X + Y]$ or $\max[X + Y]$ depends on the sign of $\min[X] + \max[X] + \min[Y] + \max[Y]$. In the former case $\Phi(X + Y) = \min[X + Y] \leq \min[X] + \max[Y] = \Phi(X) + \Phi(Y)$, whereas in the latter case $\Phi(X + Y) \geq \Phi(X) + \Phi(Y)$. Both situations can occur.

Interestingly, the results in Theorem 6 need to be modified when we consider the smaller domain $L^\infty_+$ of non-negative bounded random variables. This is elaborated below:

Proposition 7.
$\Phi : L^\infty_+ \to \mathbb{R}$ is monotone, homogeneous and sub-additive if and only if there exists a nonempty closed convex set $C$ of Borel sub-probability measures on $\bar{\mathbb{R}}$ satisfying $\max_{\mu \in C} |\mu| = 1$, such that for every $X \in L^\infty_+$ it holds that
\[
\Phi(X) = \max_{\mu \in C} \int_{\bar{\mathbb{R}}} K_a(X)\,d\mu(a).
\]

The key distinction from Theorem 6 is that the maximum here can now be taken over a set of sub-probability measures. This possibility is ruled out in the case of all bounded random variables, since in that case we require $\Phi(c) = c$ also for negative constants $c$.

One might suspect that the analogue of Proposition 7 holds for monotone, homogeneous and super-additive statistics on $L^\infty_+$, with minimization over super-probability measures. This is not quite true, as the following example suggests: for every $X \in L^\infty_+$, let $\Phi(X) = 0$ if $\min[X] = 0$ and $\Phi(X) = \max[X]$ if $\min[X] > 0$. This statistic is readily checked to be monotone, homogeneous and super-additive. However, it cannot be written in the form $\inf_{\mu \in C} \int_{\bar{\mathbb{R}}} K_a(X)\,d\mu(a)$, and the key issue is a failure of upper-semicontinuity (henceforth usc). Specifically, for any finite measure $\mu$, the integral $\int_{\bar{\mathbb{R}}} K_a(X)\,d\mu(a)$ is usc with respect to $X$ in the sense that
\[
\int_{\bar{\mathbb{R}}} K_a(X)\,d\mu(a) \geq \lim_{\varepsilon \to 0^+} \int_{\bar{\mathbb{R}}} K_a(X + \varepsilon)\,d\mu(a).
\]
In fact we have equality, since the reverse inequality always holds. It is well known that the infimum of a family of usc functions is also usc. But consider the statistic $\Phi$ defined above, and let $X$ be the Bernoulli random variable that equals 0 and 1 with equal probabilities. Then $\Phi(X) = 0$ whereas $\Phi(X + \varepsilon) = 1 + \varepsilon$. So $\Phi(X) < \lim_{\varepsilon \to 0^+} \Phi(X + \varepsilon)$ and this $\Phi$ is not usc.

In what follows, we define a statistic $\Phi$ to be usc if for every $X$ in the domain,
\[
\Phi(X) = \lim_{\varepsilon \to 0^+} \Phi(X + \varepsilon).
\]
Since $\Phi$ is monotone, it suffices to require $\Phi(X) \geq \lim_{\varepsilon \to 0^+} \Phi(X + \varepsilon)$. Note also that usc was automatically satisfied when the domain was $L^\infty$, in which case super-additivity gives
\[
\Phi(X) \geq \Phi(X + \varepsilon) + \Phi(-\varepsilon) = \Phi(X + \varepsilon) - \varepsilon.
\]
It was also satisfied under sub-additivity, since in that case
\[
\Phi(X) \geq \Phi(X + \varepsilon) - \Phi(\varepsilon) = \Phi(X + \varepsilon) - \varepsilon.
\]
So the combination of super-additivity and the smaller domain $L^\infty_+$ is where we need to additionally assume usc. The next result shows usc is exactly what we need to restore the representation:

Proposition 8.

$\Phi : L^\infty_+ \to \mathbb{R}$ is monotone, homogeneous, super-additive and upper-semicontinuous if and only if there exists a nonempty closed convex set $C$ of finite Borel super-probability measures on $\bar{\mathbb{R}}$ satisfying $\min_{\mu \in C} |\mu| = 1$, such that for every $X \in L^\infty_+$ it holds that
\[
\Phi(X) = \inf_{\mu \in C} \int_{\bar{\mathbb{R}}} K_a(X)\,d\mu(a).
\]

We point out that usc is not sufficient to ensure the infimum above is achieved as a minimum. The reason is that the set of super-probability measures may contain measures with arbitrarily large total mass, so (sequential) compactness can be lost. To get a sharper result we need a stronger continuity notion, which we discuss in §E.4. In the following sections we present the proofs for the above results.
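Before turning to the proofs, here is a small numerical sketch of the non-usc statistic discussed above ($\Phi(X) = 0$ if $\min[X] = 0$, else $\max[X]$). Distributions are represented as lists of (outcome, probability) pairs; the helper names are ours:

```python
import itertools

def phi(outcomes):
    # The example above: Phi(X) = 0 if min[X] = 0, and Phi(X) = max[X] if min[X] > 0.
    xs = [x for x, _ in outcomes]
    return 0.0 if min(xs) == 0 else max(xs)

X = [(0.0, 0.5), (1.0, 0.5)]  # Bernoulli with equal probabilities

assert phi(X) == 0.0
for eps in (0.1, 0.01, 0.001):
    # shifting by eps > 0 makes the minimum positive, so Phi jumps to max[X] + eps
    assert phi([(x + eps, p) for x, p in X]) == 1.0 + eps

# spot check of super-additivity, Phi(X + Y) >= Phi(X) + Phi(Y), for an independent sum
Y = [(0.5, 0.5), (2.0, 0.5)]
XY = [(x + y, p * q) for (x, p), (y, q) in itertools.product(X, Y)]
assert phi(XY) >= phi(X) + phi(Y)
```

The jump from $\Phi(X) = 0$ to $\Phi(X + \varepsilon) = 1 + \varepsilon$ for every $\varepsilon > 0$ is exactly the failure of upper-semicontinuity described in the text.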
E.1 Proof of Theorem 6
When the domain is all bounded random variables, it is sufficient to focus on the case of sub-additivity. This is because if $\Phi$ is monotone, homogeneous and super-additive, then $\Psi(X) = -\Phi(-X)$ is monotone, homogeneous and sub-additive. So the result for super-additivity can be immediately deduced from the result for sub-additivity. We will also omit the proof for the "if" direction of the theorem, which is straightforward.

Below we suppose $\Phi$ is sub-additive. For each random variable $X$ and positive integer $n$, denote by $X^{*n}$ the random variable that is the sum of $n$ i.i.d. copies of $X$. Repeatedly applying sub-additivity, we have $\Phi(X^{*n}) \leq n\Phi(X)$ for each $n$, and equality holds when $n$ is a power of two by homogeneity. Thus, for each $n$, if we choose any $m$ with $2^m > n$, then by sub-additivity again
\[
2^m \Phi(X) = \Phi\left(X^{*2^m}\right) \leq \Phi\left(X^{*n}\right) + \Phi\left(X^{*(2^m - n)}\right) \leq \Phi\left(X^{*n}\right) + (2^m - n)\Phi(X).
\]
Combined with the reverse inequality, this yields
\[
\Phi\left(X^{*n}\right) = n\Phi(X), \qquad \forall n \in \mathbb{N}_+.
\]
This stronger property explains why we call $\Phi(X^{*2}) = 2\Phi(X)$ homogeneity.

The following lemma generalizes the key Lemma 1:

Lemma 17. Let $\Phi$ be a monotone, homogeneous and sub-additive statistic defined on $L^\infty$ or $L^\infty_+$. If $K_a(X) \geq K_a(Y)$ for all $a \in \bar{\mathbb{R}}$, then $\Phi(X) \geq \Phi(Y)$.

Proof. It suffices to show $\Phi(X + 2\varepsilon) \geq \Phi(Y)$ for any $\varepsilon > 0$, which would imply $\Phi(X) + 2\varepsilon \geq \Phi(Y)$ by sub-additivity. Denoting $\tilde{X} = X + \varepsilon$, we have $K_a(\tilde{X}) = K_a(X) + \varepsilon > K_a(Y)$ for every $a \in \bar{\mathbb{R}}$. Thus by Theorem 3, there exists a bounded random variable $Z$ such that
\[
\tilde{X} + Z \geq Y + Z.
\]
Since first-order stochastic dominance is preserved under adding an independent random variable, we have
\[
\tilde{X}_1 + \tilde{X}_2 + Z \geq \tilde{X}_1 + Y_2 + Z \geq Y_1 + Y_2 + Z,
\]
where $\tilde{X}_1, \tilde{X}_2$ are i.i.d. copies of $\tilde{X}$ and similarly for $Y_1, Y_2$. Iterating this procedure, we obtain that for each positive integer $n$,
\[
\tilde{X}^{*n} + Z \geq Y^{*n} + Z.
\]
Since $Z$ is bounded, say $N \geq Z \geq -N$, we further have
\[
\tilde{X}^{*n} + N \geq Y^{*n} + (-N),
\]
or equivalently $(X + \varepsilon)^{*n} + 2N \geq Y^{*n}$. Now, if we choose $n$ so large that $\varepsilon n \geq 2N$, then the above implies
\[
(X + 2\varepsilon)^{*n} \geq (X + \varepsilon)^{*n} + 2N \geq Y^{*n}.
\]
Thus $\Phi(X + 2\varepsilon) \geq \Phi(Y)$ follows from the monotonicity and homogeneity of $\Phi$.

Given Lemma 17, we can follow the proof of Theorem 1 and view $\Phi(X)$ as a functional $F(K_X)$, which has the following five properties:

1. constants: $F(c) = c$ for every constant function $c$;
2. monotonicity: $K_X \geq K_Y$ implies $F(K_X) \geq F(K_Y)$;
3. homogeneity: $F(nK_X) = nF(K_X), \forall n \in \mathbb{N}_+$;
4. sub-additivity: $F(K_X + K_Y) \leq F(K_X) + F(K_Y)$;
5. Lipschitz: $|F(K_X) - F(K_Y)| \leq \|K_X - K_Y\|$.

The proof of Lipschitz continuity is essentially the same as Lemma 3, except that we instead have
\[
F(K_Y) - F(K_X) \leq F(K_X + \varepsilon) - F(K_X) \leq F(K_\varepsilon) = \Phi(\varepsilon) = \varepsilon.
\]
The second inequality here uses sub-additivity.

This functional $F$ is initially defined on $L = \{K_X : X \in L^\infty\}$. We now extend it to all of $C(\bar{\mathbb{R}})$:

Lemma 18.
Any functional $F$ on $L$ satisfying the above five properties can be extended to a functional on $C(\bar{\mathbb{R}})$ maintaining these properties, with homogeneity strengthened to allow for scalar multiplication by any positive real number (instead of $n$).

Proof. As in the proof of Lemma 4, we can extend $F$ by homogeneity to the rational cone spanned by $L$, and then extend by continuity to the entire cone. We thus have a functional $H$ defined on $\mathrm{Cone}(L)$ that satisfies monotonicity, homogeneity (over $\mathbb{R}_+$), sub-additivity and Lipschitz continuity.

To further extend $H$ to all continuous functions, we define for each $g \in C(\bar{\mathbb{R}})$
\[
I(g) = \inf_{f \geq g,\ f \in \mathrm{Cone}(L)} H(f). \tag{19}
\]
Note first that $I(g)$ is well-defined and finite. This is because each $g \in C(\bar{\mathbb{R}})$ is bounded, so the constant function $f = \max[g] \in \mathrm{Cone}(L)$ is point-wise greater than $g$. Moreover, any function $f \in \mathrm{Cone}(L)$ that is point-wise greater than $g$ must be point-wise greater than the constant function $\min[g]$. So by monotonicity, $H(f) \geq \min[g]$ for any such $f$.

Secondly, when $g \in \mathrm{Cone}(L)$ we have $I(g) = H(g)$ by monotonicity of $H$. So $I$ extends $H$. It is also easy to see $I(g)$ maintains monotonicity and homogeneity.

Thirdly, we check $I$ is sub-additive. Fix any $g_1, g_2$ and choose any $\varepsilon > 0$. Then by definition of the infimum, there exist $f_1, f_2 \in \mathrm{Cone}(L)$ such that $f_i \geq g_i$ and $H(f_i) < I(g_i) + \varepsilon$ for $i = 1, 2$. Thus the function $f_1 + f_2$ belongs to $\mathrm{Cone}(L)$ and is bigger than $g_1 + g_2$. This implies
\[
I(g_1 + g_2) \leq H(f_1 + f_2) \leq H(f_1) + H(f_2) < I(g_1) + I(g_2) + 2\varepsilon,
\]
where the second inequality uses the sub-additivity of $H$. Since $\varepsilon$ is arbitrary, $I$ is indeed sub-additive.

Finally, we check $I$ is Lipschitz. Suppose $g_1 \leq g_2 + \varepsilon$ for some $\varepsilon > 0$; then for any $f \in \mathrm{Cone}(L)$ that is greater than $g_2$, we have $f + \varepsilon \in \mathrm{Cone}(L)$ being greater than $g_1$. So by sub-additivity of $H$ and $H(\varepsilon) = \varepsilon$,
\[
I(g_1) \leq H(f + \varepsilon) \leq H(f) + \varepsilon.
\]
Letting $H(f)$ approach $I(g_2)$ thus yields the desired result $I(g_1) \leq I(g_2) + \varepsilon$.

Hence this functional $I$ is the desired extension of $F$ to all of $C(\bar{\mathbb{R}})$.

Given this extension $I$ satisfying $I(K_X) = \Phi(X)$, the "only if" direction of Theorem 6 will follow from the next result characterizing such functionals $I$:

Lemma 19.
Let $I : C(\bar{\mathbb{R}}) \to \mathbb{R}$ be a functional that is monotone, homogeneous, sub-additive and Lipschitz, and maps any constant function to this constant. Then there exists a non-empty closed convex set $C$ of Borel probability measures on $\bar{\mathbb{R}}$, such that for every $g \in C(\bar{\mathbb{R}})$
\[
I(g) = \max_{\mu \in C} \int_{\bar{\mathbb{R}}} g(a)\,d\mu(a).
\]

Proof. Homogeneity and sub-additivity imply $I$ is convex, in the sense that $I(\lambda g_1 + (1-\lambda)g_2) \leq \lambda I(g_1) + (1-\lambda)I(g_2)$ for all $g_1, g_2 \in C(\bar{\mathbb{R}})$ and $\lambda \in (0,1)$. Hence $I$ is a convex and continuous functional on the normed function space $C(\bar{\mathbb{R}})$. By Theorem 7.6 in Aliprantis and Border (2006), the functional $I$ coincides with its convex envelope, meaning that
\[
I(g) = \sup\{J(g) : J \leq I \text{ and } J \text{ is an affine and continuous functional}\}. \tag{20}
\]
Using the Riesz-Markov-Kakutani Representation Theorem, any such functional $J$ can be written as
\[
J(g) = b + \int_{\bar{\mathbb{R}}} g(a)\,d\mu(a)
\]
for some $b \in \mathbb{R}$ and some possibly signed finite measure $\mu$.

Now observe from (20) that $J(0) \leq I(0) = 0$, so $b \leq 0$. Moreover, since $I$ is homogeneous, we deduce from $J(ng) \leq I(ng) = nI(g)$ that $\frac{b}{n} + \int_{\bar{\mathbb{R}}} g(a)\,d\mu(a) \leq I(g)$ for every positive integer $n$, and thus $\hat{J}(g) = \int_{\bar{\mathbb{R}}} g(a)\,d\mu(a)$ lies between $J(g)$ and $I(g)$. It follows that we can replace each affine $J$ by the linear functional $\hat{J}$ without affecting (20). So we can rewrite
\[
I(g) = \sup_{\mu \in C} \int_{\bar{\mathbb{R}}} g(a)\,d\mu(a) \tag{21}
\]
for some set $C$ of possibly signed measures $\mu$.

Choose $g \leq 0$. Then by monotonicity of $I$ we have $I(g) \leq 0$. Thus (21) implies that $\int_{\bar{\mathbb{R}}} g(a)\,d\mu(a) \leq I(g) \leq 0$ for each $\mu \in C$. Since this holds for any continuous function $g \leq 0$, we conclude that each $\mu \in C$ is a non-negative measure. Moreover, plugging $g = 1$ into (21) yields $|\mu| \leq 1$, whereas plugging $g = -1$ yields $|\mu| \geq 1$. Thus $C$ is a nonempty set of probability measures.

Finally, note that taking the closed convex hull of $C$ does not affect the equality in (21). So we can assume $C$ is closed and convex. In this case the supremum is achieved as a maximum, because any sequence of probability measures on the compact metric space $\bar{\mathbb{R}}$ has a weakly convergent sub-sequence, by Prokhorov's Theorem. This proves Lemma 19 and thus Theorem 6.

E.2 Proof of Proposition 7

The proof is essentially the same as Theorem 6, so we only point out the differences. Lemma 17 holds without change, and we can still view $\Phi(X)$ as a functional $F(K_X)$. However, in the current setting $F$ is only defined on $L_+ = \{K_X : X \in L^\infty_+\}$, which only contains non-negative functions.

Using the same construction as in Lemma 18, we can extend $F$ to a functional $I$ on $C(\bar{\mathbb{R}})$. But note that in applying Lemma 19, we need to weaken the assumption that $I$ maps any constant function to this constant. In the current setting, this only holds for non-negative constants. Therefore, when following the proof of Lemma 19, we can no longer deduce $|\mu| \geq 1$ by plugging in $g = -1$. The consequence is that the conclusion of Lemma 19 is correspondingly weakened to
\[
I(g) = \max_{\mu \in C} \int_{\bar{\mathbb{R}}} g(a)\,d\mu(a)
\]
for a non-empty closed convex set $C$ of sub-probability measures satisfying $\max_{\mu \in C} |\mu| = 1$. Note that the supremum is still achieved, since Prokhorov's Theorem also applies to sub-probability measures. This gives the result in Proposition 7.

E.3 Proof of Proposition 8
The necessity of upper-semicontinuity (usc) for the "inf-integral" representation has been discussed, so we again focus on the "only if" direction. We first derive the following analogue of Lemma 17:

Lemma 20.

Let $\Phi$ be a monotone, homogeneous, super-additive and usc statistic defined on $L^\infty_+$. If $K_a(X) \geq K_a(Y)$ for all $a \in \bar{\mathbb{R}}$, then $\Phi(X) \geq \Phi(Y)$.

Proof. Recall that we showed before Lemma 17 that a homogeneous and sub-additive statistic satisfies the stronger form of homogeneity: $\Phi(X^{*n}) = n\Phi(X)$. An analogous argument applies to a homogeneous and super-additive statistic. Thus, following the proof of Lemma 17, we have
\[
(X + 2\varepsilon)^{*n} \geq Y^{*n}
\]
for every $\varepsilon > 0$ and $n$ sufficiently large. Thus $\Phi((X + 2\varepsilon)^{*n}) \geq \Phi(Y^{*n})$ by monotonicity, and $\Phi(X + 2\varepsilon) \geq \Phi(Y)$ by homogeneity. Now since $\Phi$ is usc, $\lim_{\varepsilon \to 0^+} \Phi(X + 2\varepsilon) = \Phi(X)$. Hence $\Phi(X) \geq \Phi(Y)$ as desired.

Given this lemma, we can now view $\Phi(X)$ as a functional $F(K_X)$ that has the following five properties (slightly different from the sub-additive case studied before):

1. constants: $F(c) = c$ for every non-negative constant function $c \geq 0$;
2. monotonicity: $K_X \geq K_Y$ implies $F(K_X) \geq F(K_Y)$;
3. homogeneity: $F(nK_X) = nF(K_X), \forall n \in \mathbb{N}_+$;
4. super-additivity: $F(K_X + K_Y) \geq F(K_X) + F(K_Y)$;
5. upper-semicontinuity: $\lim_{\varepsilon \to 0^+} F(K_X + \varepsilon) = F(K_X)$.

This functional $F$ is defined on $L_+ = \{K_X : X \in L^\infty_+\}$, but we will extend it to $C_+(\bar{\mathbb{R}})$, the space of all non-negative continuous functions on $\bar{\mathbb{R}}$.

Lemma 21.
Any functional $F$ on $L_+$ satisfying the above five properties can be extended to a functional on $C_+(\bar{\mathbb{R}})$ maintaining these properties, with homogeneity strengthened to be over $\mathbb{R}_+$.

Proof. The proof of this lemma is somewhat different from the proof of Lemma 18 before, due to the fact that we now have usc instead of Lipschitz continuity. Thus, in the current setting we first extend $F$ to a functional $G$ defined on the rational cone $\mathrm{Cone}_{\mathbb{Q}}(L_+)$ that maintains the five properties and satisfies homogeneity over $\mathbb{Q}_+$ (usc of $G$ follows from that of $F$ and the definition $G(\frac{m}{n} K_X) = \frac{m}{n} F(K_X)$). But in the next step, instead of extending to the whole cone by Lipschitz continuity, we directly extend $G$ to all of $C_+(\bar{\mathbb{R}})$ by the following construction:
\[
I(g) = \inf_{\varepsilon > 0} \left( \sup_{f \leq g + \varepsilon,\ f \in \mathrm{Cone}_{\mathbb{Q}}(L_+)} G(f) \right). \tag{22}
\]
For any $g \geq 0$ and $\varepsilon > 0$, the constant function $f = 0$ is in the rational cone and satisfies $f \leq g + \varepsilon$. So the inner supremum in (22) is non-negative. Moreover, any $f \leq g + \varepsilon$ is smaller than the constant function $\max[g] + \varepsilon$, so by monotonicity of $G$ we know that the inner supremum is at most $\max[g] + \varepsilon$. This implies $I(g) \in [0, \max[g]]$ is well-defined. We also note that if $g$ is in the rational cone, then the inner supremum is achieved by the function $f = g + \varepsilon$ by monotonicity of $G$. Thus in this case $I(g) = \inf_{\varepsilon > 0} G(g + \varepsilon) = G(g)$, where the latter equality holds by usc of $G$. Thus the functional $I$ extends $G$, and in particular $I$ satisfies the first property above that $I(c) = c$ for every $c \geq 0$.

Next, we check $I$ is monotone. This is clear because if $g_1 \geq g_2$, then for any $\varepsilon > 0$ the inner supremum in (22) is larger for $g_1$ than for $g_2$. So $I(g_1) \geq I(g_2)$.

The third property to check is homogeneity. We first show $I$ is homogeneous over $\mathbb{Q}_+$, i.e., $I(\frac{m}{n} g) = \frac{m}{n} I(g)$ whenever $m, n$ are positive integers. Indeed, by writing $\varepsilon = \frac{m}{n}\hat{\varepsilon}$ and $f = \frac{m}{n}\hat{f}$ we have
\[
\begin{aligned}
I\left(\tfrac{m}{n} g\right) &= \inf_{\varepsilon > 0} \sup_{f \leq \frac{m}{n} g + \varepsilon,\ f \in \mathrm{Cone}_{\mathbb{Q}}(L_+)} G(f) \\
&= \inf_{\hat{\varepsilon} > 0} \sup_{f \leq \frac{m}{n}(g + \hat{\varepsilon}),\ f \in \mathrm{Cone}_{\mathbb{Q}}(L_+)} G(f) \\
&= \inf_{\hat{\varepsilon} > 0} \sup_{\hat{f} \leq g + \hat{\varepsilon},\ \hat{f} \in \mathrm{Cone}_{\mathbb{Q}}(L_+)} G\left(\tfrac{m}{n}\hat{f}\right) \\
&= \tfrac{m}{n} \inf_{\hat{\varepsilon} > 0} \sup_{\hat{f} \leq g + \hat{\varepsilon},\ \hat{f} \in \mathrm{Cone}_{\mathbb{Q}}(L_+)} G(\hat{f}) \\
&= \tfrac{m}{n} I(g).
\end{aligned}
\]
From the second line to the third line above, we used the observation that $f$ is in the rational cone if and only if $\hat{f} = \frac{n}{m} f$ is in the rational cone. Now since $I$ is homogeneous over $\mathbb{Q}_+$ and also monotone, an approximation argument shows that $I$ is in fact homogeneous over $\mathbb{R}_+$ (note that we are dealing with non-negative functions here).

Next, we check $I$ is super-additive. This follows from the observation that for each $\varepsilon > 0$,
\[
\sup_{f_1 \leq g_1 + \varepsilon,\ f_1 \in \mathrm{Cone}_{\mathbb{Q}}(L_+)} G(f_1) + \sup_{f_2 \leq g_2 + \varepsilon,\ f_2 \in \mathrm{Cone}_{\mathbb{Q}}(L_+)} G(f_2) \leq \sup_{f \leq g_1 + g_2 + 2\varepsilon,\ f \in \mathrm{Cone}_{\mathbb{Q}}(L_+)} G(f).
\]
The above inequality holds because for any $f_1, f_2$ showing up on the left-hand side, the function $f = f_1 + f_2$ is in the rational cone and satisfies $f \leq g_1 + g_2 + 2\varepsilon$. So the right-hand side is at least $G(f_1 + f_2) \geq G(f_1) + G(f_2)$.

Finally, we check $I$ is also usc. Choose any $g \geq 0$ and let $b = I(g)$. Then we need to show that for any $\gamma > 0$, there exists $\delta > 0$ such that $I(g + \delta) \leq b + \gamma$. To see this, note from the definition (22) that there exists some $\bar{\varepsilon} > 0$ such that
\[
\sup_{f \leq g + \bar{\varepsilon},\ f \in \mathrm{Cone}_{\mathbb{Q}}(L_+)} G(f) \leq b + \gamma.
\]
Thus, for any $\delta < \bar{\varepsilon}$, we have
\[
I(g + \delta) = \inf_{\varepsilon > 0} \left( \sup_{f \leq g + \delta + \varepsilon,\ f \in \mathrm{Cone}_{\mathbb{Q}}(L_+)} G(f) \right) \leq \left( \sup_{f \leq g + \delta + \varepsilon,\ f \in \mathrm{Cone}_{\mathbb{Q}}(L_+)} G(f) \right)\Bigg|_{\varepsilon = \bar{\varepsilon} - \delta} = \sup_{f \leq g + \bar{\varepsilon},\ f \in \mathrm{Cone}_{\mathbb{Q}}(L_+)} G(f) \leq b + \gamma.
\]
This completes the proof.

Proposition 8 now follows from the following analogue of Lemma 19:
Lemma 22.
Let $I : C_+(\bar{\mathbb{R}}) \to \mathbb{R}$ be a functional that is monotone, homogeneous, super-additive and upper-semicontinuous, and maps any non-negative constant function to this constant. Then there exists a non-empty closed convex set $C$ of finite Borel super-probability measures on $\bar{\mathbb{R}}$, such that for every $g \in C_+(\bar{\mathbb{R}})$
\[
I(g) = \inf_{\mu \in C} \int_{\bar{\mathbb{R}}} g(a)\,d\mu(a).
\]

Proof. Note first that homogeneity and super-additivity imply $I$ is a concave functional on non-negative continuous functions. Next, we extend $I$ to all continuous functions by letting $I(g) = -\infty$ whenever $g(a) < 0$ for some $a \in \bar{\mathbb{R}}$. This is in fact consistent with (22), since for sufficiently small $\varepsilon$ there exists no function $f \in \mathrm{Cone}_{\mathbb{Q}}(L_+)$ that satisfies $f \leq g + \varepsilon$. Although the resulting functional (let us still call it $I$) sometimes takes the value $-\infty$, it is a proper extended concave function according to Definition 7.1 in Aliprantis and Border (2006). Specifically, $I$ is "proper" because it never assumes the value $\infty$ and does not always equal $-\infty$. It is "concave" because its hypograph
\[
\mathrm{hypo}\,I = \{(g, \alpha) \in C(\bar{\mathbb{R}}) \times \mathbb{R} : \alpha \leq I(g)\}
\]
is a convex set. Indeed, the requirement $\alpha \leq I(g)$ forces $g \geq 0$, so $I(g_1) \geq \alpha_1$ and $I(g_2) \geq \alpha_2$ imply $I(\lambda g_1 + (1-\lambda)g_2) \geq \lambda\alpha_1 + (1-\lambda)\alpha_2$ by the concavity of $I$ for non-negative functions.

We next show this hypograph is a closed set, so that $I$ is an upper-semicontinuous proper concave functional according to Section 7.2 in Aliprantis and Border (2006). Indeed, choose any sequence $\{g_n\} \subset C(\bar{\mathbb{R}})$ and $\{\alpha_n\} \subset \mathbb{R}$ satisfying $I(g_n) \geq \alpha_n$ for each $n$, and suppose $g_n \to g$ in the sup norm and $\alpha_n \to \alpha$. Then we first have $g_n \geq 0$ and hence $g \geq 0$. Moreover, for each $\varepsilon > 0$ it holds that $g_n \leq g + \varepsilon$ for sufficiently large $n$. Thus by monotonicity of $I$,
\[
I(g + \varepsilon) \geq I(g_n) \geq \alpha_n
\]
for every large $n$. Taking $n$ to infinity yields $I(g + \varepsilon) \geq \alpha$. But since $\varepsilon$ is arbitrary, we have $I(g) = \lim_{\varepsilon \to 0^+} I(g + \varepsilon) \geq \alpha$ as well. So $(g, \alpha)$ also belongs to the hypograph, which is thus closed.

Now that we know $I$ is an usc proper concave functional, we can apply the direct analogue of Theorem 7.6 in Aliprantis and Border (2006) to deduce that $I$ coincides with its concave envelope:
\[
I(g) = \inf\{J(g) : J \geq I \text{ and } J \text{ is an affine and continuous functional}\}.
\]
As in the proof of Lemma 19, any such functional $J$ can be written as
\[
J(g) = b + \int_{\bar{\mathbb{R}}} g(a)\,d\mu(a)
\]
for some $b \in \mathbb{R}$ and some possibly signed finite measure $\mu$. In fact, $b \geq 0$ because $I(0) \leq J(0)$. And since $I$ is homogeneous, we can replace $b$ by $\frac{b}{n}$. So in the end we can without loss assume $b = 0$.

Since for any continuous $g \geq 0$ we have $J(g) = \int_{\bar{\mathbb{R}}} g(a)\,d\mu(a) \geq I(g) \geq 0$, any such $\mu$ is a non-negative measure. We also know from $J(1) \geq I(1) = 1$ that $|\mu| \geq 1$. This leads to the desired representation
\[
I(g) = \inf_{\mu \in C} \int_{\bar{\mathbb{R}}} g(a)\,d\mu(a)
\]
for a set $C$ of finite super-probability measures $\mu$. As before, assuming $C$ to be closed and convex is without loss, although in this case the infimum need not be achieved.

E.4 Strengthening Proposition 8
A feature of Proposition 8 is that the infimum is not necessarily achieved. To get a sharper result, we define $\Phi$ to be Lipschitz usc if there exists a constant $\ell > 0$ such that
\[
\Phi(X + 1) - \Phi(X) \le \ell
\]
holds for every $X$ in the domain. Note that when $\Phi$ is homogeneous, this condition is equivalent to the stronger condition that
\[
\Phi(X + \varepsilon) - \Phi(X) \le \ell \varepsilon
\]
for every $X$ and every $\varepsilon > 0$. We will use these conditions interchangeably.
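As a concrete sanity check (a minimal sketch, not from the text: the two measures $\mu_1 = \delta_{-1}$ and $\mu_2 = 2\delta_{-2}$ are our own toy choice, while $K_a(X) = \frac{1}{a}\log \mathbb{E}[e^{aX}]$ is the statistic used throughout), the statistic $\Phi(X) = \min\{\int K_a(X)\,d\mu_1,\ \int K_a(X)\,d\mu_2\}$ has total masses at most $\ell = 2$, and both forms of the Lipschitz usc condition can be verified numerically on a finite distribution:

```python
import math

def cgf_stat(dist, a):
    """K_a(X) = (1/a) * log E[exp(a X)] for a finite distribution dist = [(value, prob), ...]."""
    return math.log(sum(p * math.exp(a * x) for x, p in dist)) / a

def phi(dist):
    # Toy statistic: minimum over two point-mass measures, mu1 = delta_{-1}
    # (mass 1) and mu2 = 2 * delta_{-2} (mass 2); both masses are at most ell = 2.
    return min(cgf_stat(dist, -1), 2 * cgf_stat(dist, -2))

def shift(dist, c):
    """Distribution of X + c."""
    return [(x + c, p) for x, p in dist]

ell = 2
X = [(0, 0.5), (1, 0.5)]  # X equals 0 or 1 with equal probabilities

# Unit-shift form: Phi(X + 1) - Phi(X) <= ell.
assert phi(shift(X, 1)) - phi(X) <= ell + 1e-12

# Equivalent eps-form (Phi is homogeneous): Phi(X + eps) - Phi(X) <= ell * eps.
for eps in [0.01, 0.1, 0.5]:
    assert phi(shift(X, eps)) - phi(X) <= ell * eps + 1e-12
```

Here each $\int K_a(X+\varepsilon)\,d\mu$ exceeds $\int K_a(X)\,d\mu$ by exactly $\varepsilon|\mu|$, so the increment of the minimum is at most $\ell\varepsilon$, anticipating the “if” direction of Proposition 9 below.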
Proposition 9.
$\Phi : L^\infty_+ \to \mathbb{R}$ is monotone, homogeneous, super-additive and Lipschitz upper-semicontinuous if and only if there exists a nonempty closed convex set $C$ of uniformly bounded Borel super-probability measures on $\bar{\mathbb{R}}$ satisfying $\min_{\mu \in C} |\mu| = 1$, such that for every $X \in L^\infty_+$ it holds that
\[
\Phi(X) = \min_{\mu \in C} \int_{\bar{\mathbb{R}}} K_a(X)\, d\mu(a).
\]

Proof.
For the “if” direction, we simply note that if all measures $\mu \in C$ have total mass no greater than $\ell$, then
\[
\int_{\bar{\mathbb{R}}} K_a(X + \varepsilon)\, d\mu(a) \le \ell \varepsilon + \int_{\bar{\mathbb{R}}} K_a(X)\, d\mu(a)
\]
for any $\mu \in C$ and any $\varepsilon > 0$. From this it follows that $\Phi(X + \varepsilon) - \Phi(X) \le \ell \varepsilon$, and so the statistic $\Phi$ must be Lipschitz usc.

(To see that the two forms of the Lipschitz condition are indeed interchangeable when $\Phi$ is homogeneous, note that $\Phi(X^{*n} + 1) - \Phi(X^{*n}) \le \ell$, together with $(X + \frac{1}{n})^{*n} = X^{*n} + 1$ and homogeneity, implies $\Phi(X + \frac{1}{n}) - \Phi(X) \le \frac{\ell}{n}$. Thus $\Phi(X + \varepsilon) - \Phi(X) \le \ell \varepsilon$ holds when $\varepsilon$ is a positive rational number. By monotonicity of $\Phi$, it also holds for any positive real number $\varepsilon$.)

For the “only if” direction, Proposition 8 gives a nonempty closed convex set $C$ of finite super-probability measures such that
\[
\Phi(X) = \inf_{\mu \in C} \int_{\bar{\mathbb{R}}} K_a(X)\, d\mu(a).
\]
We now show that the Lipschitz property of $\Phi$ further implies that it is without loss to assume the measures in $C$ are uniformly bounded. To do this, let $n$ be any positive integer and consider the following subset of $C$:
\[
C_n = \{ \mu \in C : |\mu| \le n \}.
\]
Define an alternative statistic
\[
\hat{S}(X) = \inf_{\mu \in C_n} \int_{\bar{\mathbb{R}}} K_a(X)\, d\mu(a) \ge \Phi(X).
\]
If $\Phi(X) = \hat{S}(X)$ for every $X$ then we are done. Otherwise there exist $X \in L^\infty_+$ and $\delta > 0$ such that $\hat{S}(X) > \Phi(X) + \delta$.

Now take any positive $\varepsilon < \delta/n$; we will show that $\Phi(X + \varepsilon) \ge \Phi(X) + n\varepsilon$. Indeed, for any measure $\mu \in C_n$, we have
\[
\int_{\bar{\mathbb{R}}} K_a(X + \varepsilon)\, d\mu(a) \ge \int_{\bar{\mathbb{R}}} K_a(X)\, d\mu(a) \ge \hat{S}(X) > \Phi(X) + \delta > \Phi(X) + n\varepsilon.
\]
On the other hand, if $\mu \in C \setminus C_n$, then $|\mu| > n$ and it also holds that
\[
\int_{\bar{\mathbb{R}}} K_a(X + \varepsilon)\, d\mu(a) = \int_{\bar{\mathbb{R}}} (K_a(X) + \varepsilon)\, d\mu(a) = \varepsilon |\mu| + \int_{\bar{\mathbb{R}}} K_a(X)\, d\mu(a) > \Phi(X) + n\varepsilon.
\]
Hence we do have $\Phi(X + \varepsilon) \ge \Phi(X) + n\varepsilon$ for $\varepsilon$ sufficiently small.

But $\Phi$ is assumed to be Lipschitz usc, so the previous conclusion cannot hold for every $n$ (and some $X$): once $n > \ell$, the inequality $\Phi(X + \varepsilon) \ge \Phi(X) + n\varepsilon$ contradicts $\Phi(X + \varepsilon) - \Phi(X) \le \ell \varepsilon$. It follows that for some $n$, we must have $\Phi(X) = \hat{S}(X)$ for every $X$. Therefore
\[
\Phi(X) = \inf_{\mu \in C_n} \int_{\bar{\mathbb{R}}} K_a(X)\, d\mu(a)
\]
for a set $C_n$ of super-probability measures that are uniformly bounded. Finally, we can take this set to be closed and convex, and then the infimum is achieved by Prokhorov's Theorem.

The following is an example of a super-additive statistic that is usc but not Lipschitz usc. For each $s \ge 
1$, let $\mu_s = s \cdot \delta_{-s^2}$ be the measure that puts mass $s$ on $a = -s^2$. Consider
\[
\Phi(X) = \inf_{s \ge 1} \int_{\bar{\mathbb{R}}} K_a(X)\, d\mu_s(a) = \inf_{s \ge 1} \, -\frac{1}{s} \log \mathbb{E}\left[ e^{-s^2 X} \right].
\]
If $X$ equals 0 or 1 with equal probabilities, then $\mathbb{E}[e^{-s^2 X}] = \frac{1 + e^{-s^2}}{2} \in (\frac{1}{2}, \frac{3}{4})$. The above infimum thus evaluates to 0, which is the limit as $s \to \infty$ (not achieved at any finite $s$). But for any $\varepsilon > 0$, we have
\[
\Phi(X + \varepsilon) = \inf_{s \ge 1} \int_{\bar{\mathbb{R}}} K_a(X + \varepsilon)\, d\mu_s(a) = \inf_{s \ge 1} \, -\frac{1}{s} \log \mathbb{E}\left[ e^{-s^2 X} \right] + \varepsilon s.
\]
Since $\mathbb{E}[e^{-s^2 X}] < \frac{3}{4}$, it holds that for every $s \ge 1$,
\[
-\frac{1}{s} \log \mathbb{E}\left[ e^{-s^2 X} \right] + \varepsilon s > \frac{1}{s} \log \frac{4}{3} + \varepsilon s \ge 2 \sqrt{\log \frac{4}{3} \cdot \varepsilon}.
\]
Thus $\Phi(X) = 0$ while $\Phi(X + \varepsilon)$ is at least on the order of $\sqrt{\varepsilon}$, so $\Phi$ is not Lipschitz usc.
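The bound above can be checked numerically. A minimal sketch (our own; it assumes the reading $\mu_s = s \cdot \delta_{-s^2}$ of the measure defined above, and a finite grid standing in for the infimum over $s \ge 1$):

```python
import math

# Under mu_s = s * delta_{-s^2}, the integral of K_a against mu_s is
# -(1/s) * log E[exp(-s^2 X)], and shifting X by eps adds eps * s.
def term(s, eps=0.0):
    mgf = 0.5 * (1.0 + math.exp(-s * s))  # E[exp(-s^2 X)] for X uniform on {0, 1}
    return -math.log(mgf) / s + eps * s

grid = [1 + 0.01 * k for k in range(20000)]  # s ranging over [1, 201)

# Phi(X): the values tend to 0 as s -> infinity, and 0 is never attained.
phi0 = min(term(s) for s in grid)
assert 0 < phi0 < 0.01

# Phi(X + eps) stays above 2 * sqrt(log(4/3) * eps), i.e. of order sqrt(eps).
eps = 0.01
phi_eps = min(term(s, eps) for s in grid)
bound = 2 * math.sqrt(math.log(4 / 3) * eps)
assert phi_eps >= bound
```

For $\varepsilon = 0.01$ the grid minimum of $\Phi(X + \varepsilon)$ is roughly $0.17$, comfortably above the bound of about $0.11$, while $\Phi(X)$ is numerically indistinguishable from 0.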