Loss aversion and the welfare ranking of policy interventions
Sergio Firpo, Antonio F. Galvao, Martyna Kobus, Thomas Parker, Pedro Rosa-Dias
April 21, 2020
Abstract
In this paper we develop theoretical criteria and econometric methods to rank policy interventions in terms of welfare when individuals are loss-averse. The new criterion for "loss aversion-sensitive dominance" defines a weak partial ordering of the distributions of policy-induced gains and losses. It applies to the class of welfare functions which model individual preferences with non-decreasing and loss-averse attitudes towards changes in outcomes. We also develop new statistical methods to test loss aversion-sensitive dominance in practice, using nonparametric plug-in estimates. We establish the limiting distributions of uniform test statistics by showing that they are directionally differentiable. This implies that inference can be conducted by a special resampling procedure. Since point-identification of the distribution of policy-induced gains and losses may require very strong assumptions, we also extend comparison criteria, test statistics, and resampling procedures to the partially-identified case. Finally, we illustrate our methods with an empirical application to welfare comparison of two income support programs.
Keywords:
Welfare, Loss Aversion, Policy Evaluation, Stochastic Ordering, Directional Differentiability
JEL codes:
C12, C14, I30

∗ The authors are grateful to Pedro Carneiro, Hide Ichimura, Radosław Kurek, Essie Maasoumi, Piotr Miłoś, Magne Mogstad, Jim Powell, João Santos Silva, Tiemen Woutersen and seminar participants at the University of California Berkeley, University of Arizona, 28th annual meeting of the Midwest Econometrics Group, 35th Meeting of the Canadian Econometric Study Group, and 3rd edition of the Rio-Sao Paulo Econometrics Workshop for useful comments and discussions regarding this paper. Marta Schoch provided excellent research assistance. Computer programs to replicate the numerical analyses are available from the authors. All the remaining errors are ours.
† Insper, Sao Paulo, Brazil. E-mail: [email protected]
‡ Department of Economics, University of Arizona, Tucson, USA. E-mail: [email protected]
§ Institute of Economics, Polish Academy of Sciences, Warsaw, Poland. E-mail: [email protected]
¶ Department of Economics, University of Waterloo, Waterloo, Canada. E-mail: [email protected]
‖ Department of Economics and Public Policy, Imperial College Business School, Imperial College London, UK. E-mail: [email protected]

We suffer more, ... when we fall from a better to a worse situation, than we ever enjoy when we rise from a worse to a better.
Adam Smith, The Theory of Moral Sentiments
The welfare ranking of policy interventions has classically (Atkinson, 1970) been conducted under the Rawlsian principle of the "veil of ignorance": all policies that produce the same marginal outcome distributions are deemed equivalent for the purpose of welfare analysis. From this perspective, individual gains and losses should be irrelevant (Roemer, 1998; Sen, 2000). However, policies often generate heterogeneous effects, potentially giving rise to gains and losses, which can be consequential for several reasons.

More modern approaches to ranking policy interventions greatly emphasize how different individuals are affected by a policy (Heckman and Smith, 1998; Carneiro, Hansen, and Heckman, 2001). A powerful motivation for this lies in the dynamics of political economy. Public support for a policy, and for the authorities that implement it, depends on the balance of gains and losses experienced by different individuals in the electorate. In addition, and in line with the political economy arguments adduced in Carneiro, Hansen, and Heckman (2001), there is mounting empirical evidence corroborating that the electorate exhibits loss aversion, an empirical regularity that has been identified in a wide variety of other contexts (Kahneman and Tversky, 1979; Samuelson and Zeckhauser, 1988; Tversky and Kahneman, 1991; Rabin and Thaler, 2001; Rick, 2011). This aversion to losses among constituents, in turn, drives the actions of policy makers, as documented in contexts as diverse as government support to the steel industry in US trade policy and the repeal of the Affordable Care Act (Freund and Özden, 2008; Alesina and Passarelli, 2019). In this paper we develop new testable criteria and econometric methods to rank distributions of individual policy effects from a welfare standpoint when individuals exhibit loss aversion.
This extends the toolkit available for evaluating the impact of policy interventions.

Our first contribution is to propose criteria for ranking policies when agents are averse to losses by using the standard welfare function approach (Atkinson, 1970), namely, that policies may be evaluated based on a welfare ranking. We use a ranking based on social value functions, which aggregate individual gains and losses evaluated by a cardinal and interpersonally comparable value function, similarly to the standard utilitarian welfare ranking. As is well known, the latter is equivalent to first-order stochastic dominance (FOSD) over distributions of policy outcomes. In a similar spirit, as the first main contribution, we show that the social value function ranking with non-decreasing and loss-averse value functions (Tversky and Kahneman, 1991) is equivalent to a new concept we call loss aversion-sensitive dominance (LASD) over distributions of policy-induced gains and losses. Recall that FOSD requires that the cumulative distribution function of the dominated distribution lies everywhere above the cumulative distribution function of the dominant distribution. In contrast, under LASD it must lie sufficiently above the dominant distribution that potential losses cannot be compensated by potential gains. This is a consequence of loss aversion. Except for the special case of a status quo policy (i.e., a policy of no change), where FOSD and LASD coincide, LASD can generally, as we show, be used to compare policies that are indistinguishable by FOSD.

The LASD criterion relies on gains and losses, which under certain identification conditions could be considered treatment effects. It is well known that point identification of the distribution of treatment effects may require implausible theoretical restrictions such as rank invariance of potential outcomes (Heckman, Smith, and Clements, 1997).
We extend our LASD criteria to a partially-identified setting and establish a sufficient condition to rank alternative policies under partial identification of the distributions of their effects. We use Makarov bounds (Makarov, 1982; Rüschendorf, 1982; Frank, Nelsen, and Schweizer, 1987) to bound the distribution of treatment effects when the joint pre- and post-policy outcome distribution is unknown. This provides a testable criterion that can be used in practice, since the marginal distribution functions from samples observed under various treatments can usually be identified, and Makarov bounds rely only on marginal information for their identification.

The second contribution of this paper is to develop statistical inference procedures to practically test the loss aversion-sensitive dominance condition using sample data. We develop statistical tests for both point-identified and partially-identified distributions of outcomes. The test procedures are designed to assess, uniformly over the two outcome distributions, whether one treatment dominates another in terms of the LASD criterion. Specifically, we suggest Kolmogorov-Smirnov and Cramér-von Mises test statistics that are applied to nonparametric plug-in estimates of the LASD criterion mentioned above. Inference for these statistics uses specially tailored resampling procedures. We show that our procedures control the size of tests uniformly over probability distributions that satisfy the null hypothesis. Our tests are related to the literature on uniform inference for stochastic dominance represented by, e.g., Linton, Song, and Whang (2010); Linton, Maasoumi, and Whang (2005); Barrett and Donald.

The literature on stochastic dominance is vast and spans economics and mathematics; we refer the reader to, e.g., Shaked and Shanthikumar (1994) and Levy (2016) for a review. When dominance curves cross, higher-order or inverse stochastic dominance criteria have been proposed.
The former involves conditions on higher (typically third and fourth) order derivatives of the utility function (e.g., Fishburn (1980) and Chew (1983), to which Eeckhoudt and Schlesinger (2006) provided an interesting interpretation), whereas the latter is related to the rank-dependent theory originally proposed by Weymark (1981) and Yaari (1987, 1988), where social welfare functions are weighted averages of ordered outcomes with weights decreasing with the rank of the outcome (see Aaberge, Havnes, and Mogstad (2018) for a recent refinement of this theory).

L-norm statistics applied to this function are just regular enough that, with some care, resampling can be used to conduct inference. We rely on recent results from Fang and Santos (2019), who built on the work of Dümbgen (1993), to propose an inference procedure that combines standard resampling with an estimate of the way that test statistics depend on underlying data distributions. We contribute to the literature on directionally differentiable test statistics with a new test for LASD. Recent contributions to this literature include, among others, Cattaneo, Jansson, and Nagasawa (2017); Hong and Li (2018); Chetverikov, Santos, and Shaikh (2018); Cho and White (2018); Christensen and Connault (2019); Fang and Santos (2019) and Masten and Poirier (2020). When distributions are only partially identified by bounds, the situation is more challenging, but the problem has a similar solution. This allows us to conduct conservative inference in the partially identified case. Our contribution to this literature is novel because of our focus on uniform tests for dominance in both the point- and partially-identified cases.

Finally, this paper also relates to the strand of literature that develops methods to estimate the optimal treatment assignment policy that maximizes a social welfare function.
Recent developments can be found in Manski (2004), Dehejia (2005), Hirano and Porter (2009), Stoye (2009), Bhattacharya and Dupas (2012), Tetenov (2012), Kitagawa and Tetenov (2018, 2019), among others. These papers focus on the decision-theoretic properties and procedures that map empirical data into treatment choices. In this context, our paper is most closely related to Kasy (2016), which focuses on welfare rankings of policies rather than optimal policy choice.

We empirically illustrate the use of our proposed criteria and tests with a welfare comparison of two well-known income support programs using data from Bitler, Gelbach, and Hoynes (2006). We show that, in the case of a policy with gainers and losers, the use of our loss aversion-sensitive evaluation criteria may lead to a ranking of policy interventions that differs from that obtained when their outcomes are compared using stochastic dominance.

The rest of the paper is organized as follows. Section 2 presents some basic definitions and notation and defines loss aversion-sensitive dominance. Section 3 develops testable criteria for loss aversion-sensitive dominance. Section 4 proposes statistical inference methods for LASD using sample observations. An empirical application appears in Section 5. Finally, Section 6 concludes. One appendix includes auxiliary results and definitions, and a second appendix collects proofs of the results in the text.

In this section, we propose a novel dominance relation for ordering policies under the assumption that social decision makers consider the distribution of individual gains and losses under different policy scenarios. We call this criterion Loss Aversion-Sensitive Dominance (LASD). Suppose a random variable X describes individual gains and losses, and X has cumulative distribution function F, and let $\mathcal{F}$ be the set of cumulative distribution functions with bounded support $\mathcal{X}$. We maintain the assumption throughout that $F \in \mathcal{F}$.
The bounded support assumption is made to avoid technical conditions on the tails of distribution functions. A decision maker has preferences over X that are represented via a continuous social value function (SVF).

Definition 2.1 (Social Value Function). Suppose random variable X has CDF $F \in \mathcal{F}$ and let $W \colon \mathcal{F} \to \mathbb{R}$ denote the following social value function:
$$W(F) = \int_{\mathcal{X}} v(x) \, dF(x), \qquad (1)$$
where $v \colon \mathcal{X} \to \mathbb{R}$ is called a value function.

The social value function defined above is standard in the literature. W(F) is the expected evaluation of the distribution of X by a decision maker who uses value function v(·). The value function v(·) in (1) need not be any agent's actual value function, but simply the utility function that the social planner uses to convert outcomes into an interpersonally-comparable measure of well-being (Gajdos and Weymark, 2012). (Formally speaking, we have $W_v(F)$, but we suppress the subscript v for expositional brevity.) We depart from the standard assumptions on v: v exhibits the following features: (i) agents assign negative value to losses and positive value to gains, (ii) the value function is monotone (increasing), and, our key property, (iii) there is asymmetry in gains and losses, namely, losses hurt an agent more than gains of equivalent magnitude make her happy. These properties are formally listed in the next definition.

Definition 2.2 (Properties of the value function). The value function $v \colon \mathcal{X} \to \mathbb{R}$ satisfies:
1. Disutility of losses and utility of gains: $v(x) \le 0$ for all $x < 0$, $v(0) = 0$, and $v(x) \ge 0$ for all $x > 0$.
2. Non-decreasing: $v'(x) \ge 0$ for all x.
3. Loss-averse: $v'(-x) \ge v'(x)$ for all $x > 0$.

The properties in Definition 2.2 are typically assumed in Prospect Theory, together with the additional requirement of S-shapedness of the value function, which we do not consider (see, e.g., p. 279 of Kahneman and Tversky (1979)). Assumptions 1 and 2 are standard sign and monotonicity conditions.
Assumption 3 expresses the idea that "losses loom larger than corresponding gains" and is a widely accepted definition of loss aversion (Tversky and Kahneman, 1992, p. 303). It is a stronger condition than the one considered by Kahneman and Tversky (1979). The following form of W(F) will be useful in subsequent definitions and results.

Proposition 2.3.
Suppose that $F \in \mathcal{F}$ and v is once differentiable. Then
$$W(F) = -\int_{-\infty}^{0} v'(x) F(x)\, dx + \int_{0}^{\infty} v'(x)\,(1 - F(x))\, dx. \qquad (2)$$

Assume that the decision maker's social value function W depends on v, which satisfies Definition 2.2, and she wishes to compare random variables $X_A$ and $X_B$, which represent gains and losses under two policies labeled A and B. We use the labels $F_A$ and $F_B$ for the distribution functions of $X_A$ and $X_B$. The decision maker prefers $X_A$ over $X_B$ if she evaluates $F_A$ as better than $F_B$ using her SVF; specifically, $X_A$ is preferred to $X_B$ if and only if $W(F_A) \ge W(F_B)$, where W is defined in Definition 2.1. This idea is formalized below.

Definition 2.4 (Loss Aversion-Sensitive Dominance). Let $X_A$ and $X_B$ have distribution functions respectively labeled $F_A, F_B \in \mathcal{F}$. If $W(F_A) \ge W(F_B)$ for all value functions v that satisfy Definition 2.2, we say that $F_A$ dominates $F_B$ in terms of Loss Aversion-Sensitive Dominance, or LASD for short, and we write $F_A \succeq_{LASD} F_B$.

In the next section we relate this abstract notion to a more concrete condition that depends on the cumulative distribution functions of the outcome distributions, $F_A$ and $F_B$.

In this section we formulate conditions for evaluating distributions of gains and losses. We propose criteria that indicate whether one distribution of gains and losses dominates another in the sense described in Definition 2.4. For making comparisons between policies A and B, an econometrician can generally observe three relevant distributions. First, suppose that the control or current distribution of agents' outcomes is represented by the random variable Z, which has marginal distribution function G. Two other random variables, $Z_A$ and $Z_B$, describe outcomes under policies A and B. Assume their marginal distribution functions are $G_A$ and $G_B$, respectively. However, a decision maker who is sensitive to loss considers differences induced by these prospective policies.
The gains and losses due to policies A and B are defined by the random variables $X_A = Z_A - Z$ and $X_B = Z_B - Z$. The decision maker's goal is to compare policies A and B using $F_A$ and $F_B$, the distribution functions of $X_A$ and $X_B$.

The problem with comparing the variables $X_A$ and $X_B$ is well known in the treatment effects literature: $F_A$ and $F_B$ depend on the joint distribution of $(Z, Z_A, Z_B)$, which may not be observable without restrictions imposed by an economic model. In subsection 3.1 we abstract from specific identification conditions and discuss LASD under the assumption that $F_A$ and $F_B$ are identified. In subsection 3.2 we work with a partially identified case where only the marginal distribution functions G, $G_A$ and $G_B$ are observable and no restrictions are made to identify $F_A$ and $F_B$.

The LASD concept in Definition 2.4 requires that one distribution is preferred to another over an entire class of social value functions and is difficult to test directly. The following result relates the LASD concept to a criterion which depends only on marginal distribution functions and orders $F_A$ and $F_B$ according to the class of SVFs allowed in Definition 2.2. In this section we assume that $F_A, F_B \in \mathcal{F}$ are point identified. This may result from a variety of econometric restrictions that deliver identification and are the subject of a large literature.

Theorem 3.1.
Suppose that $F_A, F_B \in \mathcal{F}$. The following are equivalent:
1. $F_A \succeq_{LASD} F_B$.
2. For all $x \ge 0$, $F_A$ and $F_B$ satisfy
$$F_B(-x) - F_A(-x) \ge \max\{0, F_A(x) - F_B(x)\}. \qquad (3)$$
3. For all $x \ge 0$, $F_A$ and $F_B$ simultaneously satisfy
$$F_A(-x) - F_B(-x) \le 0 \qquad (4)$$
and
$$(1 - F_A(x)) - F_A(-x) \ge (1 - F_B(x)) - F_B(-x). \qquad (5)$$

Theorem 3.1 provides two different conditions that can be used to verify whether one distribution of gains and losses dominates the other in the LASD sense. These criteria compare the outcome distributions by examining how the distribution functions $(F_A, F_B)$ assign probabilities to gains and losses of all possible magnitudes. The particular way that they make a comparison is related to the relative importance of gains and losses. Consider condition (3). For $X_B$ to be dominated, its distribution function must lie above the distribution of $X_A$ for losses. $X_B$ can be dominated by $X_A$ even when gains under A are dominated by those under B — that is, when $F_A(x) - F_B(x) \ge 0$ for some $x \ge 0$ — as long as this lack of dominance in gains is compensated by sufficient dominance of $X_A$ over $X_B$ in the losses region. This is a consequence of the asymmetric treatment of gains and losses. On the other hand, consider conditions (4) and (5). Condition (4) is a FOSD condition applied to losses. This is a consequence of loss aversion; note that in the extreme case where only losses matter, we would have (4). Condition (5) is a tail condition on the distributions. It requires that, when balancing the probabilities of gains and losses of absolute magnitudes at least as large as x, $X_A$ provides gains to a higher proportion of agents than does $X_B$. Inequality (3) combines the two inequalities represented by (4) and (5) into a single expression.

It is interesting to note that LASD has one property in common with FOSD, namely, a higher mean is a necessary condition for both types of dominance.

Corollary 3.2. If $F_A \succeq_{LASD} F_B$ then $E[X_A] \ge E[X_B]$.

Note that FOSD cannot rank two distributions that have the same mean — that is, if $F_A \succeq_{FOSD} F_B$ and $E[X_A] = E[X_B]$, then $F_A = F_B$.
This is not the case for LASD in (3), as the next example demonstrates. Therefore, for example, when comparing two distributions with the same average effect, condition (3) may still be used to differentiate between them.

(LASD is a partial order. Over losses, (4) is a partial order because FOSD is a partial order. For the tail condition (5), checking transitivity, we have $(1 - F_A(x)) - F_A(-x) \ge (1 - F_B(x)) - F_B(-x)$ and $(1 - F_B(x)) - F_B(-x) \ge (1 - F_C(x)) - F_C(-x)$, which together imply $(1 - F_A(x)) - F_A(-x) \ge (1 - F_C(x)) - F_C(-x)$. If $F_A(-x) - F_B(-x) = 0$ then $F_A(-x) = F_B(-x)$, and using this in (5) gives anti-symmetry.)
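Criterion (3) and the pair (4)-(5) are straightforward to check on a grid once the two CDFs are in hand. The sketch below is our own illustration, not an example from the paper: the normal location-shift pair is an assumed choice, and by Theorem 3.1 the two checks should agree.

```python
import math

def norm_cdf(t, mu=0.0):
    # CDF of N(mu, 1), built from the error function.
    return 0.5 * (1.0 + math.erf((t - mu) / math.sqrt(2.0)))

def lasd_condition_3(FA, FB, xs):
    # Eq. (3): F_B(-x) - F_A(-x) >= max{0, F_A(x) - F_B(x)} for all x >= 0.
    return all(FB(-x) - FA(-x) >= max(0.0, FA(x) - FB(x)) - 1e-12 for x in xs)

def lasd_conditions_4_5(FA, FB, xs):
    # Eq. (4): F_A(-x) - F_B(-x) <= 0 for all x >= 0, and
    # Eq. (5): (1 - F_A(x)) - F_A(-x) >= (1 - F_B(x)) - F_B(-x) for all x >= 0.
    c4 = all(FA(-x) - FB(-x) <= 1e-12 for x in xs)
    c5 = all((1 - FA(x)) - FA(-x) >= (1 - FB(x)) - FB(-x) - 1e-12 for x in xs)
    return c4 and c5

xs = [0.01 * i for i in range(501)]          # evaluation grid on [0, 5]
FA = lambda t: norm_cdf(t, mu=0.5)           # gains/losses under A, shifted up
FB = lambda t: norm_cdf(t, mu=0.0)           # gains/losses under B

print(lasd_condition_3(FA, FB, xs))          # True: F_A LASD-dominates F_B
print(lasd_condition_3(FB, FA, xs))          # False: the reverse fails
print(lasd_condition_3(FA, FB, xs) == lasd_conditions_4_5(FA, FB, xs))  # True
```

Because the shifted CDF lies below the unshifted one everywhere, condition (3) holds trivially for gains and with slack for losses; reversing the roles of A and B violates (3) already at x = 0.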
Example 3.3.
Consider the family of uniform distributions on $[-1-y, -y] \cup [y, y+1]$ indexed by $y > 0$, and denote the corresponding member distribution functions $F_y$. All members of this family have mean zero, and $F_y \succeq_{LASD} F_{y'}$ whenever $y < y'$. Indeed, note that
$$W(F_y) = \frac{1}{2} \left( \int_{-1-y}^{-y} v(z)\, dz + \int_{y}^{y+1} v(z)\, dz \right)$$
and thus for any v which is loss-averse (see Definition 2.2) we have
$$\frac{d}{dy} W(F_y) = \frac{1}{2} \left( v(-1-y) - v(-y) + v(1+y) - v(y) \right) = \frac{1}{2} \left( -\int_{-1-y}^{-y} v'(z)\, dz + \int_{y}^{1+y} v'(z)\, dz \right) = \frac{1}{2} \int_{y}^{1+y} \left( v'(z) - v'(-z) \right) dz \le 0.$$

It is important to note that LASD is a concept that is specialized to the comparison of distributions that represent gains and losses. Standard FOSD is typically applied to the distribution of outcomes in levels, without regard to whether the outcomes resulted from gains or losses of agents relative to a pre-policy state — in our notation, $G_A$ and $G_B$ are typically compared with FOSD, instead of $F_A$ and $F_B$. FOSD applied to post-policy levels may or may not coincide with LASD applied to changes. This means that even when a strong condition such as FOSD holds for final outcomes, if one took into account how agents value gains and losses it may turn out that the dominant distribution is no longer a preferred outcome. One could apply the FOSD rule to compare distributions of income changes, which implies LASD applied to changes, because FOSD applies to a broader class of value functions. However, this type of comparison would ignore agents' loss aversion, the important qualitative feature that LASD accounts for. The following example shows that the analysis of outcomes in levels using FOSD need not correspond to any LASD ordering of outcomes in changes.

Example 3.4.
Let Z represent outcomes before policies A or B. Suppose Z is distributed uniformly over $\{0, 1, 2, 3\}$. Policy A assigns post-policy outcomes depending on the realized Z according to the schedule
$$Z_A = \begin{cases} 3 & \text{if } Z = 0 \\ 2 & \text{if } Z = 1 \\ 0 & \text{if } Z = 2 \\ 1 & \text{if } Z = 3, \end{cases}$$
so that $X_A = Z_A - Z$ has distribution $P\{X_A = -2\} = 1/2$, $P\{X_A = 1\} = P\{X_A = 3\} = 1/4$. Meanwhile, policy B maintains the status quo: $X_B = Z_B - Z = 0$ with probability 1. It is straightforward to check that $Z_A \sim Z_B$, thus they dominate each other according to FOSD. However, there is no loss aversion-sensitive dominance between $X_A$ and $X_B$. Indeed, we can find two value functions that fulfill the conditions of Definition 2.4 but order $X_A$ and $X_B$ differently. For example, take $v(x) = x^3$. Then $\int v(x)\, dF_A(x) = 3 > \int v(x)\, dF_B(x) = 0$. Next let $v(x) = \mathrm{sgn}(x)|x|^{1/2}$. Then $\int v(x)\, dF_A(x) \approx -0.02 < 0 = \int v(x)\, dF_B(x)$.

In the previous example, policy B left pre-treatment outcomes unchanged, or in other words, maintained a status quo condition — we had $X_B = Z_B - Z \equiv 0$. Suppose generally that $X_B$ has a distribution that is degenerate at 0. Then $F_B(x) = 0$ for all $x < 0$ and $F_B(x) = 1$ for all $x \ge 0$. We define this as a status quo policy distribution, labelled $F_{SQ}$. When the comparison is between a distribution $F_A$ and $F_{SQ}$, LASD and standard FOSD are equivalent.

Corollary 3.5.
Suppose that $F_A \in \mathcal{F}$ and $F_B = F_{SQ}$. Then $F_A \succeq_{LASD} F_{SQ} \iff F_A \succeq_{FOSD} F_{SQ}$.

Remark 3.6.
Although in this paper we focus on the distribution of gains and losses, Kőszegi and Rabin (2006) have developed an interesting preference relation in which individuals derive utility from income and also from gains and losses. In particular, their utility function is additively separable in gains and losses x and income levels z, i.e., $\tilde{v}(x, z) = v_G(x) + v_I(z)$, where $x \in \mathbb{R}$ and $z \in [0, \infty)$. Using Kőszegi and Rabin (2006) preferences, policy A dominates policy B if, in our notation, (4) and (5) are satisfied by $X_A$ and $X_B$, along with the additional condition that $Z_A$ dominates $Z_B$ according to FOSD. A proof of this result is given in Appendix B.

In many situations of interest the cumulative distribution functions of gains and losses, $F_A$ and $F_B$, are not point identified without a model of the relationship between $X_A$ and $X_B$. Without information on the dependence between potential outcomes, we can still make some more circumscribed statements with regard to dominance based on bounds for the distribution functions.

A number of authors have considered functions that bound the distribution functions $F_A$ and $F_B$. Taking $X_A$ as an example, the Makarov bounds (Makarov, 1982; Rüschendorf, 1982; Frank, Nelsen, and Schweizer, 1987) are two functions L and U that satisfy $L(x) \le F_A(x) \le U(x)$ for all $x \in \mathbb{R}$, depend only on the marginal distribution functions G and $G_A$, and are pointwise sharp — for any fixed x there exist some $Z^*$ and $Z_A^*$ such that the resulting $X_A^* = Z_A^* - Z^*$ has a distribution function at x that is equal to one of $L(x)$ or $U(x)$. Williamson and Downs (1990) provide convenient definitions for these bound functions. For any two distribution functions $G_1$, $G_2$, define the maps
$$L(x, G_1, G_2) = \sup_{u \in \mathbb{R}} \left( G_2(u) - G_1(u - x) \right)$$
$$U(x, G_1, G_2) = \inf_{u \in \mathbb{R}} \left( 1 + G_2(u) - G_1(u - x) \right).$$
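The maps L and U above can be approximated from estimated marginal CDFs by brute-force optimization over a grid of u values. The sketch below is our own illustration with simulated normal samples; the grid ranges and sample sizes are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(0.0, 1.0, size=2000)    # control outcomes Z, with CDF G
za = rng.normal(0.5, 1.0, size=2000)   # outcomes under policy A, with CDF G_A

def ecdf(sample):
    s = np.sort(sample)
    return lambda u: np.searchsorted(s, u, side="right") / len(s)

G, GA = ecdf(z), ecdf(za)

# Makarov bounds for the CDF of X_A = Z_A - Z, taking G_1 = G and G_2 = G_A:
# L(x) = sup_u (G_A(u) - G(u - x)),  U(x) = inf_u (1 + G_A(u) - G(u - x)),
# with the sup/inf approximated over a wide finite grid of u values.
u_grid = np.linspace(-8.0, 8.0, 4001)

def makarov(x):
    diff = GA(u_grid) - G(u_grid - x)
    return float(diff.max()), float(1.0 + diff.min())

x_grid = np.linspace(-4.0, 4.0, 81)
L_vals, U_vals = zip(*(makarov(x) for x in x_grid))

# The bounds behave like CDF envelopes: values in [0, 1] with L <= U pointwise.
print(all(0.0 <= l <= u <= 1.0 for l, u in zip(L_vals, U_vals)))  # True
```

Because the sup and inf run over the whole (widened) real line, where both CDFs are eventually 0 or 1, the raw formulas already land in [0, 1] without extra clipping; both bound functions are also nondecreasing in x, as a CDF envelope should be.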
For convenience, define the policy-specific bound functions for $F_k$, $k \in \{A, B\}$ and all $x \in \mathbb{R}$, which depend on the marginal CDFs G and $G_k$, by
$$L_k(x) = L(x, G, G_k) \qquad (6)$$
$$U_k(x) = U(x, G, G_k). \qquad (7)$$
Using these definitions we obtain a sufficient and a necessary condition for LASD when only bounds of the treatment effects distribution are observable. The next theorem formalizes the result.

Theorem 3.7.
Suppose that $G, G_A, G_B \in \mathcal{F}$ and define the bounding functions using formulas (6) and (7) for $k \in \{A, B\}$.
1. If for all $x \ge 0$,
$$L_B(-x) - U_A(-x) \ge \max\{0, U_A(x) - L_B(x)\}, \qquad (8)$$
then (3) holds.
2. If (3) holds, then for all $x \ge 0$,
$$U_B(-x) - L_A(-x) \ge L_A(x) - U_B(x). \qquad (9)$$

Theorem 3.7 is an extension of Theorem 3.1 from the point-identified to the partially-identified case. Both Theorems 3.1 and 3.7 will play important parts in the inference procedures discussed in the next section. When the comparison is with the status quo distribution, the partially identified conditions simplify. Corollary 3.8 below is an extension of Corollary 3.5 to the partially identified case.

Corollary 3.8.
Suppose that $F_B = F_{SQ}$ and that $G, G_A \in \mathcal{F}$. Define the bound functions $U_A$ and $L_A$ using formulas (6) and (7). Then $U_A(-x) = 0$ for all $x \ge 0 \Rightarrow F_A \succeq_{LASD} F_{SQ}$, and $F_A \succeq_{LASD} F_{SQ} \Rightarrow L_A(-x) = 0$ for all $x \ge 0$.

In this section we propose statistical inference methods for the loss aversion-sensitive dominance (LASD) criteria discussed in previous sections. We consider the null and alternative hypotheses
$$H_0 \colon F_A \succeq_{LASD} F_B \qquad H_1 \colon F_A \not\succeq_{LASD} F_B. \qquad (10)$$
Under the null hypothesis (10), policy A dominates B in the LASD sense, similar to much of the literature on stochastic dominance — see, for example, Linton, Maasoumi, and Whang (2005); Linton, Song, and Whang (2010). We use the dominance criteria discussed in Theorems 3.1 and 3.7 to design nonparametric tests for $H_0$. Because the LASD hypothesis is translated into functional inequalities, which we discuss below, tests must be conducted uniformly over all $x \ge 0$. This uniformity in x and features of the LASD conditions present a challenge for inference.

We consider tests for this null hypothesis given sample data observed under two different identification assumptions. We start with the case where the econometrician can directly observe samples $\{X_{Ai}\}_{i=1}^{n_A}$ and $\{X_{Bi}\}_{i=1}^{n_B}$ which represent agents' gains and losses. In other words, we assume that a model has been imposed on the data so that the distribution functions of $X_A$ and $X_B$ are point-identified and can be estimated using the empirical distribution functions from the two samples. Next we extend these results to the partially-identified case, where we impose no assumption about the joint distribution of potential outcomes under either treatment.
In this case, the econometrician observes three samples, $\{Z_i\}_{i=1}^{n}$, $\{Z_{Ai}\}_{i=1}^{n_A}$ and $\{Z_{Bi}\}_{i=1}^{n_B}$, of outcomes under a control or pre-policy state and outcomes under policies A and B, and tests are based on plug-in estimates of bounds for $X_A = Z_A - Z$ and $X_B = Z_B - Z$.

We consider distribution functions as members of the space of bounded functions on the support $\mathcal{X} \subseteq \mathbb{R}$, denoted $\ell^\infty(\mathcal{X})$, equipped with the supremum norm, defined for $f \colon \mathbb{R}^k \to \mathbb{R}^\ell$ by $\|f\|_\infty = \max_j \{\sup_{x \in \mathbb{R}^k} |f_j(x)|\}$. For real numbers x, let $(x)_+ = \max\{0, x\}$. Given a sequence of bounded functions $\{f_n\}_n$ and limiting random element f, we write $f_n \rightsquigarrow f$ to denote weak convergence in $(\ell^\infty, \|\cdot\|_\infty)$ in the sense of Hoffman-Jørgensen (van der Vaart and Wellner, 1996).

4.1 Inferring dominance from point identified treatment distributions

In this subsection we suppose that the pair of marginal distribution functions $F = (F_A, F_B)$ is identified. To implement a test of the hypotheses (10), we employ the results of Theorem 3.1 to construct maps of F into criterion functions that are used to detect deviations from the hypothesis $H_0$. Specifically, recalling that $(x)_+ = \max\{0, x\}$, for the point-identified case we examine maps $T_1 \colon (\ell^\infty(\mathbb{R}))^2 \to \ell^\infty(\mathbb{R}_+)$ and $T_2 \colon (\ell^\infty(\mathbb{R}))^2 \to (\ell^\infty(\mathbb{R}_+))^2$, defined for each $x \ge 0$ by
$$T_1(F)(x) = (F_A(x) - F_B(x))_+ + F_A(-x) - F_B(-x) \qquad (11)$$
and
$$T_2(F)(x) = \begin{bmatrix} F_A(-x) - F_B(-x) \\ F_A(x) - F_B(x) + F_A(-x) - F_B(-x) \end{bmatrix}. \qquad (12)$$
The functions $T_1(F)$ and $T_2(F)$ are designed so that large positive values indicate a violation of the null. Taking $T_1$ as an example, Theorem 3.1 states that $W(F_A) \ge W(F_B)$ if and only if $F_B(-x) - F_A(-x) \ge (F_A(x) - F_B(x))_+$ for all $x \ge 0$, so tests can be constructed by looking for x where $T_1(F)(x)$ becomes significantly positive.
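To see how the criterion functions behave, one can evaluate (11) and (12) directly for a pair of known CDFs. The normal location-shift pair below is our illustrative choice, not an example from the paper; under it A dominates B, so $T_1$ stays non-positive, while reversing the roles produces positive values.

```python
import math

def Phi(t, mu=0.0):
    # CDF of N(mu, 1).
    return 0.5 * (1.0 + math.erf((t - mu) / math.sqrt(2.0)))

def T1(FA, FB, x):
    # Eq. (11): (F_A(x) - F_B(x))_+ + F_A(-x) - F_B(-x), for x >= 0.
    return max(FA(x) - FB(x), 0.0) + FA(-x) - FB(-x)

def T2(FA, FB, x):
    # Eq. (12): the two coordinates of the vector-valued criterion.
    m1 = FA(-x) - FB(-x)
    return (m1, m1 + FA(x) - FB(x))

xs = [0.05 * i for i in range(100)]
FA = lambda t: Phi(t, mu=0.5)   # A shifts gains and losses up by 0.5
FB = lambda t: Phi(t, mu=0.0)

# Under the null "A LASD-dominates B", T1 is non-positive everywhere ...
print(max(T1(FA, FB, x) for x in xs) <= 1e-12)   # True
# ... while in the reverse direction T1 is positive somewhere.
print(max(T1(FB, FA, x) for x in xs) > 0)        # True
```

Positive values of the map are exactly the deviations the statistics in the next display aggregate, either by their supremum or by their integrated square.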
We will refer to the $T_j$ as maps from pairs of distribution functions to another function space, and also refer to them as functions. The hypotheses (10) can be rewritten in two equivalent forms, depending on whether one uses $T_1$ or $T_2$ to transform distribution functions: letting $\mathcal{X} \subseteq \mathbb{R}_+$ be an evaluation set, we have
$$H_0^{(1)} \colon T_1(F)(x) \le 0 \text{ for all } x \in \mathcal{X}, \qquad H_1^{(1)} \colon T_1(F)(x) > 0 \text{ for some } x \in \mathcal{X} \qquad (13)$$
and
$$H_0^{(2)} \colon T_2(F)(x) \le 0 \text{ for all } x \in \mathcal{X}, \qquad H_1^{(2)} \colon T_2(F)(x) \not\le 0 \text{ for some } x \in \mathcal{X}. \qquad (14)$$
In the second set of hypotheses, 0 is a two-dimensional vector of zeros and inequalities are taken coordinate-wise.

The next step in testing the hypotheses (13) and (14) is to estimate $T_1(F)$ and $T_2(F)$. Let $F_n = (F_{An}, F_{Bn})$ denote the pair of marginal empirical distribution functions, that is, $F_{kn}(x) = \frac{1}{n_k} \sum_{i=1}^{n_k} 1\{X_{ki} \le x\}$ for $k \in \{A, B\}$. These are well-behaved estimators of the components of F. Letting $n = n_A + n_B$, standard empirical process theory shows that $\sqrt{n}(F_n - F)$ converges weakly to a Gaussian process under weak assumptions (van der Vaart, 1998, Example 19.6). In order to conduct inference for loss aversion-sensitive dominance, we use plug-in estimators $T_j(F_n)$ for $j \in \{1, 2\}$. See Remark A.7 in Appendix A for details on the computation of these functions.

In order to detect when $T_j(F_n)$ is significantly positive, we consider statistics based on a one-sided supremum norm or a one-sided $L_2$ norm over $\mathcal{X}$. Kolmogorov-Smirnov (i.e., supremum norm) type statistics are
$$V_{1n} = \sqrt{n} \sup_{x \in \mathcal{X}} (T_1(F_n)(x))_+ \qquad (15)$$
$$V_{2n} = \sqrt{n} \max\left\{ \sup_{x \in \mathcal{X}} (T_{21}(F_n)(x))_+, \; \sup_{x \in \mathcal{X}} (T_{22}(F_n)(x))_+ \right\}, \qquad (16)$$
where $T_{21}$ and $T_{22}$ denote the two coordinates of $T_2$. Meanwhile, Cramér-von Mises (or $L_2$ norm) test statistics are defined by
$$W_{1n} = \sqrt{n} \left( \int_{\mathcal{X}} \left( (T_1(F_n)(x))_+ \right)^2 dx \right)^{1/2} \qquad (17)$$
$$W_{2n} = \sqrt{n} \left( \int_{\mathcal{X}} \left( (T_{21}(F_n)(x))_+ \right)^2 + \left( (T_{22}(F_n)(x))_+ \right)^2 dx \right)^{1/2}. \qquad (18)$$
In the sequel, we assume that all functions used in $L^2$ statistics are square-integrable.

We wish to establish the limiting distributions of $V_{jn}$ and $W_{jn}$, for $j \in \{1, 2\}$, under the null hypothesis $H_0 : F_A \succeq_{LASD} F_B$. This means that we are concerned with the behavior of the empirical criterion function processes $\sqrt{n}(T_j(F_n) - T_j(F))$, which are random functions. Two challenges arise when considering these criterion function processes. First, the form of the null hypothesis as a functional inequality to be tested uniformly over $\bar{\mathcal{X}}$ is a source of irregularity. The assumption that the distribution $P$ satisfies the null hypothesis $F_A \succeq_{LASD} F_B$ implies that the asymptotic distributions of $W_j$ and $V_j$ depend on features of $P$. This is referred to as non-uniformity in $P$ (Linton, Song, and Whang, 2010; Andrews and Shi, 2013), and requires attention when resampling. Second, due to the pointwise maximum function in its definition, $T_1$ is too irregular as a map from the data to the space of bounded functions to establish a limiting distribution for the empirical process $\sqrt{n}(T_1(F_n) - T_1(F))$ using conventional statistical techniques. In contrast, $T_2$ is a linear map of $F$, which implies that $\sqrt{n}(T_2(F_n) - T_2(F))$ has a well-behaved limiting distribution in $(\ell^\infty(\mathbb{R}_+))^2$.

Despite the above challenges, we show that $V_{jn}$ and $W_{jn}$ (for $j \in \{1, 2\}$) have well-behaved asymptotic distributions and, furthermore, that the limiting random variables satisfy $V_2 \sim V_1$ and $W_2 \sim W_1$. This is an important result because it is the foundation for applying bootstrap techniques for inference. Before stating the formal assumptions and asymptotic properties of the tests, we discuss the two difficulties mentioned above in more detail.

The limiting distributions of the $V_{jn}$ and $W_{jn}$ statistics depend on features of the joint probability distribution of $(X_A, X_B)$, which we denote by $P$. Let $\mathcal{P}_0$ be the set of distributions $P$ such that $F_A \succeq_{LASD} F_B$.
These are distributions with marginal distribution functions $F$ such that $T_j(F)(x) \leq 0$ for all $x \geq 0$. To discuss the relationship between these sets of distributions and the test statistics, we relabel the two coordinates of the $T_2$ function as

$$m_1(x) = F_A(-x) - F_B(-x) \qquad (19)$$

and

$$m_2(x) = F_A(-x) - F_B(-x) + F_A(x) - F_B(x). \qquad (20)$$

When $P \in \mathcal{P}_0$, both $m_1(x) \leq 0$ and $m_2(x) \leq 0$ for all $x \geq 0$.

More detail is required about the behavior of the two coordinate functions to determine the limiting distributions of the $V_{jn}$ and $W_{jn}$ statistics. For the $L^2$-norm statistics $W_{1n}$ and $W_{2n}$, we define the following relevant subdomains of $\bar{\mathcal{X}}$, which collect the arguments where $m_1$ or $m_2$ are equal to zero:

$$\mathcal{X}_1(P) = \{x \in \bar{\mathcal{X}} : m_1(x) = 0\} \qquad (21)$$

$$\mathcal{X}_2(P) = \{x \in \bar{\mathcal{X}} : m_2(x) = 0\}. \qquad (22)$$

Denote by $\mathcal{X}_0(P) \subseteq \bar{\mathcal{X}}$ the set of $x$ where $T_1(F)(x) = 0$ or at least one coordinate of $T_2(F)$ equals $0$ for probability distribution $P$. As will be seen below, $\mathcal{X}_0(P)$ is the same for both the $T_1$ and $T_2$ functions. Following Linton, Song, and Whang (2010), we call $\mathcal{X}_0(P)$ the contact set for the distribution $P$. Given the above definitions, under the null hypothesis we can write $\mathcal{X}_0(P) = \mathcal{X}_1(P) \cup \mathcal{X}_2(P)$.

On the other hand, the supremum-norm statistics $V_{1n}$ and $V_{2n}$ need a different family of sets, namely the sets of $\epsilon$-maximizers of $m_1$ and $m_2$. For any $\epsilon \geq 0$ and $k \in \{1, 2\}$, let

$$M_k(\epsilon) = \left\{ x \in \bar{\mathcal{X}} : m_k(x) \geq \sup_{x \in \bar{\mathcal{X}}} m_k(x) - \epsilon \right\}. \qquad (23)$$

An important subset of $\mathcal{P}_0$ consists of those $P$ for which the test statistics have nontrivial limiting distributions under the null hypothesis, that is, distributions not degenerate at 0, which occurs when there is some $x$ such that $T_j(F)(x) = 0$ (note that there are no $x$ such that $T_j(F)(x) > 0$ when $P \in \mathcal{P}_0$). Define $\mathcal{P}_{00} \subset \mathcal{P}_0$ to be the set of all $P$ such that $\mathcal{X}_0(P) \neq \emptyset$.
If $P \in \mathcal{P}_0 \setminus \mathcal{P}_{00}$ then $\mathcal{X}_0(P) = \emptyset$ and, because the distribution satisfies the null hypothesis, $F_A$ strictly dominates $F_B$ everywhere and the criterion functions $T_j$ are strictly negative over $\bar{\mathcal{X}}$. When $P \in \mathcal{P}_0 \setminus \mathcal{P}_{00}$, the test statistics have asymptotic distributions that are degenerate at zero because the test statistics will detect that policy A is strictly better than B over all of $\bar{\mathcal{X}}$. When $P \in \mathcal{P}_{00}$, $T_j(F)$ is zero over $\mathcal{X}_0(P)$ and the test statistics have a nontrivial asymptotic distribution over $\mathcal{X}_0(P)$. Thus, when $F_A \succeq_{LASD} F_B$, the asymptotic behavior of the test statistics depends on whether $P \in \mathcal{P}_{00}$ or $P \in \mathcal{P}_0 \setminus \mathcal{P}_{00}$. Note that when $P \in \mathcal{P}_{00}$, we have $\lim_{\epsilon \searrow 0} M_k(\epsilon) = \mathcal{X}_k(P)$ for whichever coordinate function actually achieves the maximal value zero.

Hadamard differentiability is an analytic tool used to establish the asymptotic distribution of nonlinear maps of the empirical process. Definition A.1 in Appendix A provides a precise statement of the concept. When a map is Hadamard differentiable (for example $T_2$, which is linear as a map from $(\ell^\infty(\mathbb{R}))^2$ to $(\ell^\infty(\mathbb{R}_+))^2$ and is thus trivially differentiable), the functional delta method can be applied to describe its asymptotic behavior as a transformed empirical process, and a chain rule makes the analysis of compositions of several Hadamard-differentiable maps tractable. Also, the Hadamard differentiability of a map implies that resampling is consistent when this map is applied to the resampled empirical process (van der Vaart, 1998, Theorem 23.9); so, for example, the distribution of the resampled criterion processes $\sqrt{n}(T_2(F_n^*) - T_2(F_n))$ is a consistent estimate of the asymptotic distribution of $\sqrt{n}(T_2(F_n) - T_2(F))$ in the space $(\ell^\infty(\mathbb{R}_+))^2$.

On the other hand, consider the $T_1$ map.
The pointwise Hadamard directional derivative of $T_1(f)(x)$ at a given $x \geq 0$ in direction $h(x) = (h_A(x), h_B(x))$ is

$$T'_{1,f}(h)(x) = \begin{cases} h_A(x) - h_B(x) + h_A(-x) - h_B(-x), & f_A(x) > f_B(x) \\ (h_A(x) - h_B(x))_+ + h_A(-x) - h_B(-x), & f_A(x) = f_B(x) \\ h_A(-x) - h_B(-x), & f_A(x) < f_B(x). \end{cases} \qquad (24)$$

This map, thought of as a map between the function spaces $(\ell^\infty(\mathbb{R}))^2$ and $\ell^\infty(\mathbb{R}_+)$, is not differentiable because the pointwise maximum map is only differentiable at each point $x$, but not in the codomain $\ell^\infty(\mathbb{R}_+)$. Despite the lack of differentiability of the map $F \mapsto T_1(F)$, we show in Lemma A.3 in Appendix A that $F \mapsto V_1$ and $F \mapsto W_1$ are Hadamard directionally differentiable, which implies these maps are just regular enough that existing statistical methods can be applied to their analysis. Later in this section we apply the resampling technique recently developed in Fang and Santos (2019), along with this directional differentiability, to test hypotheses using $V_{1n}$ or $W_{1n}$.

Having discussed the difficulties in the relationship between distributions and test statistics, we turn to assumptions on the observations. In order to conduct inference using either $T_1(F_n)$ or $T_2(F_n)$ we make the following assumptions.

A1 The observations $\{X_{Ai}\}_{i=1}^{n_A}$ and $\{X_{Bi}\}_{i=1}^{n_B}$ are iid samples, independent of each other, and are continuously distributed with marginal distribution functions $F_A$ and $F_B$ respectively.

A2 The sample sizes $n_A$ and $n_B$ increase in such a way that $n_k / (n_A + n_B) \to \lambda_k$ as $n_A, n_B \to \infty$, where $0 < \lambda_k < 1$ for $k \in \{A, B\}$. Define $n = n_A + n_B$.

Under these assumptions we establish the asymptotic properties of the test statistics under the null and fixed alternatives. Under the above assumptions, there is a Gaussian process $\mathbb{G}_F$ such that $\sqrt{n}(F_n - F) \rightsquigarrow \mathbb{G}_F$.
We denote the coordinate processes by $\mathbb{G}_{F_A}$ and $\mathbb{G}_{F_B}$, and for convenience define two transformed processes: for each $x \geq 0$ let

$$\mathbb{G}_1(x) = \mathbb{G}_{F_A}(-x) - \mathbb{G}_{F_B}(-x) \qquad (25)$$

$$\mathbb{G}_2(x) = \mathbb{G}_{F_A}(x) - \mathbb{G}_{F_B}(x) + \mathbb{G}_{F_A}(-x) - \mathbb{G}_{F_B}(-x). \qquad (26)$$

These will be used in the theorem below.

Theorem 4.1.
Make assumptions A1-A2. Define the limiting Gaussian processes $\mathbb{G}_1$ and $\mathbb{G}_2$ as above. Then:

1. Suppose that $P \in \mathcal{P}_{00}$. As $n \to \infty$, $V_{1n} \rightsquigarrow V_1$ and $W_{1n} \rightsquigarrow W_1$, where

$$V_1 \sim \max\left\{ 0,\ \sup_{x \in \mathcal{X}_1(P)} \mathbb{G}_1(x) \cdot 1\left\{ \sup_{x \in \bar{\mathcal{X}}} m_1(x) = 0 \right\},\ \sup_{x \in \mathcal{X}_2(P)} \mathbb{G}_2(x) \cdot 1\left\{ \sup_{x \in \bar{\mathcal{X}}} m_2(x) = 0 \right\} \right\}$$

and

$$W_1 \sim \left( \int_{\mathcal{X}_1(P)} \left( (\mathbb{G}_1(x))_+ \right)^2 dx + \int_{\mathcal{X}_2(P)} \left( (\mathbb{G}_2(x))_+ \right)^2 dx \right)^{1/2}.$$
2. Suppose that $P \in \mathcal{P}_{00}$. As $n \to \infty$, $V_{2n} \rightsquigarrow V_2$ and $W_{2n} \rightsquigarrow W_2$, where $V_2 \sim V_1$ and $W_2 \sim W_1$.

3. Suppose that $P \in \mathcal{P}_0 \setminus \mathcal{P}_{00}$. As $n \to \infty$, $P\{V_{jn} > \epsilon\} \to 0$ and $P\{W_{jn} > \epsilon\} \to 0$ for all $\epsilon > 0$, for $j = 1$ or $2$.

4. Suppose that $P \notin \mathcal{P}_0$. As $n \to \infty$, $P\{V_{jn} > c\} \to 1$ and $P\{W_{jn} > c\} \to 1$ for all $c \geq 0$, for $j = 1$ or $2$.

Theorem 4.1 derives the asymptotic properties of the proposed test statistics. Parts 1 and 2 establish the weak limits of $V_{jn}$ and $W_{jn}$ for $j \in \{1, 2\}$ when the null hypothesis is true. Recall that when $P \in \mathcal{P}_{00}$, $\lim_{\epsilon \searrow 0} M_k(\epsilon) = \mathcal{X}_k(P)$, which is why $M_k(\epsilon)$ terms are absent in the first part of the theorem. Remarkably, the test statistics using the $T_1$ and $T_2$ criterion processes have the same asymptotic behavior despite the different appearances of the underlying processes and the irregularity of $T_1$. Part 3 shows that the statistics are asymptotically degenerate at zero when the contact set is empty, that is, when $P$ lies in the interior of the null region. Part 4 shows that the test statistics diverge when the data come from any distribution that does not satisfy the null hypothesis.

The limiting distributions described in Part 1 of Theorem 4.1 are not standard because the distributions of the test statistics depend on features of $P$ through the $\mathcal{X}_k(P)$ terms in each expression. Therefore, to make practical inference feasible, we suggest the use of resampling techniques below.

The proposed test statistics have complex limiting distributions. In this subsection, we present resampling procedures to estimate the limiting distributions of both $V_{jn}$ and $W_{jn}$ for $j \in \{1, 2\}$ under the assumption that $P \in \mathcal{P}_0$. Naive use of bootstrap data generating processes in the place of the original empirical process suffers from distortions due to discontinuities in the directional derivatives of the maps that define the distributions of the test statistics.
In finite samples the plug-in estimate will not find, for example, the region where $F_A(x) - F_B(x) = 0$, where the derivatives exhibit discontinuous behavior. Our procedure involves making estimates of the derivatives involved in the limiting distribution and a standard exchangeable bootstrap routine, as proposed in Fang and Santos (2019). (Given a set of weights $\{W_i\}_{i=1}^n$ that sum to one and are independent of $\{X_i\}_{i=1}^n$, the exchangeable bootstrap measure is a randomly-weighted measure that puts mass $W_i$ at observed sample point $X_i$ for each $i$. This encompasses, for example, the standard bootstrap, the $m$-of-$n$ bootstrap and the wild bootstrap.)

In order to estimate contact sets, define a sequence of constants $\{a_n\}$ such that $a_n \searrow 0$ and $\sqrt{n} a_n \to \infty$, and let $\hat{m}_{1n}(x) = F_{An}(-x) - F_{Bn}(-x)$ and $\hat{m}_{2n}(x) = F_{An}(-x) - F_{Bn}(-x) + F_{An}(x) - F_{Bn}(x)$. Then for the $W_j$ statistics define the estimated contact sets by

$$\hat{\mathcal{X}}_1 = \{x \in \bar{\mathcal{X}} : |\hat{m}_{1n}(x)| \leq a_n\} \qquad (27)$$

$$\hat{\mathcal{X}}_2 = \{x \in \bar{\mathcal{X}} : |\hat{m}_{2n}(x)| \leq a_n\}. \qquad (28)$$

When both sets are empty, replace both estimates by $\bar{\mathcal{X}}$. Meanwhile, for the $V_j$ statistics define estimated $\epsilon$-maximizer sets. For any sequence of constants $\{b_n\}$ such that $b_n \searrow 0$ and $\sqrt{n} b_n \to \infty$, let

$$\hat{M}_1(b_n) = \{x \in \bar{\mathcal{X}} : \hat{m}_{1n}(x) \geq \max \hat{m}_{1n}(x) - b_n\}, \qquad (29)$$

$$\hat{M}_2(b_n) = \{x \in \bar{\mathcal{X}} : \hat{m}_{2n}(x) \geq \max \hat{m}_{2n}(x) - b_n\}. \qquad (30)$$

Using these estimates, the distributions of $V_1$ and $W_1$ can be estimated from sample data (recall that Part 2 of Theorem 4.1 asserts that these are the same distributions as those of $V_2$ and $W_2$). The formulas in part 3 of the steps below are obtained by inserting estimated contact sets and resampled empirical processes in the place of population-level quantities in the functions shown in Part 1 of Theorem 4.1.

Resampling routine to estimate the distributions of $V_{jn}$ and $W_{jn}$ for $j = 1, 2$:
1. If using a Cramér-von Mises statistic, given a sequence of constants $\{a_n\}$, estimate the contact sets $\hat{\mathcal{X}}_1$ and $\hat{\mathcal{X}}_2$. If using a Kolmogorov-Smirnov statistic, given a sequence of constants $\{b_n\}$, estimate the $b_n$-maximizer sets of $\hat{m}_{1n}$ and $\hat{m}_{2n}$.

Next repeat the following two steps for $r = 1, \ldots, R$:

2. Construct the resampled processes

$$F_{1n}^{*r}(x) = \sqrt{n}\left( F_{An}^*(-x) - F_{Bn}^*(-x) - F_{An}(-x) + F_{Bn}(-x) \right)$$

$$F_{2n}^{*r}(x) = \sqrt{n}\left( F_{An}^*(-x) - F_{Bn}^*(-x) - F_{An}(-x) + F_{Bn}(-x) + F_{An}^*(x) - F_{Bn}^*(x) - F_{An}(x) + F_{Bn}(x) \right)$$

using an exchangeable bootstrap.

3. Calculate the resampled test statistic. Letting $\hat{k} = \arg\max_k \{\sup_{x \geq 0} \hat{m}_{kn}(x)\}$ and $\{c_n\} \searrow 0$ satisfy $\sqrt{n} c_n \to \infty$, calculate

$$V_n^{*r} = \begin{cases} \left( \max_{x \in \hat{M}_{\hat{k}}(b_n)} F_{\hat{k}n}^{*r}(x) \right)_+, & |\max \hat{m}_{1n} - \max \hat{m}_{2n}| > c_n \\ \max\left\{ 0,\ \max_{x \in \hat{M}_1(b_n)} F_{1n}^{*r}(x),\ \max_{x \in \hat{M}_2(b_n)} F_{2n}^{*r}(x) \right\}, & |\max \hat{m}_{1n} - \max \hat{m}_{2n}| \leq c_n \end{cases} \qquad (31)$$

or

$$W_n^{*r} = \left( \int_{\hat{\mathcal{X}}_1} \left( (F_{1n}^{*r}(x))_+ \right)^2 dx + \int_{\hat{\mathcal{X}}_2} \left( (F_{2n}^{*r}(x))_+ \right)^2 dx \right)^{1/2}. \qquad (32)$$

Finally,

4. Let $\hat{q}_{V^*}(1 - \alpha)$ and $\hat{q}_{W^*}(1 - \alpha)$ be the $(1 - \alpha)$th sample quantiles of the bootstrap distributions of $\{V_n^{*r}\}_{r=1}^R$ and $\{W_n^{*r}\}_{r=1}^R$, respectively, where $\alpha \in (0, 1)$ is the nominal size of the tests. Reject the null hypothesis (13) or (14) if $V_{jn}$ and $W_{jn}$ defined in (15)-(18) are, respectively, larger than $\hat{q}_{V^*}(1 - \alpha)$ or $\hat{q}_{W^*}(1 - \alpha)$.

The resampled statistics are calculated by imposing the null hypothesis and assuming that the region $\mathcal{X}_j(P)$ is the only part of the domain that provides a nondegenerate contribution to the asymptotic distribution of the statistic under the null. The two cases of each part in the maximum arise from trying to impose the null behavior on the resampled supremum norm statistics, even when it appears the null is violated based on the value of the sample statistic.
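The Cramér-von Mises branch of this routine can be sketched compactly. The implementation below is our own illustration (all function names are ours), using the standard nonparametric bootstrap as a special case of the exchangeable bootstrap and approximating the integral in (32) by a Riemann sum on a grid:

```python
import numpy as np

rng = np.random.default_rng(0)

def ecdf_vals(sample, x):
    """Empirical CDF of `sample` evaluated at the points in `x`."""
    s = np.sort(sample)
    return np.searchsorted(s, x, side="right") / s.size

def cvm_bootstrap(XA, XB, grid, dx, a_n, R=199):
    """Sketch of steps 1-4 for the W statistic: estimate contact sets,
    resample centered coordinate processes, and return the 95% critical value."""
    nA, nB = len(XA), len(XB)
    n = nA + nB
    m1 = ecdf_vals(XA, -grid) - ecdf_vals(XB, -grid)
    m2 = m1 + ecdf_vals(XA, grid) - ecdf_vals(XB, grid)
    # Step 1: estimated contact sets (27)-(28); fall back to the full grid.
    S1, S2 = np.abs(m1) <= a_n, np.abs(m2) <= a_n
    if not (S1.any() or S2.any()):
        S1 = S2 = np.ones_like(m1, dtype=bool)
    stats = []
    for _ in range(R):
        # Step 2: centered resampled coordinate processes.
        XA_s = rng.choice(XA, nA, replace=True)
        XB_s = rng.choice(XB, nB, replace=True)
        m1_s = ecdf_vals(XA_s, -grid) - ecdf_vals(XB_s, -grid) - m1
        m2_s = m1_s + ecdf_vals(XA_s, grid) - ecdf_vals(XB_s, grid) - (m2 - m1)
        # Step 3: resampled statistic (32), restricted to the estimated contact sets.
        w = (np.sum(np.maximum(np.sqrt(n) * m1_s[S1], 0.0) ** 2)
             + np.sum(np.maximum(np.sqrt(n) * m2_s[S2], 0.0) ** 2)) * dx
        stats.append(np.sqrt(w))
    # Step 4: bootstrap critical value at nominal size 5%.
    return np.quantile(stats, 0.95)
```

The sample statistic would then be compared against the returned quantile; the tuning constant `a_n` plays the role of the sequence $\{a_n\}$ above and must be chosen by the user.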
A simple alternative way to conduct inference would be to assume the least-favorable null hypothesis that $F_A \equiv F_B$, and to resample using all of $\bar{\mathcal{X}}$. However, this may result in tests with lower power (Linton, Song, and Whang, 2010): power loss arises in situations where $\mathcal{X}_0(P) \subset \bar{\mathcal{X}}$ (strictly), so that the $T_j$ process is only nondegenerate on a subset, while bootstrapped processes that assume $\mathcal{X}_0(P) = \bar{\mathcal{X}}$ would look over all of $\bar{\mathcal{X}}$ and result in a stochastically larger bootstrap distribution than the true distribution.

The next result shows that our tests based on the resampling schemes described above have accurate size under the null hypothesis. In order to metrize weak convergence we use test functions from the set $BL_1$, which denotes the Lipschitz functions $\mathbb{R} \to \mathbb{R}$ with Lipschitz constant 1 that are bounded by 1.

Theorem 4.2.
Make assumptions A1-A2 and suppose that $P \in \mathcal{P}_0$. Let $\hat{q}_{V_j^*}(1 - \alpha)$ and $\hat{q}_{W_j^*}(1 - \alpha)$ be the $(1 - \alpha)$th sample quantiles of the bootstrap distributions as described in the routines above. Then for $j = 1, 2$, the bootstrap is consistent:

$$\sup_{f \in BL_1} \left| E[f(V_n^*) \mid X] - E[f(V_1)] \right| = o_P(1) \quad \text{and} \quad \sup_{f \in BL_1} \left| E[f(W_n^*) \mid X] - E[f(W_1)] \right| = o_P(1),$$

where $V_1$ and $W_1$ are defined in Theorem 4.1.

The result in the above theorem is stated in terms of the limiting variables $V_1$ and $W_1$ and their bootstrap analogs. By the functional delta method, $V_1$ and $W_1$ are Hadamard directional derivatives of a chain of maps from the marginal distribution functions $F$ to the real line, and the derivatives are most compactly expressed as the definitions in Theorem 4.1. The bootstrap variables combine conventional resampling with finite-sample estimates of the maps defined in Part 1 of Theorem 4.1, which is a resampling approach proposed in Fang and Santos (2019). Their result is actually more general: it states that with a more flexible estimator $V_n^*$, we would obtain bootstrap consistency for $P$ in the null and alternative regions. Because our focus is on testing $F_A \succeq_{LASD} F_B$, however, our resampling scheme, and Theorem 4.2, are carried out under the imposition of the null hypothesis.

The resampling consistency result in Theorem 4.2 implies that our bootstrap tests have asymptotically correct size uniformly over probability distributions in the null region, in the same sense as was stressed in Linton, Song, and Whang (2010). A formal statement of this uniformity over $\mathcal{P}_0$ is given in Theorem A.5 in Appendix A. Along with Part 4 of Theorem 4.1, Theorem A.5 additionally implies that our tests are consistent, that is, that their power to detect violations of the null represented by fixed alternative distributions tends to one. This is because the resampling scheme produces asymptotically bounded critical values, while the test statistics diverge under the alternative.
In this section we extend the dominance tests to the case where the distribution functions $F_A$ and $F_B$ are only partially identified by their Makarov bounds. Suppose that $Z_0$, $Z_A$ and $Z_B$ are random variables with marginal distribution functions $G = (G_0, G_A, G_B)$, but the joint probability distribution $P$ of the vector $(Z_0, Z_A, Z_B)$ is unknown, so that $F_A$ and $F_B$ are not point identified because they are the unknown distribution functions of $X_A = Z_A - Z_0$ and $X_B = Z_B - Z_0$. Nevertheless, we wish to test the hypotheses in (10), which depend on $F_A$ and $F_B$.

4.2.1 Test statistics

Recall equations (8) and (9) from Section 3. Restated in terms of the null hypothesis $F_A \succeq_{LASD} F_B$, condition (8) is sufficient to imply that the null hypothesis is true, while (9) represents a necessary condition for dominance. Denote by $\mathcal{P}_{suf}$ the set of distributions that satisfy (8) and let $\mathcal{P}_{nec}$ collect all distributions that satisfy (9). Then, still using the label $\mathcal{P}_0$ for the set of distributions such that $X_A$ dominates $X_B$, we have the (strict) inclusions $\mathcal{P}_{suf} \subset \mathcal{P}_0 \subset \mathcal{P}_{nec}$. Given this relation, without any further identification conditions, we look for significant violations of the necessary condition, since $P \notin \mathcal{P}_{nec}$ implies
$P \notin \mathcal{P}_0$. This generally results in conservative tests, because distributions $P \in \mathcal{P}_{nec} \setminus \mathcal{P}_0$ will also not be rejected, but it avoids overrejection, which would be the result of using the sufficient condition.

To test the null (10) we employ the inequality specified in equation (9) from Theorem 3.7. For each $x \in \bar{\mathcal{X}}$ let

$$T_3(G)(x) = L_A(-x) + L_A(x) - U_B(-x) - U_B(x), \qquad (33)$$

where $L_A$ and $U_B$ are defined in (6) and (7). To see the explicit dependence of $T_3$ on $G$, rewrite (33), using the identity $\inf f = -\sup(-f)$ in the definition of $U_B$, as

$$T_3(G)(x) = \sup_{u \in \mathbb{R}} (G_A(u) - G_0(u + x)) + \sup_{u \in \mathbb{R}} (G_A(u) - G_0(u - x)) + \sup_{u \in \mathbb{R}} (G_0(u + x) - G_B(u)) + \sup_{u \in \mathbb{R}} (G_0(u - x) - G_B(u)) - 2. \qquad (34)$$

As before, $T_3$ has been written in such a way that a violation of the null hypothesis $F_A \succeq_{LASD} F_B$ is indicated by observing some $x$ such that $T_3(G)(x) > 0$.

The above map shares a similar feature with the $T_1$ map in the previous section: the marginal (in $u$) optimization maps are directionally differentiable at each point $x \geq 0$, but $f(u, x) \mapsto \sup_u f(u, x)$ is not Hadamard differentiable as a map from $\ell^\infty(\mathbb{R} \times \bar{\mathcal{X}})$ to $\ell^\infty(\bar{\mathcal{X}})$. One solution to this problem is to examine the distribution of test functionals applied to the process, which are Hadamard directionally differentiable (shown in Lemma A.4 in Appendix A).

Given observed samples $\{Z_{ki}\}$ for $k \in \{0, A, B\}$, define the marginal empirical distribution functions $G_n = (G_{0n}, G_{An}, G_{Bn})$, where $G_{kn}(z) = n_k^{-1} \sum_i 1\{Z_{ki} \leq z\}$ for $k \in \{0, A, B\}$, and let $L_{An}$ and $U_{Bn}$ be the plug-in estimates of the bounds: for each $x \in \bar{\mathcal{X}}$, let $L_{An}(x) = L(x, G_{0n}, G_{An})$ and $U_{Bn}(x) = U(x, G_{0n}, G_{Bn})$, where $L$ and $U$ were introduced in equations (6) and (7). To estimate $T_3$ in (33) we use the plug-in estimate $T_3(G_n)$.
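To illustrate, the plug-in criterion $T_3(G_n)(x)$ can be approximated by taking the four inner suprema in (34) over a finite grid of $u$ values. This is our own sketch (names are ours, and truncating the suprema to a grid is an approximation):

```python
import numpy as np

def T3_plugin(G0, GA, GB, u_grid, x):
    """Approximate T3(G)(x) from (34): four suprema over u, truncated to a
    finite grid, plus the constant -2 contributed by the two upper bounds."""
    sup1 = np.max(GA(u_grid) - G0(u_grid + x))
    sup2 = np.max(GA(u_grid) - G0(u_grid - x))
    sup3 = np.max(G0(u_grid + x) - GB(u_grid))
    sup4 = np.max(G0(u_grid - x) - GB(u_grid))
    return sup1 + sup2 + sup3 + sup4 - 2.0
```

For example, when $Z_0$, $Z_A$ and $Z_B$ share the same distribution and $x = 0$, all four suprema vanish and the criterion sits well inside the null region.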
As in the previous section, we consider the following Kolmogorov-Smirnov and Cramér-von Mises type test statistics:

$$V_{3n} = \sqrt{n} \sup_{x \in \bar{\mathcal{X}}} (T_3(G_n)(x))_+ \qquad (35)$$

$$W_{3n} = \sqrt{n} \left( \int_{\bar{\mathcal{X}}} \left( (T_3(G_n)(x))_+ \right)^2 dx \right)^{1/2}. \qquad (36)$$

The next subsections establish limiting distributions for $V_{3n}$ and $W_{3n}$ and suggest a resampling procedure to estimate the distributions.

Once again, it is necessary to define the region where the test statistics have nontrivial distributions. Define the contact set for the $T_3$ criterion function by

$$\mathcal{X}_{nec}(P) = \{x \in \bar{\mathcal{X}} : L_A(-x) + L_A(x) - U_B(-x) - U_B(x) = 0\}.$$

We say that a distribution $P \in \mathcal{P}_{nec,0}$ when $\mathcal{X}_{nec}(P) \neq \emptyset$. As mentioned at the beginning of the section, $\mathcal{P}_{nec}$ is not the set of $P$ such that $F_A \succeq_{LASD} F_B$, but rather those that satisfy this necessary condition; in other words, $\mathcal{P}_0 \subset \mathcal{P}_{nec}$. There is no obvious connection between $\mathcal{P}_{00}$ and $\mathcal{P}_{nec,0}$; the $P$ in $\mathcal{P}_{nec,0}$ are simply those that lead to nontrivial asymptotic behavior of the $T_3$ statistic, as will be shown in Theorem 4.3.

Next, we define a few functions that are analogous to the $m_1$ and $m_2$ used in the point-identified case, and which come from separating equation (34) into four sub-functions. Let $m_1(u, x) = G_A(u) - G_0(u + x)$, $m_2(u, x) = G_A(u) - G_0(u - x)$, $m_3(u, x) = G_0(u + x) - G_B(u)$ and $m_4(u, x) = G_0(u - x) - G_B(u)$. These functions are used to define, for $k = 1, \ldots, 4$, for any $x \in \bar{\mathcal{X}}$ and $\epsilon \geq 0$, the set-valued maps

$$M_k(x, \epsilon) = \left\{ u \in \mathbb{R} : m_k(u, x) \geq \sup_{u \in \mathbb{R}} m_k(u, x) - \epsilon \right\}. \qquad (37)$$

Also for the supremum norm statistic another relevant set of $\epsilon$-maximizers exists: for any $\epsilon \geq 0$, let

$$M_{nec}(\epsilon) = \left\{ (u, x) \in \mathbb{R} \times \bar{\mathcal{X}} : \sum_{k=1}^4 m_k(u, x) \geq \sup_{u, x} \sum_{k=1}^4 m_k(u, x) - \epsilon \right\}. \qquad (38)$$

Under the null hypothesis that the supremum is zero, $\lim_{\epsilon \searrow 0} M_{nec}(\epsilon) = \mathcal{X}_{nec}$, as seen in the expression for $V_3$ in the next theorem.

Now we turn to regularity assumptions on the observed data. The only difference between these assumptions and assumptions A1-A2 is that we must now make assumptions for three samples instead of two.

B1 The observations $\{Z_{0i}\}_{i=1}^{n_0}$, $\{Z_{Ai}\}_{i=1}^{n_A}$ and $\{Z_{Bi}\}_{i=1}^{n_B}$ are iid samples, independent of each other, and are continuously distributed with marginal distribution functions $G_0$, $G_A$ and $G_B$ respectively.

B2 The sample sizes $n_0$, $n_A$ and $n_B$ increase in such a way that $n_k / (n_0 + n_A + n_B) \to \lambda_k$ as $n_0, n_A, n_B \to \infty$, for $k \in \{0, A, B\}$, where $0 < \lambda_k < 1$. Let $n = n_0 + n_A + n_B$.

Before stating the next theorem, it is convenient to make some definitions. Under assumptions B1-B2, standard results in empirical process theory show that there is a Gaussian process $\mathbb{G}_G$ such that $\sqrt{n}(G_n - G) \rightsquigarrow \mathbb{G}_G$ (van der Vaart, 1998, Example 19.6). For each $(u, x)$, denote the transformed empirical processes and their (Gaussian) limits

$$\sqrt{n}(G_{An}(u) - G_{0n}(u + x) - G_A(u) + G_0(u + x)) = \mathbb{G}_{1n}(u, x) \rightsquigarrow \mathbb{G}_1(u, x)$$
$$\sqrt{n}(G_{An}(u) - G_{0n}(u - x) - G_A(u) + G_0(u - x)) = \mathbb{G}_{2n}(u, x) \rightsquigarrow \mathbb{G}_2(u, x)$$
$$\sqrt{n}(G_{0n}(u + x) - G_{Bn}(u) - G_0(u + x) + G_B(u)) = \mathbb{G}_{3n}(u, x) \rightsquigarrow \mathbb{G}_3(u, x)$$
$$\sqrt{n}(G_{0n}(u - x) - G_{Bn}(u) - G_0(u - x) + G_B(u)) = \mathbb{G}_{4n}(u, x) \rightsquigarrow \mathbb{G}_4(u, x) \qquad (39)$$

Given the above definitions, the asymptotic behavior of $V_{3n}$ and $W_{3n}$ can be established.

Theorem 4.3.
Under assumptions B1-B2:

1. Suppose that $P \in \mathcal{P}_{nec,0}$. As $n \to \infty$, $V_{3n} \rightsquigarrow V_3$ and $W_{3n} \rightsquigarrow W_3$, where, given the definitions (39) and (37),

$$V_3 = \left( \sup_{x \in \mathcal{X}_{nec}(P)} \sum_{k=1}^4 \lim_{\epsilon \searrow 0} \sup_{u \in M_k(x, \epsilon)} \mathbb{G}_k(u, x) \right)_+$$

and

$$W_3 = \left( \int_{\mathcal{X}_{nec}(P)} \left( \left( \sum_{k=1}^4 \lim_{\epsilon \searrow 0} \sup_{u \in M_k(x, \epsilon)} \mathbb{G}_k(u, x) \right)_+ \right)^2 dx \right)^{1/2}.$$
2. Suppose that $P \in \mathcal{P}_{nec} \setminus \mathcal{P}_{nec,0}$. Then as $n \to \infty$, $P\{V_{3n} > \epsilon\} \to 0$ and $P\{W_{3n} > \epsilon\} \to 0$ for all $\epsilon > 0$.

3. Suppose that $P \notin \mathcal{P}_{nec}$. Then as $n \to \infty$, $P\{V_{3n} > c\} \to 1$ and $P\{W_{3n} > c\} \to 1$ for all $c \geq 0$.

The results of this theorem parallel those in Theorem 4.1. The distributions of these test statistics are complex; therefore a consistent resampling procedure for inference is discussed in the next subsection. The conservatism of these tests is reflected in the second part above. There may be
$P \notin \mathcal{P}_0$ such that $P \in \mathcal{P}_{nec} \setminus \mathcal{P}_{nec,0}$, meaning the test will not detect that such a distribution violates the hypothesis that $F_A \succeq_{LASD} F_B$.

Now we turn to the issue of conducting practical inference using estimated bound functions and the necessary condition for LASD. As before, resampling can be implemented by estimating the derivatives of either $V_3$ or $W_3$. These estimates represent the major difference from the resampling scheme developed in the point identified setting.

The estimates required for tests based on $V_{3n}$ and $W_{3n}$ are similar to those used in the point-identified case. Define a grid of values $X \subset \mathbb{R}$ and let $X_+$ be the sub-grid of nonnegative points such that $X_+ \subset X$. (Otherwise these functions would need to be evaluated over a prohibitive number of points in the support.) For a sequence $a_n$ such that $a_n \searrow 0$ and $\sqrt{n} a_n \to \infty$, define the estimate of the contact set

$$\hat{\mathcal{X}}_{nec} = \left\{ x \in X_+ : |L_{An}(-x) + L_{An}(x) - U_{Bn}(-x) - U_{Bn}(x)| \leq a_n \right\}. \qquad (40)$$

When this estimated set is empty, set $\hat{\mathcal{X}}_{nec} = X_+$. The inner maximization step that occurs in the definition of the test statistics requires an estimate of the $\epsilon$-maximizers of each sub-process, that is, estimates of (37) for $k = 1, \ldots, 4$. For these sets we also use the same sort of estimator: for $\{b_n\}$ such that $b_n \searrow 0$ and $\sqrt{n} b_n \to \infty$, for each $x \in X_+$ let

$$\hat{M}_k(x) = \left\{ u \in X : \hat{m}_{kn}(u, x) \geq \max_{u \in X} \hat{m}_{kn}(u, x) - b_n \right\} \qquad (41)$$

where the $\hat{m}_{kn}$ are plug-in estimators of the $m_k$. Finally, for a sequence $d_n$ such that $d_n \searrow 0$ and $\sqrt{n} d_n \to \infty$, define the estimator

$$\hat{M}_{nec} = \left\{ (u, x) \in X \times X_+ : \sum_{k=1}^4 \hat{m}_{kn}(u, x) \geq \max_{(u, x) \in X \times X_+} \sum_{k=1}^4 \hat{m}_{kn}(u, x) - d_n \right\}. \qquad (42)$$

Putting these estimates together, we find the derivative estimates described in the resampling routine below.
Resampling routine to estimate the distributions of $V_{3n}$ and $W_{3n}$:
1. If using a Cramér-von Mises statistic, given a sequence of constants $\{a_n\}$, estimate the contact set $\hat{\mathcal{X}}_{nec}$. If using a Kolmogorov-Smirnov statistic, given sequences of constants $\{b_n\}$ and $\{d_n\}$, estimate $\hat{M}_k(\cdot)$ for $k = 1, \ldots, 4$ and $\hat{M}_{nec}$.

Next repeat the following two steps for $r = 1, \ldots, R$:

2. Construct the resampled processes $\mathbb{G}_{kn}^* = \sqrt{n}(G_{kn}^* - G_{kn})$ using an exchangeable bootstrap.

3. Calculate the resampled test statistic

$$V_{3n}^{*r} = \left( \max_{x \in \hat{M}_{nec}} \sum_{k=1}^4 \max_{u \in \hat{M}_k(x)} \mathbb{G}_{kn}^*(u, x) \right)_+$$

or

$$W_{3n}^{*r} = \left( \int_{\hat{\mathcal{X}}_{nec}} \left( \left( \sum_{k=1}^4 \max_{u \in \hat{M}_k(x)} \mathbb{G}_{kn}^*(u, x) \right)_+ \right)^2 dx \right)^{1/2}.$$

Finally,

4. Let $\hat{q}_{V^*}(1 - \alpha)$ and $\hat{q}_{W^*}(1 - \alpha)$ be the $(1 - \alpha)$th sample quantiles of the bootstrap distributions of $\{V_{3n}^{*r}\}_{r=1}^R$ and $\{W_{3n}^{*r}\}_{r=1}^R$, respectively, where $\alpha \in (0, 1)$ is the nominal size of the tests. We reject the null hypothesis (13) if $V_{3n}$ or $W_{3n}$, defined in (35) and (36), is, respectively, larger than $\hat{q}_{V^*}(1 - \alpha)$ or $\hat{q}_{W^*}(1 - \alpha)$.

The following theorem guarantees that the resampling scheme is consistent.

Theorem 4.4.
Make assumptions B1-B2 and suppose that $P \in \mathcal{P}_{nec}$. Let $\hat{q}_{V^*}(1 - \alpha)$ and $\hat{q}_{W^*}(1 - \alpha)$ be the $(1 - \alpha)$th sample quantiles of the bootstrap distributions as described in the routines above. Then the bootstrap is consistent:

$$\sup_{f \in BL_1} \left| E[f(V_{3n}^*) \mid X] - E[f(V_3)] \right| = o_P(1) \quad \text{and} \quad \sup_{f \in BL_1} \left| E[f(W_{3n}^*) \mid X] - E[f(W_3)] \right| = o_P(1).$$

The testing procedure based on the $T_3$ criterion function controls size uniformly over $\mathcal{P}_{nec}$, a superset of $\mathcal{P}_0$. The uniform size of the resampling inference scheme over $\mathcal{P}_{nec}$ is stated formally in Theorem A.6 in Appendix A. However, using only a necessary condition for inference comes at a cost, which is the possibility of trivial power against some alternatives $P \notin \mathcal{P}_0$: for any $P \in \mathcal{P}_{nec} \setminus \mathcal{P}_0$, the probability of rejecting the null is also less than or equal to $\alpha$. More generally, results about size and power against various alternatives that can be specified for point identified distributions are not available in the partially identified case. On the other hand, it is remarkable that the test controls size uniformly over the set $\mathcal{P}_0$, which is a set of treatment outcome distributions that cannot be observed directly.

An Online Supplemental Appendix provides Monte Carlo numerical evidence of the finite sample properties of both the point- and partially-identified methods. The simulations show that the tests have empirical size close to the nominal level, and high power against selected alternatives.

In this section we illustrate the use of our proposed methods in a policy evaluation context. We contrast our results with a classical stochastic dominance approach. We use household-level data from an experimental evaluation of two federal assistance programs, named Aid to Families with Dependent Children (AFDC) and Jobs First (JF), to analyze the distributional effects of the policies.
Bitler, Gelbach, and Hoynes (2006) use these data to document substantial heterogeneity in the impacts of this policy change on recipients' total incomes. The authors focus on this policy because of the availability of experimental data, which provides a clear source of identification. Amongst its main findings, the article shows that this heterogeneity generated income gains and losses in different, sizable groups of recipients.

AFDC was one of the largest federal assistance programs in the United States between 1935 and 1996. It consisted of a means-tested income support scheme for low-income families with dependent children, administered at the state level and funded at the federal level. Following criticism that this program discouraged female labor market participation and perpetuated welfare dependency, AFDC was discontinued in 1996 and replaced, in each state, by more [...] A and B in the previous sections). There are quarterly measures for income, earnings and transfers, but we concentrate only on measures of change in total income, comparing quarterly income before and after the households were randomly assigned to one of the groups. Because assignment is random, we assume that the distribution functions of gains and losses under each policy, $F_{JF}$ and $F_{AFDC}$, are point-identified by the differences in incomes before and after random assignment.

[Footnote: Bitler, Gelbach, and Hoynes (2006) conduct a test comparing features of households before random assignment and find that they do not differ significantly in terms of observable characteristics. We check additionally that the income distributions were the same before the experiment split households among the two policies, using a conventional two-sided Cramér-von Mises test for the equality of distributions; its p-value implies that, before the experiment, the distributions are indistinguishable.]
To make welfare decisions in terms of gains and losses, we require data in terms of changes, which we construct using several definitions. First, measurements were taken before random assignment (RA) into one of the two programs, and we call these measurements pre-RA observations. All periods after random assignment are labeled post-RA observations. Next, the Jobs First program stopped supporting individuals at what we call the Time Limit (TL), although quarterly income was observed for these households after the time limit. We call pre-TL observations those that were made after random assignment but before the time limit, while post-TL observations are those made after the JF time limit. We summarize the pre/post-RA and pre/post-TL observations in one of two ways: either by averaging income over all quarters in the relevant time span, or by using the final quarter within the time span. Therefore there are four ways of defining income changes based on all the combinations of time limits and measurement summaries.

Changes in household income due to the AFDC and JF policies were defined using one of two methods. First, the natural log of the average earnings in all post-RA quarters minus the natural log of the average pre-RA quarterly earnings is called the average-RA change. Second, the natural log of the last quarter of post-RA income minus the natural log of the last quarter of pre-RA income is called the last-quarter-RA change. Other changes are defined using data around the Jobs First time limit. The natural log of average post-TL quarterly earnings minus the natural log of average pre-TL quarterly earnings is called the average-TL change. The natural log of the last quarter of post-TL income minus the natural log of the last quarter of pre-TL income is called the last-quarter-TL change.

We conducted formal tests of the hypothesis (10) using $W_{2n}$ statistics (Cramér-von Mises statistics applied to the empirical $T_2$ process).
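Before turning to the results, note that the change definitions above reduce to simple log differences. A minimal sketch (the helper names are ours):

```python
import numpy as np

def average_change(pre, post):
    """Average change: log of average post-period quarterly income minus
    log of average pre-period quarterly income (average-RA or average-TL)."""
    return np.log(np.mean(post)) - np.log(np.mean(pre))

def last_quarter_change(pre, post):
    """Last-quarter change: log of the final post-period quarter minus
    log of the final pre-period quarter."""
    return np.log(post[-1]) - np.log(pre[-1])
```

Applied household by household, these functions produce the samples of gains and losses whose empirical distributions enter the $T_2$ criterion process.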
The results of these tests are presented in the left-hand side of Table 1. First, we consider the results when changes are defined across the random assignment. The tests indicate that we cannot reject the hypothesis that $F_{AFDC} \succeq_{LASD} F_{JF}$ unless we measure outcomes using average-RA changes. In that case AFDC does not appear to dominate the JF policy. We also conducted tests of the hypothesis $F_{JF} \succeq_{LASD} F_{AFDC}$. We cannot reject this null hypothesis using either measure. Because in one of these cases both distributions dominate each other, we double-checked using two-sided tests of distributional equality, that is, of the null that $F_{AFDC} \equiv F_{JF}$. Using average income measures the distributions appear to be different, but using last-quarter measures we cannot reject the null that the distributions are indistinguishable. These tests offer some evidence that income changes across random assignment are indistinguishable between the two policies or better under the JF policy than under the AFDC policy.

Now we consider the case when changes are defined across the time limit (using either averages or last quarters). In this case, we do not reject the hypothesis that $F_{AFDC} \succeq_{LASD} F_{JF}$, and we reject the hypothesis that $F_{JF} \succeq_{LASD} F_{AFDC}$. This is an indication that the continued support from the AFDC policy sustains household incomes across the JF time limit better than the JF policy does, which is to be expected, since the JF policy provides no more support to any household after the time limit, allowing for a higher probability of losses in household income. (Results for the other test statistics are qualitatively the same; they are collected in an Online Supplemental Appendix.)

                       LASD in changes                              FOSD in levels
             F_AFDC ⪰ F_JF  F_JF ⪰ F_AFDC  equality    G_AFDC ⪰ G_JF  G_JF ⪰ G_AFDC  equality
  avg-RA           .              .            .              .              .            .
   p-value         .              .            .              .              .            .
  lastQ-RA         .              .            .              .              .            .
   p-value         .              .            .              .              .            .
  avg-TL           .              .            .              .              .            .
   p-value         .              .            .              .              .            .
  lastQ-TL         .              .            .              .              .            .
   p-value         .              .            .              .              .            .

Table 1: This table presents a number of tests that can be used to infer whether the Jobs First (JF) program would be preferred to the Aid to Families with Dependent Children (AFDC) program or the opposite. Column titles paraphrase the null hypotheses of the tests. The first three columns use changes in income and the last three columns measure income in levels without regard to pre-policy income. Comparisons made before and after assignment or the time limit were measured using the average of all quarters or using the last quarter. 1999 bootstrap repetitions were used in each test.

Figure 1 displays the CDFs of gains and losses under the AFDC and JF policies, and then the way that the two $T$ coordinate processes compare them: in the coordinates of equation (12), $F_A$ corresponds to $F_{AFDC}$ here, so large positive values correspond to a rejection of the hypothesis $F_{AFDC} \succeq_{LASD} F_{JF}$. This figure uses only average-RA change observations. It can be seen in the second and third panels that the presumable reason the AFDC policy does not dominate the JF policy using LASD is that the probability of small losses is higher in the AFDC program and the relation between small gains and small losses is preferable under JF.

Figure 1: The CDFs of changes in post-RA income and the way that they are turned into $T(F)$ coordinate processes. The second and third panels correspond to plug-in estimates of the coordinate functions of equation (12). The large positive values in the second panel drive the rejection of the hypothesis $F_{AFDC} \succeq_{LASD} F_{JF}$ seen in Table 1.

Tests in levels: first-order stochastic dominance

We also conducted an analysis of these data using standard FOSD inference methods.
Tests were used to infer dominance of the AFDC or JF policies using post-randomization levels, that is, without regard to the pre-randomization state. Income in levels is defined in two ways. Post-RA average income is defined as the natural log of the average income in all post-RA quarters. Post-TL average income is defined as the natural log of the average income in only the post-TL quarters. We conduct tests of the null hypotheses that $G_{AFDC} \succeq_{FOSD} G_{JF}$ or $G_{JF} \succeq_{FOSD} G_{AFDC}$, where the notation $G$ is meant as a reminder that these are marginal final income distributions that do not consider a household's pre-policy income. The results of these tests are presented in the right-hand side of Table 1.

Using all post-RA quarters, we can reject the hypothesis that $G_{AFDC} \succeq_{FOSD} G_{JF}$, but cannot reject the hypothesis that $G_{JF} \succeq_{FOSD} G_{AFDC}$. Therefore it seems clear that the JF policy dominates the AFDC policy in terms of final outcome distributions, that is, without regard to the effect that the policies have on any particular household's path from pre- to post-policy income.

When analyzing only the post-TL average income, we cannot reject the hypothesis that $G_{AFDC} \succeq_{FOSD} G_{JF}$ or that $G_{JF} \succeq_{FOSD} G_{AFDC}$, although there is weak evidence that the second relation might be violated. We checked a two-sided test of distributional equality and could not reject that the distributions are indistinguishable. Therefore the marginal post-TL income distributions seem indistinguishable, while the data in changes reveal that households would prefer the AFDC program. The inferences made using data in levels and FOSD can therefore be quite different from those using LASD with data on changes.

The significantly positive part that drives the rejection of the hypothesis $G_{AFDC} \succeq_{FOSD} G_{JF}$ is represented by the spike in the right panel of Figure 2, which reflects the fact that the red AFDC CDF lies significantly above the black JF CDF in the left panel near log income level $x = 8$.

Figure 2: The CDFs of levels of post-RA income and the way that they are used to test first-order stochastic dominance. The large positive values in the second panel drive the rejection of the hypothesis $G_{AFDC} \succeq_{FOSD} G_{JF}$ seen in Table 1.

Public policies often result in gains for some individuals and losses for others. Evidence shows that the way individuals value such gains and losses is a key determinant of public support for these policies. This, in turn, can determine which policies decision makers pursue. Since loss aversion is a well-established empirical regularity, how can the welfare associated with alternative policies be ranked when individuals are loss-averse?

We address this question by defining a social preference relation for distributions of gains and losses caused by a policy: loss aversion-sensitive dominance (LASD). We show that these social preferences are equivalent to criteria that depend solely on distribution functions. The assumption of loss aversion can lead to a welfare ranking of policies that differs from the one that would be obtained if classic utility theory and first-order stochastic dominance were used. We then propose testable conditions for LASD. Because our data come as differences between underlying random variables, we propose a point-identified version of these conditions and also a partially identified analog.

In order to make LASD comparisons using observed data, we propose statistical inference methods to formally test LASD relations in both the point-identified and the partially identified cases. We show that resampling techniques, tailored to specific features of the criterion functions, can be used to conduct inference.
Finally, we illustrate our LASD criterion and inference methods with a simple empirical application that uses data from a well-known evaluation of a large income support policy in the US. This shows that the ranking of policy options depends crucially on whether changes or levels are used and on whether or not one takes individual loss aversion into account.

Appendix

A Results on differentiability, uniform size control and computation

This section includes a definition and short discussion of the Hadamard directional differentiability concept and contains important intermediate results on Hadamard derivatives used to establish the main results in the text. Next we present some results on the control of size over the null region using the proposed resampling methods. Finally, there is one remark regarding the computation of the $T_1$ and $T_2$ processes (the $T$ processes should probably be computed on a grid for the sake of computation time). Proofs of the results discussed in this appendix are collected in Appendix B.4.

The Hadamard derivative is a standard tool used to analyze the asymptotic behavior of nonlinear maps in empirical process theory (van der Vaart, 1998, Section 20.2). We provide a definition here for completeness, along with its directional counterpart.

Definition A.1 (Hadamard differentiability). Let $\mathbb{D}$ and $\mathbb{E}$ be Banach spaces and consider a map $\phi : \mathbb{D}_\phi \subseteq \mathbb{D} \to \mathbb{E}$.

1. $\phi$ is Hadamard differentiable at $f \in \mathbb{D}_\phi$ tangentially to a set $\mathbb{D}_0 \subseteq \mathbb{D}$ if there is a continuous linear map $\phi'_f : \mathbb{D}_0 \to \mathbb{E}$ such that
\[ \lim_{n \to \infty} \left\| \frac{\phi(f + t_n h_n) - \phi(f)}{t_n} - \phi'_f(h) \right\|_{\mathbb{E}} = 0 \]
for all sequences $\{h_n\} \subset \mathbb{D}$ and $\{t_n\} \subset \mathbb{R}$ such that $h_n \to h \in \mathbb{D}_0$, $t_n \to 0$ as $n \to \infty$ and $f + t_n h_n \in \mathbb{D}_\phi$ for all $n$.
2. $\phi$ is Hadamard directionally differentiable at $f \in \mathbb{D}_\phi$ tangentially to a set $\mathbb{D}_0 \subseteq \mathbb{D}$ if there is a continuous map $\phi'_f : \mathbb{D}_0 \to \mathbb{E}$ such that
\[ \lim_{n \to \infty} \left\| \frac{\phi(f + t_n h_n) - \phi(f)}{t_n} - \phi'_f(h) \right\|_{\mathbb{E}} = 0 \]
for all sequences $\{h_n\} \subset \mathbb{D}$ and $\{t_n\} \subset \mathbb{R}_+$ such that $h_n \to h \in \mathbb{D}_0$, $t_n \searrow 0$ as $n \to \infty$ and $f + t_n h_n \in \mathbb{D}_\phi$ for all $n$.

In both cases of the above definition, $\phi'_f$ is continuous, with the addition of linearity in the fully differentiable case (Shapiro, 1990, Proposition 3.1). The two cases also differ in the sequences of admissible $\{t_n\}$, which allows the second definition to encode directions.

Because the pair of marginal distribution functions always occurs as the difference $F_A - F_B$, the next few definitions and lemmas are stated for a single function $f$. For later results, maps will be applied with the function $f = F_A - F_B$. The following maps will be used repeatedly in this section and in the proofs for analyzing more complex directionally differentiable maps. Let $\phi : \mathbb{R} \to \mathbb{R}$ be
\[ \phi(x) = (x)_+ = \max\{0, x\}, \quad (43) \]
and similarly, define $\psi : \mathbb{R}^2 \to \mathbb{R}$ by
\[ \psi(x, y) = \max\{x, y\}. \quad (44) \]
For some domain $\mathcal{X} \subseteq \mathbb{R}^j$ let $\sigma : \ell^\infty(\mathcal{X}) \to \mathbb{R}$ be
\[ \sigma(f) = \sup_{x \in \mathcal{X}} f(x). \quad (45) \]
These are all Hadamard directionally differentiable maps. It can be verified that for all $a \in \mathbb{R}$,
\[ \phi'_x(a) = \begin{cases} a & x > 0 \\ \max\{0, a\} & x = 0 \\ 0 & x < 0, \end{cases} \quad (46) \]
while for pairs $(a, b) \in \mathbb{R}^2$,
\[ \psi'_{x,y}(a, b) = \begin{cases} a & x > y \\ \max\{a, b\} & x = y \\ b & x < y. \end{cases} \]
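The piecewise formula for $\psi'_{x,y}$ can be checked numerically with a finite-difference approximation. The following is a small illustration of ours, not part of the formal development:

```python
def psi(x, y):
    return max(x, y)

def dpsi(x, y, a, b):
    # Directional derivative of (x, y) -> max{x, y} in direction (a, b),
    # per the piecewise formula: a if x > y, b if x < y, max{a, b} at the kink.
    if x > y:
        return a
    if x < y:
        return b
    return max(a, b)

# Finite-difference check at the kink x = y = 0 with direction (a, b) = (1, -2):
t = 1e-8
fd = (psi(0.0 + t * 1.0, 0.0 + t * (-2.0)) - psi(0.0, 0.0)) / t
print(abs(fd - dpsi(0.0, 0.0, 1.0, -2.0)) < 1e-6)  # True
```

At the kink the derivative $\max\{a, b\}$ is not linear in the direction $(a, b)$, which is precisely why $\psi$ is only directionally (and not fully) Hadamard differentiable there.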
For any $\varepsilon \geq 0$, let $M_f(\varepsilon) = \{x \in \mathcal{X} : f(x) \geq \sigma(f) - \varepsilon\}$ be the set of $\varepsilon$-maximizers of $f$. Cárcamo, Cuevas, and Rodríguez (2019) show that for all directions $h \in \ell^\infty(\mathcal{X})$,
\[ \sigma'_f(h) = \lim_{\varepsilon \searrow 0} \sup_{x \in M_f(\varepsilon)} h(x), \quad (47) \]
and they also give conditions under which the limiting operation can be discarded and the supremum of $h$ can be taken over the set of maximizers of $f$.

The next lemma shows that a weighted $L_p$ norm (for $p \geq 1$) applied to the positive part of a function is directionally differentiable. Cramér-von Mises statistics are found by setting $p = 2$. The directional differentiability of the $L_p$ norm with $p = 1$ was shown in Lemma S.4.5 of Fang and Santos (2019). Note that this lemma must be shown for the $L_p$ norm applied to the positive-part map, jointly applied to a function $f$; this is because $f \mapsto (f)_+$ is not differentiable as a map of functions to functions. Nevertheless, the dominated convergence theorem allows one to use pointwise convergence with integrability to find the result.

Lemma A.2.
Suppose $f : \mathcal{X} \subseteq \mathbb{R}^j \to \mathbb{R}^k$ is a bounded and $p$-integrable function. Let $w : \mathcal{X} \to \mathbb{R}^k_+$ be such that $\int w_i(x)\, dx < \infty$ for $i = 1, \ldots, k$. Let $1 \leq p < \infty$ and define the one-sided $L_p$ norm of $f$ by
\[ \lambda(f) = \left( \sum_{i=1}^k \int_{\mathcal{X}} \left( (f_i(x))_+ \right)^p w_i(x)\, dx \right)^{1/p}. \quad (48) \]
For $i = 1, \ldots, k$, define the subdomains $\mathcal{X}_{i-} = \{x \in \mathcal{X} : f_i(x) < 0\}$, $\mathcal{X}_{i0} = \{x \in \mathcal{X} : f_i(x) = 0\}$ and $\mathcal{X}_{i+} = \{x \in \mathcal{X} : f_i(x) > 0\}$ and the index collections $\mathcal{I}_0 = \{i \in 1, \ldots, k : \mu(\mathcal{X}_{i0}) > 0\}$ and $\mathcal{I}_+ = \{i \in 1, \ldots, k : \mu(\mathcal{X}_{i+}) > 0\}$, where $\mu$ is Lebesgue measure. Then $\lambda$ is Hadamard directionally differentiable and its derivative for any bounded, $p$-integrable $h : \mathcal{X} \to \mathbb{R}^k$ is
\[ \lambda'_f(h) = \begin{cases} 0 & \mathcal{I}_+ = \mathcal{I}_0 = \emptyset \\ \left( \sum_{i \in \mathcal{I}_0} \int_{\mathcal{X}_{i0}} ((h_i(x))_+)^p w_i(x)\, dx \right)^{1/p} & \mathcal{I}_+ = \emptyset, \ \mathcal{I}_0 \neq \emptyset \\ \lambda(f)^{1-p} \sum_{i \in \mathcal{I}_+} \int_{\mathcal{X}_{i+}} f_i^{p-1}(x) h_i(x) w_i(x)\, dx & \mathcal{I}_+ \neq \emptyset. \end{cases} \quad (49) \]

The above definitions make it easy, if rather abstract, to state the differentiability of the maps from distribution functions to test statistics that are applied to conduct uniform inference using the $T$ process.

Lemma A.3.
Let $f \in \ell^\infty(\mathcal{X})$ and let
\[ \nu(f) = \sup_{x \in \mathcal{X}} \left( (f(x) + f(-x)) \vee f(-x) \right)_+ \quad (50) \]
and, assuming $f$ is square integrable,
\[ \omega(f) = \left( \int_{\mathcal{X}} \left\{ \left( (f(x) + f(-x)) \vee f(-x) \right)_+ \right\}^2 dx \right)^{1/2}. \quad (51) \]
Then $\nu$ and $\omega$ are Hadamard directionally differentiable, and, letting $f_1(x) = f(-x)$ and $f_2(x) = f(x) + f(-x)$, with directions transformed analogously as $h_1(x) = h(-x)$ and $h_2(x) = h(x) + h(-x)$, their derivatives for any direction $h \in \ell^\infty(\mathcal{X})$ are
\[ \nu'_f(h) = \left( \phi'_{\psi(\sigma(f_1), \sigma(f_2))} \circ \psi'_{\sigma(f_1), \sigma(f_2)} \right) \left( \sigma'_{f_1}(h_1), \sigma'_{f_2}(h_2) \right) \quad (52) \]
and, assuming in addition that $f$ and $h$ are square integrable,
\[ \omega'_f(h) = \left( \lambda'_{\psi(f_1, f_2)} \circ \psi'_{f_1, f_2} \right)(h_1, h_2), \quad (53) \]
where we take the order $p = 2$ and the weight function $w \equiv 1$ in $\lambda'_f$ defined in (49).

Next we turn to results for the partially identified case. Lemma A.4 provides the theoretical tool needed for the analysis of Kolmogorov-Smirnov-type statistics when using Makarov bounds. First define the abstract map $\theta : (\ell^\infty(\mathcal{U} \times \mathcal{X}))^2 \to \mathbb{R}$ by
\[ \theta(f, g) = \sup_{x \in \mathcal{X}} \left( \sup_{u \in \mathcal{U}} f(u, x) + \sup_{u \in \mathcal{U}} g(u, x) \right). \quad (54) \]
To define the directional derivative of this map at some $f$ and $g$, we need to consider $\varepsilon$-maximizers (for any $\varepsilon \geq 0$) of these functions in $u$ for each fixed $x$, which for any $f \in \ell^\infty(\mathcal{U} \times \mathcal{X})$ is the set-valued map
\[ M_f(x, \varepsilon) = \left\{ u \in \mathcal{U} : f(u, x) \geq \sup_{u \in \mathcal{U}} f(u, x) - \varepsilon \right\}. \quad (55) \]
We reserve one special label for the collection of $\varepsilon$-maximizers of the outer maximization problem that defines $\theta$: for any $\varepsilon \geq 0$ let
\[ M_\theta(\varepsilon) = \left\{ x \in \mathcal{X} : \sup_{u \in \mathcal{U}} f(u, x) + \sup_{u \in \mathcal{U}} g(u, x) \geq \theta(f, g) - \varepsilon \right\}. \quad (56) \]
Lemma A.4 ahead discusses derivatives of $\theta$, a functional that imposes two levels of maximization with an intermediate addition step, and shows that this operator is directionally differentiable.
It is similar to the case of maximizing a bounded bivariate function, and its proof follows that of Theorem 2.1 of Cárcamo, Cuevas, and Rodríguez (2019), which dealt with directional differentiability of the supremum functional applied to a bounded function. The statement is for the sum of only two functions as arguments, but it is straightforward to extend it to any finite number of functions, as in Theorem 4.3.

Lemma A.4.
Let $\mathcal{U} \subseteq \mathbb{R}^m$ and $\mathcal{X} \subseteq \mathbb{R}^n$. Suppose that $f, g \in \ell^\infty(\mathcal{U} \times \mathcal{X})$, and let $\theta$ be the map defined in (54). Then $\theta$ is Hadamard directionally differentiable and its derivative at $(f, g)$ for any directions $(h, k) \in (\ell^\infty(\mathcal{U} \times \mathcal{X}))^2$ is
\[ \theta'_{f,g}(h, k) = \lim_{\varepsilon \searrow 0} \sup_{x \in M_\theta(\varepsilon)} \left( \sup_{u \in M_f(x, \varepsilon)} h(u, x) + \sup_{u \in M_g(x, \varepsilon)} k(u, x) \right). \quad (57) \]

The behavior of bootstrap tests under the null and under alternatives is most easily examined using distributions local to $P$. We consider sequences of distributions $P_n$ local to the null distribution $P$ such that, for a mean-zero, square-integrable function $\eta$, the $P_n$ have distribution functions $F_n$ (where $P$ has CDF $F$) that satisfy
\[ \lim_{n \to \infty} \int \left( \sqrt{n} \left( \sqrt{dF_n} - \sqrt{dF} \right) - \eta \sqrt{dF} \right)^2 = 0. \quad (58) \]
The behavior of the underlying empirical process under local alternatives satisfies Assumption 5 of Fang and Santos (2019) in a straightforward way (Wellner, 1992, Theorem 1).

Theorem A.5.
Make Assumptions A1-A2 and suppose that $F_A \succeq_{LASD} F_B$. Suppose that $\mathcal{X}$ is convex. Let $\hat{q}_{V^*_j}(1 - \alpha)$ and $\hat{q}_{W^*_j}(1 - \alpha)$ be the $(1 - \alpha)$th sample quantiles from the bootstrap distributions as described in the routines above. Then for $j = 1, 2$:

1. When $P \in \mathcal{P}_0$ and $\{P_n\}$ satisfy (58) and $T_j(F_n)(x) \leq 0$ for all $x \geq 0$,
\[ \limsup_{n \to \infty} P_n \left\{ V_{jn} > \hat{q}_{V^*_j}(1 - \alpha) \right\} \leq \alpha \quad \text{and} \quad \limsup_{n \to \infty} P_n \left\{ W_{jn} > \hat{q}_{W^*_j}(1 - \alpha) \right\} \leq \alpha. \]

2. When $P \in \mathcal{P}_0$ and $\{P_n\}$ satisfy (58) and $T_j(F_n)(x) \leq 0$ for all $x \geq 0$, and the distribution of $V$ or $W$ is increasing at its $(1 - \alpha)$th quantile,
\[ \lim_{n \to \infty} P_n \left\{ V_{jn} > \hat{q}_{V^*_j}(1 - \alpha) \right\} = \alpha \quad \text{and} \quad \lim_{n \to \infty} P_n \left\{ W_{jn} > \hat{q}_{W^*_j}(1 - \alpha) \right\} = \alpha. \]

Now we consider using the resampling routine outlined above to test the null hypothesis that $F_A \succeq_{LASD} F_B$ when the distributions are only partially identified. It is no longer possible to guarantee exact rejection probabilities because the test is based on a superset of $\mathcal{P}_0$, but we can still show that the test does not overreject.

Theorem A.6.
Make Assumptions B1-B2. Also assume that $\mathcal{X}$ is a convex set. Let $\hat{q}_{V^*}(1 - \alpha)$ and $\hat{q}_{W^*}(1 - \alpha)$ be the $(1 - \alpha)$th sample quantiles from the bootstrap distributions of $\{V^{*r}_n\}_{r=1}^R$ or $\{W^{*r}_n\}_{r=1}^R$ as described in the routine above. When the sequence of alternative distributions $P_n$ satisfies (58) and $T(F_n)(x) \leq 0$ for all $x \geq 0$,
\[ \limsup_{n \to \infty} P_n \left\{ V_n > \hat{q}_{V^*}(1 - \alpha) \right\} \leq \alpha \quad \text{and} \quad \limsup_{n \to \infty} P_n \left\{ W_n > \hat{q}_{W^*}(1 - \alpha) \right\} \leq \alpha. \]

Remark A.7 (A note on computing point-identified criterion functions). Standard empirical distribution functions are used to estimate the marginal distributions $F_A$ and $F_B$. However, the definitions of the $T_1$ and $T_2$ criterion functions contain $F_k(-x)$ terms, making the plug-in $T_j(F_n)$ left-continuous at some sample observations. Therefore some care must be taken when evaluating them, because there may be regions that are relevant for evaluation (i.e., the location of the supremum) that are not attained by any sample observation. This could be dealt with approximately by evaluating the functions on a grid. Instead, we evaluate the functions at all the points where they change value. For example, let $X_n$ denote the pooled sample (of size $n_A + n_B$) of $X_A$ and $X_B$ observations. Then we evaluate $T_j$ at the points $\tilde{X}_n = \{0\} \cup X^+_n \cup \{X_n - \epsilon\}^-$, where $X^+_n$ and $X^-_n$ refer to the positive- and negative-valued elements of the pooled sample $X_n$ and $\epsilon$ is a very small perturbation of each element of $X_n$, for example, the square root of the machine's double-precision accuracy. When evaluating the $L_2$ integrals from an observed sample, the domain can be set to $[0, \tilde{x}_{\max}]$, where $\tilde{x}_{\max}$ is the largest point in the evaluation set $\tilde{X}_n$, because the integrand is identically zero above that point.

B Proofs of results
B.1 Results in Section 2
Proof of Proposition 2.3.
Equation (1) implies that
\[ W(F) = \int_{\mathbb{R}_-} v(x)\, dF(x) + \int_{\mathbb{R}_+} v(x)\, dF(x). \quad (59) \]
For the first part of (59) note that
\[ \int_{\mathbb{R}_-} v(x)\, dF(x) = \lim_{R \to -\infty} \int_R^0 v(x)\, dF(x) = \lim_{R \to -\infty} \left[ v(x) F(x) \Big|_R^0 - \int_R^0 v'(x) F(x)\, dx \right] = -\int_{-\infty}^0 v'(x) F(x)\, dx, \]
using the normalization $v(0) = 0$ noted in Definition 2.2, the assumed bounded support of $F$ and integration by parts.

Similarly,
\[ \int_{\mathbb{R}_+} v(x)\, dF(x) = -\int_{\mathbb{R}_+} v(x)\, d(1 - F)(x) = -\lim_{R \to \infty} \int_0^R v(x)\, d(1 - F)(x) = -\lim_{R \to \infty} \left[ v(x)(1 - F(x)) \Big|_0^R - \int_0^R v'(x)(1 - F(x))\, dx \right] = \int_0^\infty v'(x)(1 - F(x))\, dx. \]
Putting these two parts together yields (2).
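The identity just derived can be verified numerically. The following sketch is our own illustration, using an assumed Uniform[-1, 1] distribution of changes and a simple kinked, loss-averse value function; both sides of the identity agree:

```python
import numpy as np

def integrate(f, a, b, n=200000):
    # Midpoint Riemann sum; accurate here because the integrands are
    # piecewise linear and the kink at 0 falls on a cell boundary.
    x = a + (np.arange(n) + 0.5) * (b - a) / n
    return float(np.sum(f(x)) * (b - a) / n)

# A loss-averse value function with v(0) = 0: losses count double.
v = lambda x: np.where(x < 0, 2.0 * x, 1.0 * x)
vprime = lambda x: np.where(x < 0, 2.0, 1.0)

# X ~ Uniform[-1, 1]: density 1/2 and CDF F(x) = (x + 1)/2 on the support.
F = lambda x: (x + 1.0) / 2.0

lhs = integrate(lambda x: v(x) * 0.5, -1.0, 1.0)  # E[v(X)] computed directly
rhs = -integrate(lambda x: vprime(x) * F(x), -1.0, 0.0) \
      + integrate(lambda x: vprime(x) * (1.0 - F(x)), 0.0, 1.0)

print(round(lhs, 6), round(rhs, 6))  # -0.25 -0.25
```

Note how the losses, weighted twice, pull the welfare level below the mean of zero, which is the effect the integration-by-parts representation makes explicit through the $v'$ weights.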
B.2 Proofs of results in Section 3
Proof of Theorem 3.1.
Notice that (3) is equivalent to both (4) and (5); in this proof we use the latter two conditions. Using Proposition 2.3 we rewrite $W(F_A) \geq W(F_B)$ as the equivalent condition
\[ -\int_{-\infty}^0 v'(z) F_A(z)\, dz + \int_0^\infty v'(z)(1 - F_A(z))\, dz \geq -\int_{-\infty}^0 v'(z) F_B(z)\, dz + \int_0^\infty v'(z)(1 - F_B(z))\, dz. \]
Rearranging terms, we find this is equivalent to
\[ \int_{-\infty}^0 v'(z) F_B(z)\, dz - \int_{-\infty}^0 v'(z) F_A(z)\, dz \geq \int_0^\infty v'(z)(1 - F_B(z))\, dz - \int_0^\infty v'(z)(1 - F_A(z))\, dz \]
or simply
\[ \int_{-\infty}^0 v'(z)(F_B(z) - F_A(z))\, dz \geq \int_0^\infty v'(z)(F_A(z) - F_B(z))\, dz. \]
This is in turn equivalent to
\[ \int_0^\infty v'(-z)(F_B(-z) - F_A(-z))\, dz \geq \int_0^\infty v'(z)(F_A(z) - F_B(z))\, dz \]
or
\[ -\int_0^\infty v'(-z)(F_A(-z) - F_B(-z))\, dz \geq \int_0^\infty v'(z)(F_A(z) - F_B(z))\, dz. \]
Adding $\int_0^\infty v'(z)(F_A(-z) - F_B(-z))\, dz$ to both sides, we find this is equivalent to
\[ \int_0^\infty (v'(z) - v'(-z))(F_A(-z) - F_B(-z))\, dz \geq \int_0^\infty v'(z)(F_A(z) - F_B(z) + F_A(-z) - F_B(-z))\, dz. \quad (60) \]
Utilizing the assumptions of loss aversion and non-decreasingness given in Definition 2.2, (4) and (5) are sufficient for (60) to hold for any $v$. Condition (5) is due to the fact that
\[ F_A(x) - F_B(x) + F_A(-x) - F_B(-x) \leq 0 \quad \forall x \geq 0 \]
is equivalent to the condition
\[ 1 - F_A(x) - F_A(-x) \geq 1 - F_B(x) - F_B(-x) \quad \forall x \geq 0. \]

We now show that conditions (4) and (5) are also necessary by means of a contradiction to (60). To this end, assume that there exists some $x$ such that $F_A(-x) - F_B(-x) > 0$. From the fact that the distribution function is right continuous, it follows that there is a neighbourhood $(a, b)$, $b > a \geq 0$, such that for all $x \in (a, b)$, $F_A(-x) - F_B(-x) > 0$. Consider the value function
\[ v(x) = \begin{cases} a - b & x \leq -b \\ x + a & x \in (-b, -a) \\ 0 & x \geq -a. \end{cases} \]
Note that this $v$ satisfies conditions 1-3 of Definition 2.2. Further, for $x \in (a, b)$, $v'(-x) = 1 > v'(x) = 0$. Therefore
\[ \int_0^\infty (v'(z) - v'(-z))(F_A(-z) - F_B(-z))\, dz < 0, \]
while
\[ \int_0^\infty v'(z)(F_A(z) - F_B(z) + F_A(-z) - F_B(-z))\, dz = 0, \]
because $v'(x) = 0$ for $x \geq 0$. This contradicts (60).

The second condition can be proven similarly. Assume that there exists a neighbourhood $(a, b)$, $0 \leq a < b$, such that for all $x \in (a, b)$, $(1 - F_A(x)) - F_A(-x) < (1 - F_B(x)) - F_B(-x)$. Take $v$ non-decreasing and such that $v'(x) = v'(-x)$, for example
\[ v(x) = \mathrm{sgn}(x) \times \begin{cases} 0 & |x| \in [0, a] \\ |x| - a & |x| \in (a, b) \\ b - a & |x| \in [b, \infty). \end{cases} \]
With this $v$ we find
\[ \int_0^\infty (v'(z) - v'(-z))(F_A(-z) - F_B(-z))\, dz = 0 \]
while
\[ \int_0^\infty v'(z)(F_A(z) - F_B(z) + F_A(-z) - F_B(-z))\, dz > 0, \]
which is a contradiction.

Proof of Corollary 3.2.
Use $v(x) = x$, which belongs to the class of functions used in the first part of Theorem 3.1 and in Definition 2.4.

Proof of Corollary 3.5.
We first notice that $F_A \succeq_{FOSD} F_{SQ}$ is equivalent to the event
\[ \{ F_A \text{ is supported on } \mathbb{R}_+ \}. \quad (61) \]
Property (61) easily implies that $F_A \succeq_{LASD} F_{SQ}$, which follows by Property 1 of Definition 2.2. On the other hand, one checks that
\[ v(x) := \begin{cases} x & x \leq 0 \\ 0 & x > 0 \end{cases} \]
fulfills Definition 2.2. Thus $F_A \succeq_{LASD} F_{SQ}$ implies (61).

Proof of Remark 3.6.
The social value function in this case is the following:
\[ \int_{\mathbb{R} \times [0, \infty)} v(x) + v(y)\, dF(x, y) = \int_0^\infty \int_{-\infty}^0 (v(x) + v(y)) f(x, y)\, dx\, dy + \int_0^\infty \int_0^\infty (v(x) + v(y)) f(x, y)\, dx\, dy. \]
Let us define $\tilde{f}_Y(x, y) = \int_{-\infty}^x f(z, y)\, dz$ and let $F_X$, $F_Y$ denote the marginals of $X$ and $Y$, respectively. Integrating by parts on the negative domain of $x$ we get
\[ \int_0^\infty \left[ (v(x) + v(y)) \tilde{f}_Y(x, y) \Big|_{-\infty}^0 - \int_{-\infty}^0 v'(x) \tilde{f}_Y(x, y)\, dx \right] dy. \]
Using that $v(x) = 0$ for $x = 0$ and that $\tilde{f}_Y(x, y) = 0$ for $x = -\infty$ we get
\[ \left[ \int_0^\infty v(y) \tilde{f}_Y(0, y)\, dy \right] - \int_{-\infty}^0 \left[ \int_0^\infty v'(x) \tilde{f}_Y(x, y)\, dy \right] dx. \]
Performing integration by parts again, and noticing that $v'(x)$ is independent of the integration variable in the second expression, we obtain
\[ \left[ v(y) F(0, y) \Big|_0^\infty - \int_0^\infty v'(y) F(0, y)\, dy \right] - \int_{-\infty}^0 \left[ v'(x) \left( F(x, y) \Big|_0^\infty \right) \right] dx = \left[ v(\infty) F_X(0) - \int_0^\infty v'(y) F(0, y)\, dy \right] - \int_{-\infty}^0 v'(x) F_X(x)\, dx. \]

We now turn to the positive domain of $x$, thus
\[ \int_0^\infty \left[ (v(x) + v(y)) \tilde{f}_Y(x, y) \Big|_0^\infty - \int_0^\infty v'(x) \tilde{f}_Y(x, y)\, dx \right] dy \]
and
\[ \left[ \int_0^\infty (v(\infty) + v(y)) f_Y(y) - v(y) \tilde{f}_Y(0, y)\, dy \right] - \int_0^\infty \left[ \int_0^\infty v'(x) \tilde{f}_Y(x, y)\, dy \right] dx. \]
Finally,
\[ \left[ (v(\infty) + v(y)) F_Y(y) - v(y) F(0, y) \right] \Big|_0^\infty - \left[ \int_0^\infty v'(y) F_Y(y) - v'(y) F(0, y)\, dy \right] - \int_0^\infty \left[ v'(x) \left( F(x, y) \Big|_0^\infty \right) \right] dx, \]
and evaluating the boundary terms, we obtain
\[ \left[ 2v(\infty) - v(\infty) F_X(0) - v(\infty) F_Y(0) \right] - \left[ \int_0^\infty v'(y) F_Y(y) - v'(y) F(0, y)\, dy \right] - \int_0^\infty \left[ v'(x)(F_X(x) - F(x, 0)) \right] dx. \]

Putting together the negative and the positive sides, we obtain
\[ \left[ v(\infty) F_X(0) - \int_0^\infty v'(y) F(0, y)\, dy \right] - \int_{-\infty}^0 v'(x) F_X(x)\, dx + \left[ 2v(\infty) - v(\infty) F_X(0) - v(\infty) F_Y(0) \right] - \left[ \int_0^\infty v'(y) F_Y(y) - v'(y) F(0, y)\, dy \right] - \int_0^\infty \left[ v'(x)(F_X(x) - F(x, 0)) \right] dx. \]
After simplifying, this expression becomes
\[ -\int_{-\infty}^0 v'(x) F_X(x)\, dx + \left[ 2v(\infty) - v(\infty) F_Y(0) \right] - \left[ \int_0^\infty v'(y) F_Y(y)\, dy \right] - \int_0^\infty \left[ v'(x)(F_X(x) - F(x, 0)) \right] dx. \]
Using the fact that $y \in [0, \infty)$, we get
\[ 2v(\infty) - \int_{-\infty}^0 v'(x) F_X(x)\, dx - \int_0^\infty v'(y) F_Y(y)\, dy - \int_0^\infty v'(x) F_X(x)\, dx \]
and
\[ v(0) + 2\int_0^\infty v'(x)\, dx - \int_{-\infty}^0 v'(x) F_X(x)\, dx - \int_0^\infty v'(y) F_Y(y)\, dy - \int_0^\infty v'(x) F_X(x)\, dx. \]
Using $v(0) = 0$ we have
\[ -\int_{-\infty}^0 v'(x) F_X(x)\, dx + \int_0^\infty v'(y)(1 - F_Y(y))\, dy + \int_0^\infty v'(x)(1 - F_X(x))\, dx. \]
The only change in comparison to Theorem 3.1 is the addition of the term $\int_0^\infty v'(y)(1 - F_Y(y))\, dy$, which is quite natural given that not only gains and losses but also incomes are considered. Applying the first part of the proof of Theorem 3.1, the comparison between distributions $F_A$ and $F_B$ comes down to the following inequality:
\[ \int_0^\infty (v'(x) - v'(-x))(F^X_A(-x) - F^X_B(-x))\, dx + \int_0^\infty v'(y) \left( F^Y_B(y) - F^Y_A(y) \right) dy \geq \int_0^\infty v'(x)(F^X_A(x) - F^X_B(x) + F^X_A(-x) - F^X_B(-x))\, dx. \]
In comparison to (60), this inequality additionally includes the comparison of $A$ and $B$ for incomes $y$. Since $v'(y) \geq 0$ (i.e., utility is increasing in income), assuming (4), (5) and additionally that $F^Y_B(y) - F^Y_A(y) \geq 0$ for all $y$ (that is, $A$ dominates $B$ for incomes according to FOSD) is enough to ensure that $A$ is better than $B$.

Proof of Theorem 3.7.
Given the bounds inequality, we have
\[ L_B(-x) - U_A(-x) \leq F_B(-x) - F_A(-x) \leq U_B(-x) - L_A(-x) \]
and
\[ L_A(x) - U_B(x) \leq F_A(x) - F_B(x) \leq U_A(x) - L_B(x), \]
from which it is clear that (8) is a sufficient condition. As a necessary condition we have (9), as otherwise we would have
\[ F_B(-x) - F_A(-x) \leq U_B(-x) - L_A(-x) \leq L_A(x) - U_B(x) \leq F_A(x) - F_B(x). \]

Proof of Corollary 3.8.
Recall that Corollary 3.5 implied that when $F_B$ is a status quo distribution, the FOSD and LASD relations are equivalent. Then $F_A \succeq_{FOSD} F_{SQ}$ implies that $F_A(-x) = 0$ for all $x \geq 0$, because $F_{SQ}(-x) = 0$ for all $x \geq 0$. Therefore a sufficient condition for $F_A \succeq_{LASD} F_{SQ}$ is that $U_A(-x) = 0$ for all $x \geq 0$. Similarly, if $F_A \succeq_{LASD} F_{SQ}$, which is equivalent to $F_A \succeq_{FOSD} F_{SQ}$, then it must be the case that $F_A(-x) = 0$ for all $x \geq 0$, implying that $L_A(-x) = 0$ as well.

B.3 Results in Section 4
Proof of Theorem 4.1.
For Part 1, note that if $P \in \mathcal{P}_0$ then by definition $\mathcal{X}_k(P) \neq \emptyset$ for some $k \in \{1, 2\}$ and, for all $x \in \mathcal{X}_k(P)$, $m_k(x) = 0$. Then the supremum is achieved and $\lim_{\varepsilon \searrow 0} M_k(\varepsilon) = \mathcal{X}_k(P)$ for at least one coordinate, so that suprema are taken over at least one of $\mathcal{X}_1(P)$ and $\mathcal{X}_2(P)$, and whichever coordinate satisfies this condition will contribute to the asymptotic distribution. Note that for all $x \in \mathcal{X}_k(P)$, $\sqrt{n} T_k(F_n)(x) = \sqrt{n}(T_k(F_n) - T_k(F))(x)$. Lemma A.3 and the null hypothesis, which implies $\mathcal{X}_k(P) \neq \emptyset$ for some $k \in \{1, 2\}$, imply the result for $V$ and $W$.

To show Part 2, note that $T$ is a linear map of $F$ and, assuming that $\mathcal{X}_k(P) \neq \emptyset$ for $k \in \{1, 2\}$, its weak limit (for whichever set is nonempty) is $\sup_{x \in \mathcal{X}_k(P)} (T_k(\mathbb{G}_F)(x))_+$ by Lemma A.3. Breaking $\mathcal{X}(P)$ into its two subsets and assuming the null hypothesis is true results in the same behavior as for the supremum-norm statistic from the first part (using the definition of the supremum norm in two coordinates as the maximum of the two suprema). The same reasoning holds for the $L_2$ statistic in Part 2.

Part 3 follows from the behavior of the test statistics over $\{x \in \mathcal{X} : m_1(x) < 0, m_2(x) < 0\}$ described in Lemma A.3. To show Part 4 for $V_n$, suppose that for some $x^*$, $T(F)(x^*) = \xi > 0$. Then $\sup_{x \in \mathcal{X}} \sqrt{n} T(F_n)(x) \geq \sqrt{n}(T(F_n)(x^*) - T(F)(x^*)) + \sqrt{n}\xi$. Then
\[ \liminf_{n \to \infty} P \left\{ \sup_{x \geq 0} \sqrt{n} T(F_n)(x) > c \right\} \geq \lim_{n \to \infty} P \left\{ \sqrt{n}(T(F_n)(x^*) - T(F)(x^*)) > c - \sqrt{n}\xi \right\} = 1, \]
where the last convergence follows from the delta method applied to $\sqrt{n}(F_n(x^*) - F(x^*))$, which converges in distribution to a tight random variable. The proof for the other statistics is analogous.

Proof of Theorem 4.2.
This theorem is an application of Theorem 3.2 of Fang and Santos (2019). Define the statistics $V$ and $W$ as maps from $F$ to the real line using $\nu$ and $\omega$ defined in equations (50) and (51) of Lemma A.3, and let their estimators be defined as in part 3 of the resampling scheme. Their Assumptions 1-3 are satisfied by the definitions of $\nu$ and $\omega$ and Lemma A.3, the standard convergence result $\sqrt{n}(F_n - F) \rightsquigarrow \mathbb{G}_F$ (van der Vaart and Wellner, 1996, Theorem 2.8.4) and the choice of bootstrap weights. We need to show that their Assumption 4 is also satisfied. Write either function as $\|h_1^+\| + \|h_1^+ \vee h_2^+\| + \|h_2^+\|$ using the desired norm. Both norms satisfy a reverse triangle inequality and, using the fact that $|(x)_+ - (y)_+| \leq |x - y|$, the difference for two functions $g$ and $h$ is bounded by $\|g_1 - h_1\| + \|g_1 \vee g_2 - h_1 \vee h_2\| + \|g_2 - h_2\|$. The first difference is bounded by $\|g - h\|$, and the second and the third are bounded by $\|g - h\|$. Rewriting equations (31) and (32) as functionals of differential directions $h$, define
\[ \hat{\nu}'_n(h) = \begin{cases} \left( \max_{x \in \hat{M}_{\hat{k}}(b_n)} h_{\hat{k}}(x) \right)_+ & |\max \hat{m}_{1n} - \max \hat{m}_{2n}| > c_n \\ \max \left\{ 0, \max_{x \in \hat{M}_1(b_n)} h_1(x), \max_{x \in \hat{M}_2(b_n)} h_2(x) \right\} & |\max \hat{m}_{1n} - \max \hat{m}_{2n}| \leq c_n \end{cases} \]
and
\[ \hat{\omega}'_n(h) = \left( \int_{\hat{\mathcal{X}}_1} \left( (h_1(x))_+ \right)^2 dx + \int_{\hat{\mathcal{X}}_2} \left( (h_2(x))_+ \right)^2 dx \right)^{1/2}. \quad (62) \]
Because both $\nu$ and $\omega$ are Lipschitz, Lemma S.3.6 of Fang and Santos (2019) implies we need only check that $|\hat{\nu}'_n(h) - \nu'_F(h)| = o_P(1)$ and $|\hat{\omega}'_n(h) - \omega'_F(h)| = o_P(1)$ for each fixed $h$.
This follows from the consistency of the contact set and ε-argmax estimators. The consistency of these estimators follows from the uniform law of large numbers for the ε-maximizing sets, and from the tightness of the limit G_F for the contact sets, which implies that lim_n P { √n ‖F_n − F‖_∞ ≤ √n a_n } = 1.

Proof of Theorem 4.3.
Consider V first. Note that V_n can be rewritten as

V_n = √n sup (T(G_n))_+ = √n max { 0, sup T(G_n) }.

Lemma A.4, extended to the four parts of the T process, and the condition that X_nec(P) ≠ ∅ imply each of the four inner results. The derivative of the positive-part map discussed in (46), with the hypothesis that P ∈ P_nec, which implies lim_{ε ↘ 0} M_nec(ε) = X_nec, and the chain rule imply the outer part of the derivative, and Theorem 2.1 of Fang and Santos (2019) implies the result. For W_n and W, the finite-sample integrand converges pointwise for each x ∈ X to the limit. By assumption there are no x such that the integrand is positive, which leaves the x in X_nec(P) as the nontrivial part of the integral. Because the limit is assumed square-integrable, dominated convergence, Lemma A.2 and Theorem 2.1 of Fang and Santos (2019) imply the result.

For Part 2, note that by hypothesis X_nec(P) = ∅ and there are no x that result in T(G)(x) > 0. Therefore Theorem 2.1 of Fang and Santos (2019), along with the chain rule, Lemmas A.4 and A.2 and the positive-part map, imply the result. The proof of Part 3 is the same as the analogous part of the proof of Theorem 4.1.

Proof of Theorem 4.4.
For both statistics, Assumptions 1–3 of Fang and Santos (2019) are trivially satisfied (van der Vaart and Wellner, 1996, Theorem 2.8.4) or satisfied by construction in the case of the bootstrap weights. Below we check that their Assumption 4 is also satisfied for both statistics, so that the statement of the theorem follows from their Theorem 3.2.

Consider V_n first, and write the supremum statistic as a function of underlying processes abstractly labeled g: the limiting variable relies (through the delta method) on a map of the form V = V(g) = (φ′_{θ(g)} ∘ θ′_g)(h), where g ∈ (ℓ^∞(R × X))^4, φ′_x is defined in (46) and θ′_g in (57) (extended to four functions as the arguments of the map). V_n uses the sample estimates of these functions. Under the null hypothesis θ(g) = 0, so that we may estimate φ̂′_n(x) = (x)_+, which is Lipschitz because |(x)_+ − (y)_+| ≤ |x − y|. Writing the formula for the estimate of the derivative of θ for just two functions f and g (since the estimator for four functions can be extended immediately from this case), we have, given sequences {b_n} and {d_n},

θ̂′(h, k) = max_{x ∈ M̂_θ} ( max_{u ∈ M̂_f(x)} h(u, x) + max_{u ∈ M̂_g(x)} k(u, x) ).
This map is Lipschitz in (h, k): given any (f, g) pair, suppressing the sets over which maxima are taken and their arguments, we have

|θ̂′(h_1, k_1) − θ̂′(h_2, k_2)|
= | max_{M̂_θ} ( max_{M̂_f} h_1 + max_{M̂_g} k_1 ) − max_{M̂_θ} ( max_{M̂_f} h_2 + max_{M̂_g} k_2 ) |
≤ max_{M̂_θ} | max_{M̂_f} h_1 + max_{M̂_g} k_1 − max_{M̂_f} h_2 − max_{M̂_g} k_2 |
≤ max_{M̂_θ} max_{M̂_f} |h_1 − h_2| + max_{M̂_θ} max_{M̂_g} |k_1 − k_2|
≤ 2 max { ‖h_1 − h_2‖_∞, ‖k_1 − k_2‖_∞ } = 2 ‖(h_1, k_1) − (h_2, k_2)‖_∞.

Because all the maps in the chain that defines V_n are Lipschitz, V_n is itself Lipschitz, and therefore Lemma S.3.6 of Fang and Santos (2019) implies that their Assumption 4 holds if ‖(φ̂′_{θ(g)} ∘ θ̂′_g)(h) − (φ′_{θ(g)} ∘ θ′_g)(h)‖ = o_P(1) (where the arguments g and h are again elements of (ℓ^∞(R × X))^4). This follows from the consistency of the ε-maximizer estimates.

Next consider W_n. For this part simplify the inner part to the sum of two functions, f and g, since the result is a simple generalization. Write W_n = W_n(h, k) = (λ̂′_{μ(f,g)} ∘ μ̂′_{f,g})(h, k), where the marginal (in u) maximization map μ is defined for each x ≥ 0 by μ(f, g)(x) = sup_U f(u, x) + sup_U g(u, x), and for each x ≥ 0,

μ̂′_{f,g}(h, k)(x) = max_{u ∈ M̂_f(x)} h(u, x) + max_{u ∈ M̂_g(x)} k(u, x)

(define M̂_f(x) and M̂_g(x) as in (41)).
First,

‖μ̂′(h_1, k_1) − μ̂′(h_2, k_2)‖_∞ = sup_X | max_{M̂_f(x)} h_1 + max_{M̂_g(x)} k_1 − max_{M̂_f(x)} h_2 − max_{M̂_g(x)} k_2 |
≤ ‖h_1 − h_2‖_∞ + ‖k_1 − k_2‖_∞ ≤ 2 ‖(h_1, k_1) − (h_2, k_2)‖_∞.

Second, for square integrable f and h consider the estimate, assuming P ∈ P_nec, λ̂′(h) = λ(h|_{X̂_0}), where f|_A denotes the restriction of the function f to the set A. On X̂_0 the subadditivity of the norm trivially implies that λ̂′ is Lipschitz there. This implies that λ̂′ is a Lipschitz map, and in turn that λ̂′_{μ(f,g)} ∘ μ̂′_{f,g} is Lipschitz.

Finally, μ̂′_{f,g}(h, k)(x) converges for each x to the pointwise limit

μ′_{f,g}(h, k)(x) = lim_{ε ↘ 0} ( sup_{u ∈ M_f(x,ε)} h(u, x) + sup_{u ∈ M_g(x,ε)} k(u, x) ).

The set estimators X̂_0 and X̂_+ are consistent estimators for X_0 and X_+ using the same argument as above for the supremum norm. Then for square integrable h and k, the dominated convergence theorem implies that for any given f, g,

|(λ̂′_{μ(f,g)} ∘ μ̂′_{f,g})(h, k) − (λ′_{μ(f,g)} ∘ μ′_{f,g})(h, k)| = o_P(1),

and Lemma S.3.6 of Fang and Santos (2019) implies the result.

B.4 Results in Appendix A
Proof of Lemma A.2.
Let {t_n} be a sequence of positive numbers such that t_n ↘ 0 as n → ∞, and let {h_n} ∈ (ℓ^∞(X))^k be a sequence of bounded, p-integrable functions such that h_n → h ∈ (ℓ^∞(X))^k as n → ∞.

Suppose that for all i and all x ∈ X, f_i(x) < 0, or in other words, I_+ = I_0 = ∅. For any point x there exists some N such that for all n > N, (f_i + t_n h_{ni})_+ = 0 because t_n ↘ 0 and h_i
is bounded. Then dominated convergence implies that the p-th power of the L_p norm satisfies

lim_{n → ∞} t_n^{-1} ( Σ_{i=1}^k ∫_{X_{i−}} ((f_i(x) + t_n h_{ni}(x))_+)^p w_i(x) dx − Σ_{i=1}^k ∫_{X_{i−}} ((f_i(x))_+)^p w_i(x) dx ) = 0.

This is also the result for λ in this case, which is the difference of these terms each raised to the power 1/p.

Next suppose I_0 ≠ ∅ and I_+ = ∅, that is, for some i, X_{i0} has positive measure but the measure of the x that make any coordinate of f positive is zero. Then calculate the differences directly:

lim_{n → ∞} t_n^{-1} { ( Σ_{i=1}^k ∫_{X_{i0}} ((f_i(x) + t_n h_{ni}(x))_+)^p w_i(x) dx )^{1/p} − ( Σ_{i=1}^k ∫_{X_{i0}} ((f_i(x))_+)^p w_i(x) dx )^{1/p} }
= lim_{n → ∞} t_n^{-1} ( t_n^p Σ_{i=1}^k ∫_{X_{i0}} ((h_{ni}(x))_+)^p w_i(x) dx )^{1/p}
= ( Σ_{i=1}^k ∫_{X_{i0}} ((h_i(x))_+)^p w_i(x) dx )^{1/p},

using dominated convergence and the p-integrability of h. If the subregions {x : f_i(x) < 0} have positive measure, they contribute 0 to the limit.

Now suppose that I_+ is not empty, that is, there is at least one i such that X_{i+} has positive measure. Then for each x ∈ X_{i+} there exists an N such that for n > N, f_i(x) + t_n h_{ni}(x) > 0 for all i. Then for n > N, for this x,

(f_i(x) + t_n h_{ni}(x))^p − f_i^p(x) = Σ_{j=0}^p (p choose j) f_i^j(x) (t_n h_{ni}(x))^{p−j} − f_i^p(x)
= f_i^p(x) + p t_n f_i^{p−1}(x) h_{ni}(x) + O(t_n^2) − f_i^p(x)
= p t_n f_i^{p−1}(x) h_{ni}(x) + O(t_n^2).
This implies that for n large enough, the inner integral, using the calculations from the previous parts to account for the sets where f_i is zero or negative, satisfies

lim_{n → ∞} t_n^{-1} { Σ_{i=1}^k ∫_X ((f_i(x) + t_n h_{ni}(x))_+)^p w_i(x) dx − Σ_{i=1}^k ∫_X ((f_i(x))_+)^p w_i(x) dx }
= lim_{n → ∞} t_n^{-1} { p t_n Σ_{i=1}^k ∫_{X_{i+}} f_i^{p−1}(x) h_{ni}(x) w_i(x) dx + O(t_n^2) + O(t_n^p) + 0 }
= p Σ_{i=1}^k ∫_{X_{i+}} f_i^{p−1}(x) h_i(x) w_i(x) dx.

Using the expansion (x + t h_t)^{1/p} = x^{1/p} + (1/p) x^{(1−p)/p} t h_t + o(|t h_t|) as t ↘ 0, it can be seen that the Hadamard derivative of x ↦ x^{1/p} is (1/p) x^{(1−p)/p} h. Therefore the chain rule and integrability of f and h imply that the derivative is

λ(f)^{1−p} Σ_{i=1}^k ∫_{X_{i+}} f_i^{p−1}(x) h_i(x) w_i(x) dx.

Proof of Lemma A.3.
For ν write

ν(f) = sup_{x ∈ X} ((f(x))_+ + f(−x))_+ = sup_{x ∈ X} max { 0, (f(x))_+ + f(−x) } = sup_{x ∈ X} max { 0, max { f(−x), f(x) + f(−x) } },

and using the definitions of f_1 and f_2 made in the statement of the lemma and changing the order in which the maxima are computed,

= max { 0, max { sup_{x ∈ X} f_1(x), sup_{x ∈ X} f_2(x) } } = (φ ∘ ψ)(σ(f_1), σ(f_2)).

Then using the chain rule (Shapiro, 1990) the derivative is that given in the statement of the lemma. For ω, assume f and h are square integrable and write

ω(f) = λ((f(x))_+ + f(−x)) = λ(max { f(−x), f(x) + f(−x) }) = (λ ∘ ψ)(f_1, f_2).

Proof of Theorem A.5.
This is an application of Corollary 3.2 in Fang and Santos (2019), and we only sketch the most important details of the proof. After applying the null hypothesis, the derivatives ν′_F and ω′_F shown in (52) and (53) are both convex. For example, in the expression for ν′_F,

( sup_{X_1(P)} (α h_A + (1 − α) h_B) )_+ ≤ α ( sup_{X_1(P)} h_A )_+ + (1 − α) ( sup_{X_1(P)} h_B )_+,

and similar calculations hold for the other two terms. In the case of ω′_F, for example,

∫_{X_1(P)} ((α h_A + (1 − α) h_B)_+)^2 ≤ α ∫_{X_1(P)} ((h_A)_+)^2 + (1 − α) ∫_{X_1(P)} ((h_B)_+)^2,

where the inequality relies on the nonnegativity of the innermost term and the convexity of x ↦ x^2 for x ≥ 0. Then Theorem 3.3 of Fang and Santos (2019) applies. The second part of the theorem is a special case of the first, when the part of the relationship that leads to nondegenerate behavior is not empty.

Proof of Lemma A.4.
First, let s_n = t_n^{-1} and define the finite differences

Δ_n = sup_X ( sup_U (s_n f + h)(u, x) + sup_U (s_n g + k)(u, x) ) − s_n θ(f, g),  (63)

so that for any s_n ↗ ∞, we need to show that Δ_n → θ′_{f,g}(h, k) defined in the statement of the theorem.

Fix an ε > 0. Then for any x ∉ M_θ(ε), note that

sup_U (s_n f + h)(u, x) + sup_U (s_n g + k)(u, x) − s_n θ(f, g) ≤ sup h + sup k − s_n ε.  (64)

Similarly, if u ∉ M_f(x, ε) for any x (the case for u that do not nearly-optimize g(·, x) is symmetric), then also

(s_n f + h)(u, x) + sup_U (s_n g + k)(u, x) − s_n θ(f, g) ≤ sup h + sup k − s_n ε  (65)

for that x. Therefore for any ε > 0,

lim sup_n Δ_n = lim sup_n ( sup_{M_θ(ε)} ( sup_{M_f(x,ε)} (s_n f + h)(u, x) + sup_{M_g(x,ε)} (s_n g + k)(u, x) ) − s_n θ(f, g) )
≤ lim sup_n ( s_n sup_{M_θ(ε)} ( sup_{M_f(x,ε)} f(u, x) + sup_{M_g(x,ε)} g(u, x) ) − s_n θ(f, g) + sup_{M_θ(ε)} ( sup_{M_f(x,ε)} h(u, x) + sup_{M_g(x,ε)} k(u, x) ) )
= sup_{M_θ(ε)} ( sup_{M_f(x,ε)} h(u, x) + sup_{M_g(x,ε)} k(u, x) ),  (66)

so that this inequality holds as ε ↘ 0.

Next, for any ε > 0 define

t̄(ε) = sup_{M_θ(ε)} ( sup_{M_f(x,ε)} h(u, x) + sup_{M_g(x,ε)} k(u, x) ).  (67)

Because this function is nondecreasing in ε, it has a limit as ε ↘ 0, so that for any m ∈ N there exist an x_m ∈ M_θ(1/m) and (u_{fm}, u_{gm}) satisfying the inequality

h(u_{fm}, x_m) + k(u_{gm}, x_m) ≥ t̄(1/m) − 1/m.
Therefore

t̄(1/m) ≤ h(u_{fm}, x_m) + k(u_{gm}, x_m) + 1/m
= s_n f(u_{fm}, x_m) + h(u_{fm}, x_m) + s_n g(u_{gm}, x_m) + k(u_{gm}, x_m) + 1/m − s_n (f(u_{fm}, x_m) + g(u_{gm}, x_m))
≤ sup_X ( sup_U (s_n f + h)(u, x) + sup_U (s_n g + k)(u, x) ) − s_n θ(f, g) + (s_n + 1)/m,  (68)

which implies that

lim_{ε ↘ 0} sup_{M_θ(ε)} ( sup_{M_f(x,ε)} h(u, x) + sup_{M_g(x,ε)} k(u, x) ) = lim_{m → ∞} t̄(1/m) ≤ Δ_n.  (69)

Proof of Theorem A.6. Start by considering V. As in the proof of Theorem 4.2, we simplify the analysis by writing this statistic as a composition of maps that act on just two functional arguments, (φ′_{θ(f,g)} ∘ θ′_{f,g})(h, k), where the positive-part map φ′_x is defined in (46) and θ′_{f,g} is, for any h, k ∈ ℓ^∞(U × X),

θ′_{f,g}(h, k) = lim_{ε ↘ 0} sup_{x ∈ M_θ(ε)} ( lim_{ε ↘ 0} sup_{u ∈ M_f(x,ε)} h(u, x) + lim_{ε ↘ 0} sup_{u ∈ M_g(x,ε)} k(u, x) ),

where M_f(x, ε) and M_θ(ε) are defined in (55) and (56).

It can be verified that for a fixed value of θ(f, g), φ̂′_{θ(f,g)}(x) is convex and nondecreasing. Next consider θ′_{f,g}.
For any ε > 0, consider the map applied to the convex combination of vector-valued functions α(h_1, k_1) + (1 − α)(h_2, k_2):

sup_{M_θ(ε)} ( sup_{M_f(x,ε)} (α h_1 + (1 − α) h_2)(u, x) + sup_{M_g(x,ε)} (α k_1 + (1 − α) k_2)(u, x) )
≤ sup_{M_θ(ε)} ( α ( sup_{M_f(x,ε)} h_1(u, x) + sup_{M_g(x,ε)} k_1(u, x) ) + (1 − α) ( sup_{M_f(x,ε)} h_2(u, x) + sup_{M_g(x,ε)} k_2(u, x) ) )
≤ α sup_{M_θ(ε)} ( sup_{M_f(x,ε)} h_1(u, x) + sup_{M_g(x,ε)} k_1(u, x) ) + (1 − α) sup_{M_θ(ε)} ( sup_{M_f(x,ε)} h_2(u, x) + sup_{M_g(x,ε)} k_2(u, x) ).

Therefore, letting ε ↘ 0, it can be seen that θ′_{f,g} is convex. Because V is the composition of a non-decreasing, convex function with a convex function, V is also a convex map of (h, k) to R (Boyd and Vandenberghe, 2004, eq. 3.11). As mentioned in the text, P ⊆ P_nec. Therefore Corollary 3.2 of Fang and Santos (2019) implies

lim sup_{n → ∞} P_n { V_n > q_{V*}(1 − α) } ≤ α.

Turn next to W. Similarly, write this statistic as a map of pairs of bounded functions to the real line as W_n = (λ′_{μ(f,g)} ∘ μ′_{f,g})(h, k), where for each x ∈ X,

μ(f, g)(x) = sup_U f(u, x) + sup_U g(u, x)

and

μ′_{f,g}(h, k)(x) = lim_{ε ↘ 0} max_{u ∈ M_f(x,ε)} h(u, x) + lim_{ε ↘ 0} max_{u ∈ M_g(x,ε)} k(u, x),

and for any functions f, h ∈ ℓ^∞(X), λ′_f(h) is defined in (49). We show the convexity of this composition directly.
Abbreviate μ(x) = μ(f, g)(x), and for fixed ε > 0 write

μ′_1(x) = sup_{u ∈ M_f(x,ε)} h_1(u, x) + sup_{u ∈ M_g(x,ε)} k_1(u, x),
μ′_2(x) = sup_{u ∈ M_f(x,ε)} h_2(u, x) + sup_{u ∈ M_g(x,ε)} k_2(u, x),
μ̄′(x) = sup_{u ∈ M_f(x,ε)} (α h_1 + (1 − α) h_2)(u, x) + sup_{u ∈ M_g(x,ε)} (α k_1 + (1 − α) k_2)(u, x).

Finally, let X_0 denote the region where μ(x) = 0. Then Lemma A.2 shows that λ′_μ(μ̄′) = λ(μ̄′|_{X_0}), where μ̄′|_{X_0} denotes the restriction of the function μ̄′ to the set X_0. Consider the first term on the right hand side. Inside the integral, it can be seen that

0 ≤ (μ̄′(x))_+ = ( sup_{u ∈ M_f(x,ε)} (α h_1 + (1 − α) h_2)(u, x) + sup_{u ∈ M_g(x,ε)} (α k_1 + (1 − α) k_2)(u, x) )_+
≤ ( α ( sup_{u ∈ M_f(x,ε)} h_1(u, x) + sup_{u ∈ M_g(x,ε)} k_1(u, x) ) + (1 − α) ( sup_{u ∈ M_f(x,ε)} h_2(u, x) + sup_{u ∈ M_g(x,ε)} k_2(u, x) ) )_+
= ( α μ′_1(x) + (1 − α) μ′_2(x) )_+ ≤ α (μ′_1(x))_+ + (1 − α) (μ′_2(x))_+.

Because the integrand is nonnegative, subadditivity of the L_2 norm implies

λ(μ̄′|_{X_0}) ≤ α λ(μ′_1|_{X_0}) + (1 − α) λ(μ′_2|_{X_0}).

This inequality holds as ε ↘ 0 by the assumed square-integrability of the arguments. Therefore Corollary 3.2 of Fang and Santos (2019) implies

lim sup_{n → ∞} P_n { W_n > q_{W*}(1 − α) } ≤ α.

References

Aaberge, R., T. Havnes, and
M. Mogstad (2018): “Ranking Intersecting Distribution Functions,” Journal of Applied Econometrics, forthcoming.

Alesina, A., and F. Passarelli (2019): “Loss Aversion, Politics and Redistribution,” American Journal of Political Science, 63, 936–947.

Andrews, D. W., and X. Shi (2013): “Inference Based on Conditional Moment Inequalities,” Econometrica, 81, 609–666.

Atkinson, A. B. (1970): “On the Measurement of Inequality,” Journal of Economic Theory, 2, 244–263.

Barrett, G. F., and S. G. Donald (2003): “Consistent Tests for Stochastic Dominance,” Econometrica, 71, 71–104.

Bhattacharya, D., and P. Dupas (2012): “Inferring Welfare Maximizing Treatment Assignment under Budget Constraints,” Journal of Econometrics, 167, 168–196.

Bitler, M. P., J. B. Gelbach, and H. W. Hoynes (2006): “What Mean Impacts Miss: Distributional Effects of Welfare Reform Experiments,” American Economic Review, 96, 988–1012.

Boyd, S., and L. Vandenberghe (2004): Convex Optimization. Cambridge University Press, Cambridge.

Cárcamo, J., A. Cuevas, and L.-A. Rodríguez (2019): “Directional Differentiability for Supremum-Type Functionals: Statistical Applications,” arXiv e-prints, arXiv:1902.01136.

Carneiro, P., K. T. Hansen, and J. J. Heckman (2001): “Removing the Veil of Ignorance in Assessing the Distributional Impacts of Social Policies,” Swedish Economic Policy Review, 8, 273–301.

Cattaneo, M. D., M. Jansson, and K. Nagasawa (2017): “Bootstrap-Based Inference for Cube Root Consistent Estimators,” Working paper.

Chetverikov, D., A. Santos, and A. M. Shaikh (2018): “The Econometrics of Shape Restrictions,” Annual Review of Economics, 10, 31–63.

Chew, S. H. (1983): “A Generalization of the Quasilinear Mean with Applications to the Measurement of Income Inequality and Decision Theory Resolving the Allais Paradox,” Econometrica, 51, 1065–1092.

Cho, J. S., and H. White (2018): “Directionally Differentiable Econometric Models,” Econometric Theory, 34, 1101–1131.

Christensen, T., and B. Connault (2019): “Counterfactual Sensitivity and Robustness,” Working paper.

Dehejia, R. (2005): “Program Evaluation as a Decision Problem,” Journal of Econometrics, 125, 141–173.

Dümbgen, L. (1993): “On Nondifferentiable Functions and the Bootstrap,” Probability Theory and Related Fields, 95, 125–140.

Eeckhoudt, L., and H. Schlesinger (2006): “Putting Risk in Its Proper Place,” American Economic Review, 96, 280–289.

Fang, Z., and A. Santos (2019): “Inference on Directionally Differentiable Functions,” Review of Economic Studies, 86, 377–412.

Fishburn, P. C. (1980): “Continua of Stochastic Dominance Relations for Unbounded Probability Distributions,” Journal of Mathematical Economics, 7, 271–285.

Frank, M. J., R. B. Nelsen, and B. Schweizer (1987): “Best-Possible Bounds for the Distribution of a Sum — A Problem of Kolmogorov,” Probability Theory and Related Fields, 74, 199–211.

Freund, C., and Ç. Özden (2008): “Trade Policy and Loss Aversion,” American Economic Review, 98, 1675–1691.

Gajdos, T., and J. A. Weymark (2012): “Introduction to Inequality and Risk,” Journal of Economic Theory, 147, 1313–1330.

Heckman, J. J., J. Smith, and N. Clements (1997): “Making the Most Out of Programme Evaluations and Social Experiments: Accounting for Heterogeneity in Programme Impacts,” Review of Economic Studies, 64, 487–535.

Heckman, J. J., and J. A. Smith (1998): “Evaluating the Welfare State,” in Econometrics and Economic Theory in the Twentieth Century: The Ragnar Frisch Centennial Symposium, ed. by S. Strom. Cambridge University Press, New York.

Hirano, K., and J. Porter (2009): “Asymptotics for Statistical Treatment Rules,” Econometrica, 77, 1683–1701.

Hong, H., and J. Li (2018): “The Numerical Delta Method,” Journal of Econometrics, 206, 379–394.

Kahneman, D., and A. Tversky (1979): “Prospect Theory: An Analysis of Decision Under Risk,” Econometrica, 47, 263–292.

Kasy, M. (2016): “Partial Identification, Distributional Preferences, and the Welfare Ranking of Policies,” Review of Economics and Statistics, 98, 111–131.

Kőszegi, B., and M. Rabin (2006): “A Model of Reference-Dependent Preferences,” Quarterly Journal of Economics, 121, 1133–1165.

Kitagawa, T., and A. Tetenov (2018): “Who Should Be Treated? Empirical Welfare Maximization Methods for Treatment Choice,” Econometrica, 86, 591–616.

(2019): “Equality-Minded Treatment Choice,” Journal of Business and Economic Statistics, forthcoming.

Levy, H. (2016): Stochastic Dominance: Investment Decision Making Under Uncertainty, 3rd edition. Springer International Publishing, Switzerland.

Linton, O., E. Maasoumi, and Y.-J. Whang (2005): “Consistent Testing for Stochastic Dominance Under General Sampling Schemes,” Review of Economic Studies, 72, 735–765.

Linton, O., K. Song, and Y.-J. Whang (2010): “An Improved Bootstrap Test of Stochastic Dominance,” Journal of Econometrics, 154, 186–202.

Makarov, G. (1982): “Estimates for the Distribution Function of a Sum of Two Random Variables when the Marginal Distributions are Fixed,” Theory of Probability and its Applications, 26(4), 803–806.

Manski, C. F. (2004): “Statistical Treatment Rules for Heterogeneous Populations,” Econometrica, 72, 1221–1246.

Masten, M. A., and A. Poirier (2020): “Inference on Breakdown Frontiers,” Quantitative Economics, 11, 41–111.

Rabin, M., and R. H. Thaler (2001): “Anomalies: Risk Aversion,” Journal of Economic Perspectives, 15, 219–232.

Rick, S. (2011): “Losses, Gains, and Brains: Neuroeconomics Can Help to Answer Open Questions about Loss Aversion,” Journal of Consumer Psychology, 21, 453–463.

Roemer, J. E. (1998): Theories of Distributive Justice. Harvard University Press, Cambridge.

Rüschendorf, L. (1982): “Random Variables with Maximum Sums,” Advances in Applied Probability, 14, 623–632.

Samuelson, W., and R. Zeckhauser (1988): “Status Quo Bias in Decision Making,” Journal of Risk and Uncertainty, 1, 7–59.

Sen, A. K. (2000): Freedom, Rationality and Social Choice: The Arrow Lectures and Other Essays. Oxford University Press, Oxford.

Shaked, M., and G. J. Shanthikumar (1994): Stochastic Orders and Their Applications. Academic Press, San Diego, CA.

Shapiro, A. (1990): “On Concepts of Directional Differentiability,” Journal of Optimization Theory and Applications, 66, 477–487.

Stoye, J. (2009): “Minimax Regret Treatment Choice With Finite Samples,” Journal of Econometrics, 151, 70–81.

Tetenov, A. (2012): “Statistical Treatment Choice Based on Asymmetric Minimax Regret Criteria,” Journal of Econometrics, 166, 157–165.

Tversky, A., and D. Kahneman (1991): “Loss Aversion in Riskless Choice: A Reference-Dependent Model,” Quarterly Journal of Economics, 106, 1039–1061.

(1992): “Advances in Prospect Theory: Cumulative Representation of Uncertainty,” Journal of Risk and Uncertainty, 5, 297–323.

van der Vaart, A. W. (1998): Asymptotic Statistics. Cambridge University Press, Cambridge.

van der Vaart, A. W., and J. A. Wellner (1996): Weak Convergence and Empirical Processes. Springer, New York.

Wellner, J. A. (1992): “Empirical Processes in Action: A Review,” International Statistical Review, 60, 247–269.

Weymark, J. A. (1981): “Generalized Gini Inequality Indices,” Mathematical Social Sciences, 1, 409–430.

Williamson, R. C., and T. Downs (1990): “Probabilistic Arithmetic I. Numerical Methods for Calculating Convolutions and Dependency Bounds,” International Journal of Approximate Reasoning, 4, 89–158.

Yaari, M. E. (1987): “The Dual Theory of Choice Under Risk,” Econometrica, 55, 95–115.

(1988): “A Controversial Proposal Concerning Inequality Measurement,” Journal of Economic Theory, 44, 381–397.

Supplemental appendix to “Loss aversion and the welfare ranking of policy interventions”
This supplemental appendix contains numerical Monte Carlo simulations studying the empirical size and power of the statistical methods proposed in the main text, as well as additional results for the empirical application in Section 5 of the main text.
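Several of the results in the appendices above turn on the fact that maps such as the positive part x ↦ (x)_+ are directionally, but not fully (Hadamard), differentiable at the boundary of the null region. The following minimal numerical sketch of this point is ours; the helper `dd` and its names do not come from the paper:

```python
def dd(phi, x, h, t=1e-8):
    """One-sided difference quotient approximating the directional
    derivative of phi at x in direction h."""
    return (phi(x + t * h) - phi(x)) / t

# The positive-part map is the prototypical directionally
# differentiable functional appearing throughout the proofs.
pos = lambda v: max(v, 0.0)

# At x = 0 the directional derivative is h -> (h)_+,
# which is not linear in h:
d_plus = dd(pos, 0.0, 1.0)    # derivative in direction +1 is 1
d_minus = dd(pos, 0.0, -1.0)  # derivative in direction -1 is 0
```

Because (h)_+ is nonlinear in h at the kink, the standard bootstrap is invalid there, which is why the resampling scheme estimates the directional derivative directly (Fang and Santos, 2019).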
C Monte Carlo simulations
In this section, we compare the finite-sample performance of the tests of the LASD null hypothesis proposed in the main text. We describe the results of simulation experiments used to investigate the size and power properties of those tests. There are three simulation settings: a normal location model and a triangular model under point identification, and a normal location model under partial identification.
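As a rough illustration of the simulation design, the sketch below implements the tuning sequences described in Section C.1 together with a generic one-sided sup-norm bootstrap test. All function names are ours; the statistic is a simplified first-order-dominance stand-in for the paper's T-based LASD statistics, and the recentred bootstrap shown here is the naive version that the directional-derivative resampling in the main text refines:

```python
import numpy as np

def tuning(n):
    """Tuning sequences from Section C.1: contact-set tolerance a_n and
    epsilon-maximizer tolerance b_n (c_n is taken equal to b_n)."""
    a_n = 4.0 * np.log(np.log(n)) / np.sqrt(n)
    b_n = np.sqrt(np.log(np.log(n)) / n)
    return a_n, b_n, b_n

def ecdf(sample, grid):
    """Empirical CDF of `sample` evaluated on `grid`."""
    return np.searchsorted(np.sort(sample), grid, side="right") / len(sample)

def sup_test(a, b, n_boot=499, alpha=0.05, seed=0):
    """Naive one-sided KS-type bootstrap test of H0: F_A(x) <= F_B(x)
    for all x (A first-order dominates B); reject for large values."""
    rng = np.random.default_rng(seed)
    grid = np.sort(np.concatenate([a, b]))
    n = min(len(a), len(b))
    diff = ecdf(a, grid) - ecdf(b, grid)
    stat = np.sqrt(n) * diff.max()
    boot = np.empty(n_boot)
    for i in range(n_boot):
        a_s = rng.choice(a, size=len(a), replace=True)
        b_s = rng.choice(b, size=len(b), replace=True)
        # recentre at the sample difference: resample the process
        boot[i] = np.sqrt(n) * ((ecdf(a_s, grid) - ecdf(b_s, grid)) - diff).max()
    crit = np.quantile(boot, 1.0 - alpha)
    return stat, crit, bool(stat > crit)
```

With a_n and b_n in hand, plug-in contact sets {x : |m̂(x)| ≤ a_n} and ε-maximizer sets {x : m̂(x) > max m̂ − b_n} can be formed on the same grid to estimate the directional derivative instead of recentring.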
C.1 Normal model, identified case
In this experiment there are two independent Gaussian random variables that represent point-identified outcomes. The scale of both distributions is set to unity, the location of distribution A is set to zero, and the location of distribution B is allowed to vary. Letting µ_B denote the location of distribution B, tests should not reject the null H_0 : F_A ⪰_LASD F_B when µ_B ≤ 0 and should reject the null when µ_B > 0. This is a case where P is a singleton, which occurs when µ_B = 0.

We select constant sequences in the following way. Let n = n_A + n_B. The estimated contact sets X̂_k = {x ∈ X : |m̂_kn(x)| ≤ a_n} worked well using a_n = 4 log(log(n))/√n. For estimated ε-maximizer sets M̂_k = {x ∈ X : m̂_kn(x) > sup_x m̂_kn(x) − b_n} we used b_n = √(log(log(n))/n). For deciding which coordinate appeared significantly larger than the other, or whether both coordinates reached approximately the same supremum, that is, when estimating |max m̂_1n(x) − max m̂_2n(x)| ≤ c_n, we used the same constant sequence as b_n, that is, c_n = √(log(log(n))/n). These sequences were chosen after preliminary simulations with the normal model, and were used in the other two simulations as well (with n = n_0 + n_A + n_B in the partially-identified setting).

The size and power of the tests is good in this example, as can be seen in Figure 3. The mean of distribution B ranged over multiples of 1/√n on either side of zero, so the alternatives are local to the boundary of the null region. Sample sizes were identical for both samples and set equal to 100, 500 or 1,000. When resampling, the number of bootstrap repetitions was set equal to 499 (for samples of size 100), 999 (for samples of size 500) or 1,999 (for samples of size 1,000). Figure 3 plots empirical rejection probabilities from 1,000 simulation runs.

From Figure 3 it can be seen that the empirical rejection probabilities are relatively close
to the nominal 5% rejection probability at the boundary of the null region when µ_B = 0. The behavior of the supremum norm tests was identical, so only V_1n test results are shown. The W_1n and W_2n results are close, and the differences are due to numerical integration that occurs over one or two dimensions depending on the statistic.

Figure 3: Empirical rejection probabilities of the LASD tests in the point-identified normal location model experiment. The tests are of nominal 5% size, should have exactly 5% rejection probability when µ_B = 0, and should reject when µ_B > 0. V_1n and V_2n tests have identical behavior, so only V_1n results are shown. Samples of sizes 100, 500 and 1000 correspond respectively to 499, 999 and 1999 bootstrap repetitions. Distributions are local to the boundary of the null region, which is where µ_B = 0. 1000 simulation repetitions.

C.2 Triangular model, identified case
In this experiment we use two independent triangular random variables, where we let θ = (α, β, γ) denote the lower endpoint of the support, the mode of the distribution and the upper endpoint of the support. Distribution A uses θ_A = (−1, 0, 1), while the shape of distribution B is allowed to vary. For a parameter ε ∈ [−1/2, 1/2] we let θ_B = (−1 − ε/√n, −ε/√n, 1 + ε/√n), so that all the distributions are local to the boundary of the null region represented by ε = 0. Two distributions are depicted in Figure 4, in which ε = 1/2. This implies that F_A ⪰_LASD F_B. From the right panel of the plot it can be seen that these distributions satisfy an LASD ordering, but they would not be ordered by FOSD.

Figure 4: Triangular model densities and distribution functions. In this example F_A ⪰_LASD F_B (in terms of the description in the text, ε = 1/2 for distribution B). Heuristically, the higher gains under policy B are outweighed by the probability of larger losses, so that distribution A dominates distribution B in the LASD sense, but F_A ⋡_FOSD F_B.

Figure 5 shows the empirical rejection results from the triangular model experiment. We allow ε, which controls the shape of distribution B, to vary between −1/2 and 1/2. The tests in this experiment should reject the null when ε < 0, should equal the nominal size at ε = 0, and should not reject when ε > 0. Because of the restricted supports of the distributions and the relatively small region for ε, the horizontal axis for the power curves shown in Figure 5 is the value of the alternative parameters in absolute scale and not local alternatives. Therefore the power curves show a noticeable change over different values of the sample sizes used.

C.3 Normal model, partially identified case
In this experiment we use three independent normal random variables (Z_0, Z_A, Z_B) with scales set to unity and location parameters µ = (0, 0, µ_B), where µ_B is allowed to vary. We denote this triple of marginal normal CDFs by G(µ_B). Rounding to one decimal place, the null H_0 : F_A ⪰_LASD F_B should be rejected when µ_B exceeds a threshold slightly above 2. We let µ_B vary locally around this approximate boundary point. Figure 6 depicts the T(G(µ_B)) function for three values of µ_B around the boundary. Tests are designed to detect the positive deviation in the right-most panel of the figure, when T(G)(x) > 0 for some x ≥ 0.

Figure 7 shows empirical rejection probabilities for tests with three independent normal distributions. The tests are not conducted under any assumptions about the independence of the samples. The rejection probabilities are different than those in the point-identified experiments: more evidence is needed to detect deviations from the null region than in the identified case, because the bound U_B combines observations from the control sample and sample B. Although more information is necessary, it is important to note that these alternatives (like in the other experiments) are local to the boundary of the P_nec set.

Figure 5: Empirical rejection probabilities of the LASD tests in the point-identified triangular model experiment. The tests are of nominal 5% size, should have exactly 5% rejection probability when ε = 0, and should reject when ε < 0. Samples of sizes 100, 500 and 1000 correspond respectively to 499, 999 and 1999 bootstrap repetitions. Distributions are around the boundary of the null region, which is where ε = 0, but plotted on an absolute, not local, scale.
1000 simulation repetitions.As can be seen in Figure 7, the tests in the partially identified case do not reject the nullwith as high a probability as in the point identified case, which is a direct result of the lackof knowledge about inter-sample correlations that dictates the form of the T function definedin the main text. Also, it appears as though these deviations from the null are not very welldetected by the Cramér-von Mises tests in relation to the Kolmogorov-Smirnov tests. However,it is important to note that in this example, alternatives are local alternatives, and representsmaller and smaller deviations from the null region as sample sizes increase. D Application
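Before turning to the application, the contrast drawn above between sup-norm (Kolmogorov–Smirnov-type) and L2-norm (Cramér–von Mises-type) statistics can be made concrete with a generic two-sample sketch. These are simplified stand-ins rather than the paper's exact $V_n$ and $W_n$ statistics, and the permutation p-value below is valid only under the sharper null of equal distributions; the paper's resampling procedure for directionally differentiable functionals differs.

```python
import numpy as np

rng = np.random.default_rng(0)

def ecdf(sample, grid):
    """Empirical CDF of `sample` evaluated at the points in `grid`."""
    return np.searchsorted(np.sort(sample), grid, side="right") / len(sample)

def one_sided_stats(x, y):
    """Generic one-sided sup-norm (KS-type) and L2-norm (CvM-type)
    two-sample statistics for H0: F_X(t) <= F_Y(t) for all t."""
    n, m = len(x), len(y)
    pooled = np.sort(np.concatenate([x, y]))
    pos = np.maximum(ecdf(x, pooled) - ecdf(y, pooled), 0.0)
    scale = np.sqrt(n * m / (n + m))
    return scale * pos.max(), scale**2 * np.mean(pos**2)

def permutation_pvalue(x, y, stat_index, B=199):
    """Permutation p-value under the exchangeable null F_X = F_Y."""
    observed = one_sided_stats(x, y)[stat_index]
    pooled = np.concatenate([x, y])
    exceed = 0
    for _ in range(B):
        perm = rng.permutation(pooled)
        if one_sided_stats(perm[:len(x)], perm[len(x):])[stat_index] >= observed:
            exceed += 1
    return (1 + exceed) / (1 + B)

x = rng.normal(0.0, 1.0, size=200)   # "policy A" sample (hypothetical)
y = rng.normal(0.3, 1.0, size=200)   # "policy B" sample, shifted location
p_sup = permutation_pvalue(x, y, 0)  # sup-norm test
p_l2 = permutation_pvalue(x, y, 1)   # L2-norm test
print("sup-norm p-value:", p_sup, "L2-norm p-value:", p_l2)
```

The sup-norm statistic reacts to the largest pointwise violation of the ordering, while the L2-norm statistic averages squared violations, which is one mechanical reason the two families can differ in power against local deviations, as seen in the simulations above.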
In this section we present additional test results for the empirical application discussed in Section 5 of the main paper. Table 2 contains results for the sup-norm ($V_n$) statistics based on the T1 process, Table 3 contains sup-norm statistics based on the T2 process, Table 4 contains L2-norm ($W_n$) statistics based on the T1 process, and finally Table 5 reproduces the table of L2-norm (T2) results used in the main text. Each table reports test statistics and p-values for the avg-RA, lastQ-RA, avg-TL and lastQ-TL outcomes, testing $F_{AFDC} \succeq F_{JF}$, $F_{JF} \succeq F_{AFDC}$ and equality in changes (LASD), together with the corresponding orderings of $G_{AFDC}$ and $G_{JF}$ in levels (FOSD). The tables reveal that all the tests lead to very similar qualitative conclusions. Some of the entries are exactly the same across tables, being repetitions of the same tests, but the tables are shown in their entirety to facilitate comparison.

Figure 6: The $T(G(\mu_B))$ function for different values of the location of the marginal distribution function $G_B$. Tests should reject the null hypothesis when $T(G)(x) > 0$ for some $x$, as in the right panel.

Table 2: Table of sup-norm tests. LASD tests use the T1 process.

Table 3: Table of sup-norm tests. LASD tests use the T2 process.
Table 4: Table of L2-norm tests. LASD tests use the T1 process.
Table 5: Table of L2-norm tests. LASD tests use the T2 process.

Figure 7: Empirical rejection probabilities of the LASD tests in the partially identified normal location model experiment. The control and policy $A$ distributions have means set to zero, while the location of policy $B$ is allowed to vary. The tests are of nominal 5% size, should have exactly 5% rejection probability when $\mu_B$ is at the boundary value and should reject when $\mu_B$ exceeds it (alternatives are local to the boundary of the set $\mathcal{P}_{nec}$ described in the text). Samples of sizes 100, 500 and 1000 correspond respectively to 499, 999 and 1999 bootstrap repetitions; 1000 simulation repetitions.

Finally, we note that the example could be used to conduct tests under partial identification, as if we had no knowledge of the longitudinal structure of the data. However, tests using $V_n$ or $W_n$