Generalized Social Marginal Welfare Weights Imply Inconsistent Comparisons of Tax Policies

Abstract

This paper concerns Saez and Stantcheva's (2016) generalized social marginal welfare weights (GSMWW), which aggregate losses and gains due to tax policies, while incorporating non-utilitarian ethical considerations. The approach evaluates local tax changes without a global social objective. I argue that local tax policy comparisons implicitly entail global comparisons. Moreover, whenever welfare weights do not have a utilitarian structure, these implied global comparisons are inconsistent. One motivation for GSMWW is that it preserves the Pareto principle. I argue that the approach's problems should spark a reconsideration of Pareto if one wants to represent broader values in formal policy analysis.


arXiv [econ.TH]

Generalized Social Marginal Welfare Weights Imply Inconsistent Comparisons of Tax Policies

Itai Sher ∗
University of Massachusetts Amherst
February 16, 2021

Abstract

This paper concerns Saez and Stantcheva's (2016) generalized social marginal welfare weights (GSMWW), which are used to aggregate losses and gains due to the tax system, while incorporating non-utilitarian ethical considerations. That approach evaluates local changes in tax policy without appealing to a global social objective. However, I argue that local comparisons between different tax systems implicitly entail global comparisons. Moreover, whenever welfare weights are not of a utilitarian kind, these implied global comparisons are inconsistent. Part of the motivation for the GSMWW approach is that it provides a way to incorporate broader ethical judgements into the evaluation of the tax system while preserving the Pareto principle. I suggest that the problems with the approach ought to spark a reconsideration of Pareto if one wants to represent broader values in formal policy analysis.

This paper explores an approach to optimal tax proposed by Saez and Stantcheva (2016), the generalized social marginal welfare weights (GSMWW) approach. The purpose of the approach is to incorporate broader values into the evaluation of optimal income tax. The approach evaluates tax policies locally by assigning welfare weights to individuals on the basis of various morally relevant factors.

It is instructive to contrast the GSMWW approach with the standard approach, which involves maximization of a utilitarian objective $\int u_i \, di$ subject to a revenue requirement. Small tax reforms $\Delta T$ (in other words, small local changes in the tax policy) can be evaluated by weighing the effects of those changes on different people $\Delta T_i$ by their marginal utility $u_i'$; bearing in mind that the $\Delta T_i$'s are additional payments, a tax reform is desirable if $\int u_i' \Delta T_i \, di < 0$. A necessary condition for a tax policy to be optimal is that no locally revenue neutral tax reform is desirable; that is, for all locally revenue neutral $\Delta T$, $\int u_i' \Delta T_i \, di = 0$.

∗ Email: [email protected]. I am grateful for helpful comments and discussions with Maya Eden, Louis Perrault, Paolo Piacquadio, Peter Sher, and Matt Weinzierl and to seminar audiences at UC Riverside and at the Welfare Economics and Economic Policy virtual seminar.

In contrast, the GSMWW approach specifies welfare weights $g_i$ without an objective. These weights need not depend only on marginal utility; they can depend on other moral considerations, such as fairness, libertarian values, equality of opportunity, and poverty alleviation. Paralleling the utilitarian formulas, a tax reform is desirable if $\int g_i \Delta T_i \, di < 0$, and a tax policy $T$ is optimal if, at that policy, for all locally revenue neutral $\Delta T$, $\int g_i \Delta T_i \, di = 0$; that is, at an optimal policy, there is no small revenue neutral desirable reform.

As the local optimality condition under the GSMWW approach has a similar structure to the local optimality condition under the standard utilitarian approach, it can be used to derive a familiar formula

$$T'(z) = \frac{1 - \bar{G}(z)}{1 - \bar{G}(z) + \alpha(z) \cdot e(z)} \tag{1}$$

for the optimal marginal tax rates, where the term $\bar{G}(z)$ represents a normative judgement embodied in generalized welfare weights, and $\alpha(z)$ and $e(z)$ are empirical terms. This generalizes a standard optimal tax formula (Saez 2001) in which standard welfare weights play the role of the generalized weights above. This gives the impression that there is a neat separation between the positive and normative ingredients that go into the determination of optimal policy, and that one can essentially use tools and results of standard optimal tax to accommodate broader ethical values. I shall argue that this is not correct: the decomposition of positive and normative ingredients does not take the form (1), and to incorporate broader values, we must depart more radically from the standard optimal tax framework.

One attractive feature of the GSMWW approach is that it preserves the Pareto efficiency of the standard utilitarian approach. In particular, Saez and Stantcheva (2016) establish that any tax policy that is locally optimal according to their approach is also locally Pareto optimal.

In this paper, I criticize the GSMWW approach. I want to emphasize that I greatly admire the approach. Saez and Stantcheva (2016) is an inspiring attempt to incorporate broader values into welfare economics.
However, the approach ultimately runs into problems, and the problems it faces are instructive.

In contemplating the apparent compatibility, within the GSMWW approach, between the Pareto principle and broader values, one might be initially puzzled in light of results from Sen (1970, 1979) and Kaplow and Shavell (2001) to the effect that there is a conflict between social evaluations that incorporate broader non-welfarist values and the Pareto principle. How are Saez and Stantcheva (2016) able to incorporate these broader values without running afoul of Pareto?

Saez and Stantcheva (2016) write “In our approach ... there is no social welfare objective primitive that the government maximizes.” (p. 24) The current paper contends that it is the lack of such an objective that allows for efficiency to co-exist with broader values. It is important to think about what it means that the GSMWW approach does not correspond to maximizing any objective. The GSMWW approach provides information about whether locally one tax policy is better than another. Suppose that there is no global objective that is consistent with all of those local comparisons. Then I would argue that one should conclude that the comparisons implied by GSMWW are not coherent. One way of seeing this is that it would imply that anyone who did have a coherent global ranking of tax policies would disagree with the judgements of GSMWW for some comparisons.

Saez and Stantcheva (2016) do not formalize the observation that GSMWW is not consistent with a global objective; in this paper, I do. I show that for some specifications of generalized social welfare weights, any binary relation that attempts to rationalize them will contain a cycle of the form: $T_1$ is better than $T_2$, which is better than $T_3$, which in turn is better than $T_1$ (where the $T_j$'s are tax policies).
Indeed, my main result, Theorem 1, shows that such cycles occur precisely when GSMWW are not of a utilitarian kind; that is, the GSMWW approach is inconsistent precisely in those cases when it is supposed to go beyond the standard approach.

While the GSMWW approach is intended to only make local comparisons, it is not surprising that these local rankings imply global comparisons; it is analogous to the fact that we can recover the function $f : \mathbb{R}^n \to \mathbb{R}$ up to a constant from its gradient $\nabla f : \mathbb{R}^n \to \mathbb{R}^n$, which informs us about its local behavior.

The problem with the GSMWW approach is that, rather than working from foundations, it attempts to reverse engineer a solution that resembles the traditional approach. This would be analogous to directly modifying the optimality conditions derived from an optimization problem rather than modifying the original underlying problem. It is the optimization problem that gives significance to the optimality conditions, not the other way around. Modifying the optimality conditions directly might not lead to a coherent solution to any problem; in fact that is what I claim happens in this case.

If we really want to incorporate broader values into optimal tax policy, then it is unlikely that the solution will take the form (1) analogous to the traditional approach. We should expect an approach that incorporates broader values to depart more radically from the traditional approach. In particular, it may be that we are willing to trade off welfare against other moral considerations, meaning, for example, that everyone may be worse off, and only slightly worse off on average, while, say, fairness is much better satisfied, and we would deem that an improvement. In other words, taking diverse moral principles seriously may lead to a conflict with Pareto efficiency. This point is discussed further in Sections 3 and 8.

I should note that the argument of this paper does not preclude moral values that cannot be represented by some social objective.
It may be for example that certain systems of obligations and permissions cannot be so represented. It may be that the right (what one ought to do) is really separate from the good (what is better or worse) in such a way that moral behavior is not a matter of maximizing some objective. What the argument of this paper precludes is a system that puts the good before the right, directing one to act so as to maximize a local objective that cannot be coherently extended to a global objective.

The outline of this paper is as follows. Section 2 presents the GSMWW framework. Section 3 explains informally why we might expect the GSMWW approach to run into problems, elaborating on some of the points that I have made in this introduction. Section 4 defines what it means for a global relation to rationalize social marginal welfare weights. Section 5 presents a simple example in which welfare weights are rationalizable, and Section 6 presents a simple example in which welfare weights are not rationalizable. Section 7 presents my main result, which shows that generalized social welfare weights are rationalizable only when they take an essentially utilitarian form, so that they are not rationalizable precisely in those cases when the theory of GSMWW purports to go beyond the standard theory. Section 8 concludes with a discussion of the significance of the results.

There is a continuum of agents in the interval $I = [0, 1]$ distributed with Lebesgue measure $\lambda$. Each agent $i \in I$ has observable characteristics $x_i \in X$ and unobservable characteristics $y_i \in Y$, where $X$ and $Y$ are compact subsets of a Euclidean space. I assume that $i \mapsto x_i$ and $i \mapsto y_i$ have at most finitely many discontinuities. If it is impermissible to condition taxes on a certain characteristic, then we treat that characteristic as unobservable.

Let $c_i$ be agent $i$'s consumption and $z_i$ be agent $i$'s income. I assume that consumption can take values in $\mathbb{R}$ and income can take values in the set $Z = [0, \bar{z}]$ for some $\bar{z} > 0$. Each agent $i \in [0, 1]$ has a utility function

$$u_i(c_i, z_i) = u(c_i - v_i(z_i)), \quad \text{where } v_i(z_i) = v(z_i; x_i, y_i),$$

and $u : \mathbb{R} \to \mathbb{R}$ is a strictly increasing twice continuously differentiable function that is common across agents, $v : Z \times X \times Y \to \mathbb{R}$ is continuous, and $\frac{\partial}{\partial z} v(z; x, y)$ and $\frac{\partial^2}{\partial z^2} v(z; x, y)$ exist and are continuous at all $(z; x, y) \in Z \times X \times Y$. $v(z_i; x_i, y_i)$ is interpreted as the cost in terms of consumption of earning income $z_i$ given characteristics $(x_i, y_i)$, and $u$ transforms the utility representation $c_i - v_i(z_i)$ into a representation that is adequate for making utilitarian interpersonal comparisons. I assume that $\forall i$, $v_i'(\bar{z}) > 1$, which means that the maximum “possible” income $\bar{z}$ in $Z$ is selected so that it is so large that no agent would actually choose it in the absence of taxes.

A tax policy is a function $T : Z \times X \to \mathbb{R}$, where $T(z; x)$ is the tax paid by citizens with income $z$ and observable characteristics $x$. $\mathcal{T}$ is the set of all tax policies. Formally, $\mathcal{T}$ is the set of all continuous functions $T$ on $Z \times X$ such that $\frac{\partial}{\partial z} T(z; x)$ and $\frac{\partial^2}{\partial z^2} T(z; x)$ exist and are continuous at all $(z; x) \in Z \times X$.

Define the norm

$$\rho(T) = \max \left\{ |T(z; x)| + \left| \tfrac{\partial}{\partial z} T(z; x) \right| + \left| \tfrac{\partial^2}{\partial z^2} T(z; x) \right| : (z; x) \in Z \times X \right\},$$

and let $\mathcal{T}$ be endowed with the metric topology induced by $\rho$. Thus the notion of closeness for tax policies depends on the first two derivatives of $T$ and not just its values.

Let $\mathcal{T}^\circ = \left\{ T \in \mathcal{T} : \frac{\partial^2}{\partial z^2} T(z; x) > 0, \ \forall (z; x) \in Z \times X \right\}$. Thus $\mathcal{T}^\circ$ is the set of functions with a positive second $z$ derivative everywhere. It is easy to see that $\mathcal{T}^\circ$ is an open set in the metric topology induced by $\rho$.

We write $T_i(z_i) = T(z_i; x_i)$. Given a tax policy $T$, we have $c_i = z_i - T_i(z_i)$. Define $z_i(T)$ to be $i$'s optimal income when facing tax system $T$; that is,

$$z_i(T) \in \arg\max_{z_i} \; z_i - T_i(z_i) - v_i(z_i).$$

Observe that since $u$ is strictly monotone, $z_i(T) \in \arg\max_{z_i} u(z_i - T_i(z_i) - v_i(z_i))$. Now define

$$U_i(T) = u\big(z_i(T) - T_i(z_i(T)) - v_i(z_i(T))\big), \tag{2}$$

$$\tilde{U}_i(T) = z_i(T) - T_i(z_i(T)) - v_i(z_i(T)). \tag{3}$$

Thus $U_i(T)$ is the utility induced by tax system $T$ expressed in terms adequate for utilitarian interpersonal comparisons; and $\tilde{U}_i(T)$ is an alternative utility representation in dollar terms giving the consumption such that $i$ would be indifferent between optimizing against tax system $T$ and having consumption $\tilde{U}_i(T)$ without incurring the costs of earning income.

For any tax system $T$, define $R : \mathcal{T} \to \mathbb{R}$ by

$$R(T) = \int T_i(z_i(T)) \, di.$$

(For $R(T)$ to be uniquely defined, we must assume that the optimal income $z_i(T)$ is unique for almost all $i$.)

The novelty in the GSMWW approach is the way that tax systems are evaluated. Let $g(c_i, z_i; x_i, y_i)$ be the generalized social welfare weight. Thus, we assign a certain weight to each agent depending on their consumption $c_i$, their income $z_i$, and their characteristics $x_i, y_i$. Formally, a system of generalized social welfare weights is an integrable function $g : \mathbb{R} \times Z \times X \times Y \to \mathbb{R}$ such that

$$g(c_i, z_i; x_i, y_i) > 0, \quad \forall c_i, z_i, x_i, y_i. \tag{4}$$

Let $\mathcal{G}$ be the set of all systems of generalized social welfare weights. Define $g_i(c_i, z_i) = g(c_i, z_i; x_i, y_i)$. The intuitive interpretation of generalized social marginal welfare weights is that they measure the marginal value of giving a dollar to each person $i$.

Under the utilitarian approach, the goal of a tax system $T$ (or any other policy for that matter) is to maximize the sum of agent utilities

$$\int U_i(T) \, di. \tag{5}$$

The generalized social welfare weight approach attempts to bring into play more general normative considerations. Rather than having a global objective like (5), the social welfare weights approach is local: it looks at a tax policy and, in considering small changes to the tax policy, it weighs the incremental dollars to different individuals according not just to their marginal utility, but according to other considerations (possibly including marginal utility for consumption), such as those involving fairness.

A key aspect of the approach is that there is no global objective. Given the tax system $T$ to be evaluated, the local marginal welfare weight

$$g_i(T) = g_i\big(z_i(T) - T_i(z_i(T)), \, z_i(T)\big)$$

(local at $T$) is endogenously determined.

Some examples from Saez and Stantcheva (2016) to illustrate generalized social welfare weights are as follows:

• Utilitarian weights: $g_i(c_i, z_i) = u'(c_i - v_i(z_i))$. These are the weights that arise out of the standard utilitarian framework. That is, the priority put on giving a dollar to any individual is proportional to that individual's marginal utility.

• Libertarian weights: $g_i(c_i, z_i) = \tilde{g}(z_i - c_i) = \tilde{g}(t_i)$ where $\tilde{g}'(t_i) > 0$ and $t_i = z_i - c_i$ is the tax paid. That is, the more tax a person has already paid, the greater the weight placed on that person.

• Libertarian-utilitarian mix: $g_i(c_i, z_i) = \tilde{g}(c_i - v_i(z_i), z_i - c_i) = \tilde{g}(\tilde{u}_i, t_i)$ where $\tilde{u}_i = c_i - v_i(z_i)$, with $\frac{\partial \tilde{g}}{\partial \tilde{u}_i} < 0$ and $\frac{\partial \tilde{g}}{\partial t_i} > 0$; the first inequality can be interpreted as saying that weights are increasing in marginal utility for consumption (since $u'(c_i - v_i(z_i))$ is decreasing in $c_i - v_i(z_i)$) and the second says that they are also increasing in taxes paid.

• Poverty elimination: $g(c_i, z_i) = 1$ if $c_i < \bar{c}$, where $\bar{c}$ is the poverty threshold, and $g(c_i, z_i) = 0$ otherwise; that is, we put positive and equal weight on those beneath the poverty line, and no weight on those above the poverty line. (Such weights violate the positivity condition (4).)

• Counterfactuals: Welfare weights can be made to depend on how much someone would have worked in the absence of taxes (which depends on their type) in comparison to how much they work in the presence of taxes.

• Equality of opportunity: Weights can be made to depend on one's rank in the income distribution conditional on one's background conditions (but such weights go beyond the formal framework presented here, since they depend on more than $c_i$, $z_i$, $x_i$, and $y_i$).

A tax reform is a function $\Delta T \in \mathcal{T}$ whose interpretation is that it represents some change to the status quo tax policy. Define $\Delta T_i(z_i) = \Delta T(z_i, x_i)$. Let $\hat{\mathcal{T}}$ be an open subset of $\mathcal{T}$ (in the metric topology induced by $\rho$) that contains $\mathcal{T}^\circ$ and is such that for all $T \in \hat{\mathcal{T}}$, all $\Delta T \in \mathcal{T}$, and almost all $i \in [0, 1]$, $\frac{d}{d\varepsilon} \tilde{U}_i(T + \varepsilon \Delta T)$ exists at $\varepsilon = 0$ and is continuous at $\varepsilon = 0$. By the envelope theorem, when $\frac{d}{d\varepsilon} \tilde{U}_i(T + \varepsilon \Delta T)$ exists at $\varepsilon = 0$ and is continuous at $\varepsilon = 0$,

$$\frac{d}{d\varepsilon}\bigg|_{\varepsilon = 0} \tilde{U}_i(T + \varepsilon \Delta T) = -\Delta T_i(z_i(T)).$$

Say tax reform $\Delta T$ is locally budget neutral at $T$ if $\frac{d}{d\varepsilon}\big|_{\varepsilon = 0} R(T + \varepsilon \Delta T) = 0$. For any tax policy $T \in \hat{\mathcal{T}}$, tax reform $\Delta T \in \mathcal{T}$, and system of social welfare weights $g \in \mathcal{G}$, say that a locally budget neutral tax reform $\Delta T$ is locally desirable if

$$\int g_i(T) \, \Delta T_i(z_i(T)) \, di < 0.$$

In other words, $\Delta T$ is desirable if the cost of the tax change to different individuals, weighted by the local welfare weights, is negative. Say that tax system $T \in \hat{\mathcal{T}}$ satisfies the local optimal tax criterion if

$$\forall \Delta T \in \mathcal{T}, \quad \left[ \frac{d}{d\varepsilon}\bigg|_{\varepsilon = 0} R(T + \varepsilon \Delta T) = 0 \;\Rightarrow\; \int g_i(T) \, \Delta T_i(z_i(T)) \, di = 0 \right]. \tag{6}$$

Saez and Stantcheva (2016) say that (6) gives a necessary condition for local optimality of a tax system $T$: any locally budget neutral tax reform has no local aggregate effect on welfare when changes in tax liability are weighted by generalized social welfare weights evaluated at $T$.

There is however a question of why the local optimal tax criterion is the right thing to look at. Why should we think that it is good if a tax system satisfies (6)?
What normative assumptions justify the criterion (6)?

In a more traditional framework, the above questions are answered on the basis of the properties of a global optimization problem from which a condition such as (6) is derived. To see this, consider the case of utilitarianism: Suppose the goal is to maximize the utilitarian objective $\int U_i(T) \, di$ subject to a revenue constraint $R(T) \geq E$, where $E$ represents required government expenditures. Then, using the envelope theorem, a necessary condition for $T$ to be an optimum is:

$$\forall \Delta T \in \mathcal{T}, \quad \left[ \frac{d}{d\varepsilon}\bigg|_{\varepsilon = 0} R(T + \varepsilon \Delta T) = 0 \;\Rightarrow\; \int u'\big(z_i(T) - T_i(z_i(T)) - v_i(z_i(T))\big) \, \Delta T_i(z_i(T)) \, di = 0 \right]. \tag{7}$$

(This requires that for almost all $i$, and all $\Delta T \in \mathcal{T}$, $h_i(\varepsilon) = U_i(T + \varepsilon \Delta T)$ is continuously differentiable in a neighborhood around $\varepsilon = 0$.)

Footnotes to the definition of $\hat{\mathcal{T}}$ above: Note that if $\hat{\mathcal{T}} = \mathcal{T}^\circ$, then the requirements on $\hat{\mathcal{T}}$ are satisfied. Note also that because $u$ is continuously differentiable, for any $T, \Delta T \in \mathcal{T}$, $\frac{d}{d\varepsilon} \tilde{U}_i(T + \varepsilon \Delta T)$ exists at $\varepsilon = 0$ and is continuous at $\varepsilon = 0$ if and only if $\frac{d}{d\varepsilon} U_i(T + \varepsilon \Delta T)$ exists at $\varepsilon = 0$ and is continuous at $\varepsilon = 0$.

If the weights are utilitarian (that is, $g_i(c_i, z_i) = u'(c_i - v_i(z_i))$), then (7) coincides with (6). But in the utilitarian case, the justification for condition (7) is that it is a necessary condition for an optimum if one's aim is to maximize the objective $\int U_i(T) \, di$ subject to a revenue requirement, and one has independent reasons for thinking that $\int U_i(T) \, di$ is a reasonable objective.

However, a justification analogous to that for (7) is not available to Saez and Stantcheva (2016) because, under their approach, the local optimality condition is not derived from a global optimization problem; it is a purely local optimality condition. The question is then why such a local optimality condition is justified. Usually such conditions are justified by a broader global optimization problem from which they are derived.

In their appendix, Saez and Stantcheva (2016) do however describe a foundation for their generalized social welfare weights. In fact, their results depend on this foundation; in particular the proof that (6) is indeed a necessary condition for a local optimum (their Proposition 1) depends on the definitions and arguments presented in the appendix. Their Proposition 1, in turn, serves as a foundation for other results, such as the characterization of optimal marginal tax rates (their Proposition 2). Let us then explore this foundation.

Any system of social welfare weights $g$ and tax policy $\tilde{T}$ together define a social welfare function $W^g_{\tilde{T}}$ by which tax policies $T$ in general can be evaluated. In particular, this social welfare function takes the form:

$$W^g_{\tilde{T}}(T) = \int g_i(\tilde{T}) \, \tilde{U}_i(T) \, di. \tag{8}$$

$W^g_{\tilde{T}}$ is the social welfare function that evaluates all tax policies $T$ using welfare weights defined locally by the tax policy $\tilde{T}$. Recall that $\tilde{U}_i(T)$ is the version of the agent's indirect utility function in dollar terms (3).

Proposition 1. For all systems of social welfare weights $g$, tax policies $T \in \hat{\mathcal{T}}$ and tax reforms $\Delta T$,

$$\frac{d}{d\varepsilon}\bigg|_{\varepsilon = 0} W^g_T(T + \varepsilon \Delta T) = -\int g_i(T) \, \Delta T_i(z_i(T)) \, di.$$

This result follows immediately from the envelope theorem.

Saez and Stantcheva (2016) provide the following definition (in their appendix):

Definition 1. A tax system $T \in \mathcal{T}$ is locally optimal if there exists a neighborhood $N$ of $T$ such that for all $T' \in N$,

$$R(T) = R(T') \;\Rightarrow\; W^g_T(T) \geq W^g_T(T').$$

Saez and Stantcheva (2016) establish the following result (Proposition 1 in Saez and Stantcheva (2016)).

Proposition 2. If $T \in \hat{\mathcal{T}}$ is locally optimal, then $T$ satisfies the local optimal tax criterion (6).

(This proposition also requires the assumption that for almost all $i$, and all $\Delta T \in \mathcal{T}$, $h_i(\varepsilon) = U_i(T + \varepsilon \Delta T)$ is continuously differentiable in a neighborhood of $\varepsilon = 0$.)

… $W^g_T(T) \geq W^g_T(\tilde{T})$ is a good criterion for making local comparisons between $T$ and $\tilde{T}$.

One seeming virtue of the GSMWW approach is that it leads to tax policies that are locally Pareto optimal. Formally, say that a tax system $T \in \mathcal{T}$ is locally Pareto optimal if there exists a neighborhood $N$ of $T$ such that for all $\tilde{T} \in N$,

$$R(T) = R(\tilde{T}) \;\Rightarrow\; \Big[ \lambda\big(\{i : U_i(\tilde{T}) > U_i(T)\}\big) > 0 \;\Rightarrow\; \lambda\big(\{i : U_i(T) > U_i(\tilde{T})\}\big) > 0 \Big],$$

where recall $\lambda$ is Lebesgue measure. In other words, if $T$ is locally Pareto optimal, then within some neighborhood $N$ of $T$, any tax system $\tilde{T}$ that raises the same revenue as $T$ and is better for some positive measure set of agents is worse for some other positive measure set of agents; there is no such tax policy $\tilde{T}$ in $N$ that is superior for a positive measure set while being worse only for a zero measure set.

Proposition 3. If $T \in \mathcal{T}$ is locally optimal, then $T$ is locally Pareto optimal.

Proposition 3 is significant, but it does not by itself provide a justification for the optimality criterion in Definition 1. Moreover, as I discuss in Section 3, there is a tension between the Pareto principle and the representation of broad ethical values.
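The envelope-theorem identity in Proposition 1 can be checked numerically in a small toy economy. The sketch below is only an illustration under assumptions that are not in the paper: four agents stand in for the continuum (so integrals become averages), the cost of earning is quadratic, tax policies are linear (a rate and a lump-sum transfer), and the weight function is an arbitrary libertarian-style form that increases in tax paid. All names (`THETAS`, `income`, `weight`, `welfare`, and so on) are hypothetical.

```python
import math

# Hypothetical toy economy: four agents with earning cost v_i(z) = z**2 / (2 * theta_i).
THETAS = [0.5, 1.0, 2.0, 4.0]


def income(theta, rate):
    """Optimal income under the linear tax T(z) = rate * z - transfer."""
    # First-order condition of z - T(z) - v(z): 1 - rate - z / theta = 0.
    return theta * (1.0 - rate)


def money_utility(theta, rate, transfer):
    """Dollar-valued indirect utility: z - T(z) - v(z) at the optimal income."""
    z = income(theta, rate)
    return z - (rate * z - transfer) - z * z / (2.0 * theta)


def weight(theta, rate, transfer):
    """Assumed libertarian-style weight: positive and increasing in tax paid."""
    tax_paid = rate * income(theta, rate) - transfer
    return math.exp(0.5 * tax_paid)


def welfare(base, policy):
    """Weights frozen at `base`, utilities evaluated at `policy` (average over agents)."""
    total = sum(weight(th, *base) * money_utility(th, *policy) for th in THETAS)
    return total / len(THETAS)


def weighted_reform_cost(base, d_rate, d_transfer):
    """Average of g_i(T) * dT_i(z_i(T)) for the reform dT(z) = d_rate * z - d_transfer."""
    rate, _ = base
    total = sum(
        weight(th, *base) * (d_rate * income(th, rate) - d_transfer) for th in THETAS
    )
    return total / len(THETAS)


def welfare_derivative(base, d_rate, d_transfer, eps=1e-6):
    """Central finite difference of welfare(base, base + eps * reform) at eps = 0."""
    rate, transfer = base
    up = welfare(base, (rate + eps * d_rate, transfer + eps * d_transfer))
    down = welfare(base, (rate - eps * d_rate, transfer - eps * d_transfer))
    return (up - down) / (2.0 * eps)


base = (0.3, 0.2)  # status quo: 30% rate, lump-sum transfer of 0.2
lhs = welfare_derivative(base, 1.0, 0.5)
rhs = -weighted_reform_cost(base, 1.0, 0.5)
print(lhs, rhs)  # the two numbers should agree up to finite-difference error
```

The key design choice is that `welfare` holds the weights fixed at the base policy while only the utilities respond to the reform, which is what makes the derivative collapse to the weighted sum of tax changes. Nothing in this check says whether the frozen-weight objective is a sensible thing to maximize, which is the question the text goes on to raise.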

In this section I explore a prima facie reason to expect that the GSMWW approach may encounter problems.

Several authors, including Sen (1970, 1979) and Kaplow and Shavell (2001), have argued that incorporating broader moral considerations into economic evaluation is inconsistent with the Pareto principle. Different authors have interpreted this conflict in different ways. Sen interprets this as an argument against insisting on the Pareto principle, whereas Kaplow and Shavell (2001) interpret it as an argument against including non-welfarist considerations in normative economic evaluation. As they say, one philosopher's modus ponens is another philosopher's modus tollens.

The question is then how the GSMWW approach avoids the problem. The GSMWW approach incorporates broader social values into normative evaluations and at the same time it appears to … any global ranking over tax systems.

Footnote: This is closely related to Proposition 4 of Saez and Stantcheva (2016). See also Weymark (2017).

The basic idea behind the arguments of Sen (1970, 1979) and Kaplow and Shavell (2001) is as follows. Consider two states $s_1$ and $s_2$ and a set of agents, such that each agent $i$ has a utility $u_i(s_j)$ for each of the two states $j = 1, 2$. Suppose that $u_i(s_1) = u_i(s_2)$ for all $i$, so that all agents are indifferent between the two states. Suppose there is another moral difference between $s_1$ and $s_2$, having to do with fairness or freedom or desert or rights or any other moral consideration that is not captured in the agents' utility functions. Pareto leaves essentially no room for such considerations to enter. In particular, let $W$ be a social welfare function on the states that satisfies a Pareto indifference condition:

$$[u_i(s_1) = u_i(s_2), \ \forall i] \;\Rightarrow\; W(s_1) = W(s_2).$$

Pareto indifference implies that we cannot view any moral criterion other than preference satisfaction as being independently good; that is, we cannot say that there is a trade-off between how fair a situation is and how well preferences are satisfied: if preference satisfaction is held fixed (i.e., $u_i(s_1) = u_i(s_2), \ \forall i$), then nothing else matters to social evaluation (i.e., $W(s_1) = W(s_2)$). So making a situation more fair, for example, or reducing rights violations will only matter if it has an effect on preferences, but will never matter in itself. Similar considerations apply if we consider other versions of the Pareto principle, such as weak or strong Pareto.

Fleurbaey, Tungodden and Chang (2003) criticize Kaplow and Shavell (2001) for claiming that their results show that Pareto implies welfarism; this criticism is based on the technical definition of welfarism. Fleurbaey and Maniquet (2011) put forward a theory of social welfare that incorporates fairness considerations while respecting Pareto. Fleurbaey and Maniquet (2018) review the literature incorporating fairness into optimal tax. Fleurbaey and Maniquet (2018) write, “One of our main points ... is that the classical social welfare function framework is more flexible than commonly thought, and can accommodate a very large set of non-utilitarian values.” However, Fleurbaey and Maniquet (2018) do not dispute the above reasoning: namely, that Pareto indifference restricts the incorporation of broader values in precisely the way described in the preceding paragraph.

When we consider Proposition 3 and the claim that welfare weights incorporate broader ethical values, we may wonder how this is consistent with the considerations raised by Sen (1970, 1979) and Kaplow and Shavell (2001), and whether somehow the considerations raised by Fleurbaey and …

Footnote on the technical definition of welfarism: See Theorem 2 of Weymark (2016). In addition to a Pareto principle, an additional independence condition is required for welfarism. See also d'Aspremont and Gevers (1977), Sen (1977, 1986), Hammond (1979), Roberts (1980), Bossert and Weymark (2004).
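The gradient analogy from the introduction gives a quick way to see how purely local comparisons can fail to add up to any global ranking. The sketch below is an analogy only, not the paper's construction: it integrates two hypothetical vector fields on the plane around the unit circle. A field that is the gradient of some function integrates to zero around any closed loop, so its local comparisons are consistent with a global objective. A rotational field, which is the gradient of no function, has a strictly positive "local improvement" at every step of the loop and yet returns to its starting point, which is the continuous counterpart of a preference cycle.

```python
import math


def loop_integral(field, steps=10000):
    """Midpoint-rule line integral of `field` around the unit circle, counterclockwise."""
    total = 0.0
    dt = 2.0 * math.pi / steps
    for k in range(steps):
        t = (k + 0.5) * dt
        x, y = math.cos(t), math.sin(t)    # point on the path
        dx, dy = -math.sin(t), math.cos(t)  # direction of travel along the path
        fx, fy = field(x, y)
        total += (fx * dx + fy * dy) * dt   # local comparison: field . direction
    return total


def gradient_field(x, y):
    """Gradient of f(x, y) = x * y: local comparisons come from a global objective."""
    return (y, x)


def rotational_field(x, y):
    """Not the gradient of any function: every step along the loop looks like a gain."""
    return (-y, x)


consistent = loop_integral(gradient_field)
cyclic = loop_integral(rotational_field)
print(consistent, cyclic)  # roughly 0 for the gradient field, roughly 2*pi for the other
```

In the second case the accumulated "improvement" around the loop is about 2π rather than zero, so no function can agree with all of the local comparisons at once.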

The GSMWW approach makes local comparisons among tax policies. The question that I addresshere is whether these local comparisons are consistent . But how do we assess the consistency oflocal welfare weights? We do so by asking whether there is a global ranking of tax policies thatagrees locally with the GSMWW approach. If there is no such global ranking, then we say that theweights are inconsistent. Why is this a reasonable way of assessing consistency? The reason is thatif it is not possible to have a global ranking of tax policies whose judgements agree with those oflocal welfare weights, then anyone who arrives at a global ranking will disagree with the judgementsof the welfare weights on some local comparisons . That is, to follow the GSMWW approach requiresthat you cannot have a global ranking of tax policies. Another way of expressing the idea is to saythat if you pooled all the local judgements in order to create a global ranking, that global rankingwould be intransitive. That is the sense in which welfare weights deliver inconsistent judgementsif they cannot be extended to a global ranking.The basic idea is analogous to the following. Imagine that I know the gradient of a function ∇ f : R n → R . This allows me to make local comparisons: For any direction d , if ∇ f ( x ) · d > x , f is increasing in direction d . And if I know the gradient of the function everywhere, thenfor any pair of points, x and y , I can recover the diﬀerence f ( y ) − f ( x ) = R ∇ f ( ρ ( θ )) · ρ ′ ( θ ) d θ ,where ρ is a diﬀerentiable path from x to y . So local comparisons via ∇ f imply global comparisonsof the form f ( y ) > f ( x ) inherent in f . 
In an analogous way, we can derive global comparisonsfrom the local comparisons of the GSMWW approach.Deﬁne a path to be a continuous function ρ : [ a, b ] → ˆ T ; θ T ρ,θ , where a, b ∈ R , a < , < b .Now for paths ρ and all i ∈ [0 , T ρi : R × [ a, b ] → R by T ρi ( z i , θ ) := T ρ,θi ( z i ).For any path ρ and θ ∈ [ a, b ], deﬁne z ρi ( θ ) = z i (cid:0) T ρ,θ (cid:1) .Let P be the set of paths ρ such that(i) for all i , ( z i , θ ) T ρi ( z i , θ ) is continuously diﬀerentiable,(ii) for all θ ∈ [ a, b ], there are at most ﬁnitely many i ∈ [0 ,

1] such that for some z ∈ Z , i ∂∂θ T ρi ( z, θ ) is discontinuous at i = i , and12iii) ∀ θ ∈ [ a, b ] , R (cid:0) T ρ,θ (cid:1) = R (cid:0) T ρ, (cid:1) .In particular observe that by (iii), revenue is constant on any path ρ in P .Consider a binary relation - on ˆ T . Say that - is a preorder if - is transitive and reﬂexive. Forany binary relation - on ˆ T , let ∼ and ≺ be the symmetric and asymmetric parts of - , respectively. Deﬁnition 2

Let g be a system of welfare weights. Say that a binary relation - rationalizes g iffor all ρ ∈ P , local improvement principle: Z g i (cid:0) T ρ, (cid:1) ∂∂θ (cid:12)(cid:12)(cid:12)(cid:12) θ =0 T ρi ( z ρi (0) , θ ) d i < ⇒ h ∃ ¯ θ ∈ (0 , , ∀ θ ∈ (cid:0) , ¯ θ (cid:1) , T ρ, ≺ T ρ,θ i , Z g i (cid:0) T ρ, (cid:1) ∂∂θ (cid:12)(cid:12)(cid:12)(cid:12) θ =0 T ρi ( z ρi (0) , θ ) d i > ⇒ h ∃ ¯ θ ∈ (0 , , ∀ θ ∈ (cid:0) , ¯ θ (cid:1) , T ρ, ≻ T ρ,θ i , (9) indiﬀerence principle: and (cid:18) ∀ θ ′ ∈ [0 , , Z g i (cid:16) T ρ,θ ′ (cid:17) ∂∂θ (cid:12)(cid:12)(cid:12)(cid:12) θ = θ ′ T ρi (cid:0) z ρi (cid:0) θ ′ (cid:1) , θ (cid:1) d i = 0 (cid:19) ⇒ T ρ, ∼ T ρ, . (10) Say that the system of welfare weights g is rationalizable if there exists a preorder that rationalizes g . Remark 1

Note that we deﬁne the what it means for - to rationalize g for an arbitrary binaryrelation - , not just a preorder. But we say that g is rationalizable only if it is rationalized by apreorder. The GSMWW approach implies a series of local comparisons. The relation - attempts to simul-taneously capture all of the local comparisons implied by the social welfare weights. In this sense,it is a global relation.I will discuss the two conditions (9) and (10). Consider ﬁrst the local improvement principle (9). Consider a path ρ such that for some T ∈ ˆ T and ∆ T ∈ T , T ρ,θ = T + θ ∆ T. (11)In this case, the condition Z g i (cid:0) T ρ, (cid:1) ∂∂θ (cid:12)(cid:12)(cid:12)(cid:12) θ =0 T ρi ( z ρi (0) , θ ) d i < , (12)from the antecedent of the ﬁrst part of (9) reduces to: Z g i ( T ) ∆ T i ( z i ( T )) d i < . (13) We may assume that for all θ ∈ [ a, b ] , T + θ ∆ T ∈ ˆ T ; observe that since T ∈ ˆ T , for all ∆ T ∈ T , there exists ξ > T + θξ ∆ T ∈ ˆ T for all θ ∈ [ a, b ]. So we could consider ξ ∆ T rather than ∆ T to begin with. T is a locally budget neutral tax reform so thatdd θ (cid:12)(cid:12)(cid:12)(cid:12) θ =0 R ( T + θ ∆ T ) = 0 , (14)then (13) just amounts to Saez and Stantcheva’s deﬁnition of a locally desirable tax reform. So, inthis case, the ﬁrst line in the local improvement principle (9) reduces to the claim that if ∆ T is alocally budget neutral and locally desirable tax reform (locally desirable in the sense of (13)), thena suﬃciently small version of the reform T + θ ∆ T –that is, for suﬃciently small θ > T according to the global relation ≺ ; that is T ≺ T + θ ∆ T .Now if we are to be strict with our deﬁnitions, we will notice that the deﬁnition of P is suchthat it requires that for all ρ ∈ P , ∀ θ ∈ [ a, b ] , R (cid:16) T ρ,θ (cid:17) = R (cid:0) T ρ, (cid:1) , (15)but if ρ is chosen so that (11) holds, this constant revenue condition may not hold for ρ . 
However, suppose that instead we choose $\rho$ so that
\[
T^{\rho,\theta} = T + \theta\Delta T + \big(R(T) - R(T + \theta\Delta T)\big), \tag{16}
\]
where the last term $\big(R(T) - R(T + \theta\Delta T)\big)$ is a lump-sum transfer, independent of income and observable characteristics. Then (15) does hold, and if we continue to assume (14), then (12) still reduces to (13).

Above we have examined a special case of the local improvement principle, which says that a locally budget neutral and locally desirable tax reform in Saez and Stantcheva's sense is deemed to be good according to the global relation $\precsim$. It seems that Saez and Stantcheva ought to be committed to the local improvement principle in the special case. However, it is difficult to see why the local improvement principle should be compelling in this special case, but not more generally when $T^{\rho,\theta}$ does not necessarily take the form given in the preceding paragraph. The local improvement condition maintains the same spirit in the more general case: it says that if we have a parameterized family of revenue equivalent tax policies $\big(T^{\rho,\theta}\big)_{\theta \in [a,b]}$, we start at $\theta = 0$, and an increase in $\theta$ is locally desirable, then for a sufficiently small $\theta > 0$, $T^{\rho,\theta}$ should be socially preferred to $T^{\rho,0}$.
This seems like a weak and reasonable way of inferring properties of global social preference from local social preference.

Note that the second line of the local improvement principle is analogous to the first, but it applies to the case where the local change is undesirable rather than desirable.

Whereas the local improvement principle allows us to draw inferences about global social strict preference, the indifference principle allows us to draw inferences about global social indifference. The indifference principle says that if there is a smooth path $\rho$ of revenue-equivalent tax policies such that at every point $\theta'$ on the path, welfare weights cannot detect either an improvement or a worsening at $\theta'$, that is,
\[
\int g_i\big(T^{\rho,\theta'}\big)\,\frac{\partial}{\partial\theta}\Big|_{\theta=\theta'} T^{\rho}_i\big(z^{\rho}_i(\theta'),\theta\big)\,di = 0,
\]
then the endpoints of the path are indifferent according to the global relation.

(Footnote: In particular, observe that if (16) and (14) hold, then $\frac{\partial}{\partial\theta}\big|_{\theta=0} T^{\rho}_i\big(z^{\rho}_i(0),\theta\big) = \Delta T_i\big(z_i(T)\big)$.)

It is useful to make an analogy. Suppose that $\rho : [0,1] \to \mathbb{R}^n$ is a smooth path in $\mathbb{R}^n$ and $f : \mathbb{R}^n \to \mathbb{R}$ is a smooth function. Define $f^{\rho} : \mathbb{R} \to \mathbb{R}$ by $f^{\rho}(\theta) = f(\rho(\theta))$. Then $\frac{d}{d\theta} f^{\rho}(\theta) = \nabla f(\rho(\theta)) \cdot \rho'(\theta)$, where $\rho'(\theta) = (\rho_1'(\theta), \ldots, \rho_n'(\theta))$ and $\rho_i'(\theta)$ is the derivative of the $i$-th component $\rho_i(\theta)$ of $\rho(\theta)$. Then the indifference condition is analogous to the condition that if $\frac{d}{d\theta} f^{\rho}(\theta) = 0$ for all $\theta \in [0,1]$, then $f^{\rho}(0) = f^{\rho}(1)$, which in this case is a consequence of the fundamental theorem of calculus.

For the utilitarian case we have the following proposition.

Proposition 4

Consider the utilitarian social welfare function $W : \hat{\mathcal{T}} \to \mathbb{R}$ defined by
\[
W(T) = \int U_i(T)\,di,
\]
and define the relation $\precsim$ on $\hat{\mathcal{T}}$ by
\[
\big(T_1 \precsim T_2 \;\Leftrightarrow\; [W(T_1) \le W(T_2) \text{ and } R(T_1) \le R(T_2)]\big),\quad \forall\, T_1, T_2 \in \hat{\mathcal{T}}. \tag{17}
\]
That is, $T_2$ is ranked weakly higher than $T_1$ by $\precsim$ if and only if $T_2$ is weakly better in terms of both utilitarian welfare and revenue raised. Let welfare weights be utilitarian: $g_i(c_i, z_i) = u'(c_i - v_i(z_i))$. Then $\precsim$ rationalizes $g$.

(Footnote: Observe that the proposition would remain true if (17) were replaced by either
\[
\big(T_1 \precsim T_2 \;\Leftrightarrow\; W(T_1) \le W(T_2)\big),\quad \forall\, T_1, T_2 \in \hat{\mathcal{T}},
\]
or
\[
\big(T_1 \precsim T_2 \;\Leftrightarrow\; [W(T_1) \le W(T_2) \text{ and } R(T_1) = R(T_2)]\big),\quad \forall\, T_1, T_2 \in \hat{\mathcal{T}}.) \tag{18}
\]

The proof is instructive. First, I establish the local improvement principle in the utilitarian case. Choose a path $\rho \in \mathcal{P}$ and suppose that (12) holds. Observe that
\[
\begin{aligned}
\frac{d}{d\theta}\Big|_{\theta=0} W\big(T^{\rho,\theta}\big)
&= -\int u'\Big(z^{\rho}_i(0) - T^{\rho,0}_i\big(z^{\rho}_i(0)\big) - v_i\big(z^{\rho}_i(0)\big)\Big)\,\frac{\partial}{\partial\theta}\Big|_{\theta=0} T^{\rho}_i\big(z^{\rho}_i(0),\theta\big)\,di\\
&= -\int g_i\big(T^{\rho,0}\big)\,\frac{\partial}{\partial\theta}\Big|_{\theta=0} T^{\rho}_i\big(z^{\rho}_i(0),\theta\big)\,di,
\end{aligned}
\]
where the first line follows from the envelope theorem and the second follows from the fact that welfare weights are utilitarian. It now follows from (12) that $\frac{d}{d\theta}\big|_{\theta=0} W\big(T^{\rho,\theta}\big) > 0$. So there exists $\bar\theta > 0$ such that for all $\theta \in (0,\bar\theta)$, $W\big(T^{\rho,0}\big) < W\big(T^{\rho,\theta}\big)$. It now follows from (17) and the fact that revenue is constant along any path in $\mathcal{P}$ that $T^{\rho,0} \prec T^{\rho,\theta}$. This, along with a similar argument for the case in which the local change is undesirable, establishes the local improvement principle.

Next consider the indifference principle. Suppose that for some $\rho \in \mathcal{P}$,
\[
\forall \theta' \in [0,1],\quad \int g_i\big(T^{\rho,\theta'}\big)\,\frac{\partial}{\partial\theta}\Big|_{\theta=\theta'} T^{\rho}_i\big(z^{\rho}_i(\theta');\theta\big)\,di = 0. \tag{19}
\]
We have
\[
\begin{aligned}
W\big(T^{\rho,1}\big) - W\big(T^{\rho,0}\big)
&= \int U_i\big(T^{\rho,1}\big)\,di - \int U_i\big(T^{\rho,0}\big)\,di\\
&= \int_0^1 \Big[\frac{d}{d\theta}\int U_i\big(T^{\rho,\theta}\big)\,di\Big]\,d\theta\\
&= \int_0^1\!\!\int \frac{d}{d\theta} U_i\big(T^{\rho,\theta}\big)\,di\,d\theta\\
&= -\int_0^1\!\!\int u'\Big(z^{\rho}_i(\theta) - T^{\rho,\theta}_i\big(z^{\rho}_i(\theta)\big) - v_i\big(z^{\rho}_i(\theta)\big)\Big)\,\frac{\partial}{\partial\theta} T^{\rho}_i\big(z^{\rho}_i(\theta),\theta\big)\,di\,d\theta\\
&= -\int_0^1\!\!\int g_i\big(T^{\rho,\theta}\big)\,\frac{\partial}{\partial\theta} T^{\rho}_i\big(z^{\rho}_i(\theta),\theta\big)\,di\,d\theta\\
&= \int_0^1 0\,d\theta = 0,
\end{aligned}
\tag{20}
\]
where the second equality follows from the fundamental theorem of calculus, the fourth from the envelope theorem, the fifth from the assumption that welfare weights are utilitarian, and the sixth from (19). It follows that $W\big(T^{\rho,1}\big) = W\big(T^{\rho,0}\big)$, and by the definition of $\mathcal{P}$, $R\big(T^{\rho,1}\big) = R\big(T^{\rho,0}\big)$. So by (17), $T^{\rho,0} \sim T^{\rho,1}$. This establishes the indifference principle in the utilitarian case, and completes the proof of the proposition. □

Proposition 4 provides support for the notion of rationalization in Definition 2. If welfare weights are utilitarian and the global ranking of revenue equivalent tax policies is determined by a utilitarian social welfare function, then the local improvement and indifference principles are theorems.
This provides further plausibility to these conditions.

In this paper, I do not assume that welfare weights are utilitarian and, in Definition 2, the local improvement and indifference principles are treated as axioms that characterize rationalization. My claim is that these principles are reasonable axioms for capturing the implicit local comparisons made by social welfare weights and integrating them into an overall ranking. Note that for my purposes it is not so important that the rationalizing relation captures all such comparisons, just that the comparisons it does capture are genuinely implied.

Before concluding this section, I would like to include a couple of definitions of rankings $\precsim$ on tax policies, as well as an alternative definition of rationalization. Recall that $\lambda$ is the Lebesgue measure on the interval $[0,1]$. A ranking $\precsim$ on tax policies satisfies Pareto indifference if:
\[
\forall\, T_1, T_2 \in \hat{\mathcal{T}},\quad R(T_1) = R(T_2) \;\Rightarrow\; \Big[\lambda\big(\{i : \tilde U_i(T_1) = \tilde U_i(T_2)\}\big) = 1 \;\Rightarrow\; T_1 \sim T_2\Big]. \tag{21}
\]
That is, $\precsim$ satisfies Pareto indifference if, whenever almost all agents are indifferent between two revenue-equivalent tax policies $T_1$ and $T_2$, the ranking is indifferent between these policies as well.

Definition 3

Let $g$ be a system of welfare weights. Say that a binary relation $\precsim$ rationalizes $g$ with Pareto indifference if for all $\rho \in \mathcal{P}$, $g$ and $\precsim$ jointly satisfy the local improvement principle (9) and Pareto indifference (21). Say that $g$ is rationalizable with Pareto indifference if there exists a preorder that rationalizes $g$ with Pareto indifference.

Definition 3 provides an alternative definition of rationalization that substitutes the Pareto indifference condition for the indifference principle. It helps to establish the robustness of the phenomenon studied here, because it shows that generalized social welfare weights fail to be rationalizable under alternative definitions of rationalization: see Section 6.

One might think that Saez and Stantcheva (2016) need not be committed to the Pareto indifference condition for a rationalizing relation. After all, the Pareto indifference condition restricts social evaluation to be measurable with respect to welfare, and the point of GSMWW is to bring broader normative values into the analysis. However, notice how GSMWW brings broader values into the analysis: by placing weights on the resources going to each agent. If all agents are indifferent between two tax policies, then it might seem that it does not matter how those weights are assigned: the tax policies should be regarded as equally good. An alternative framework that did not represent fairness in terms of the weights attached to different individuals might more clearly articulate why indifference among all agents between two tax policies need not imply that the policies are socially indifferent when one is in some sense more fair than the other.
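The calculus analogy used above to motivate the indifference principle can be checked numerically. The following is a minimal sketch, not part of the formal argument: the function `f` and the path `rho` are hypothetical choices (a circle around the origin), picked only because the gradient of `f` is orthogonal to the path's velocity at every point.

```python
import math

def f(x, y):
    # Smooth function whose level sets are circles around the origin.
    return x * x + y * y

def grad_f(x, y):
    return (2.0 * x, 2.0 * y)

def rho(theta):
    # Hypothetical smooth path on the unit circle, theta in [0, 1].
    angle = 0.5 * math.pi * theta
    return (math.cos(angle), math.sin(angle))

def rho_prime(theta):
    # Velocity of the path rho at theta.
    angle = 0.5 * math.pi * theta
    scale = 0.5 * math.pi
    return (-scale * math.sin(angle), scale * math.cos(angle))

def directional_derivative(theta):
    # d/dtheta f(rho(theta)) = grad f(rho(theta)) . rho'(theta)
    gx, gy = grad_f(*rho(theta))
    dx, dy = rho_prime(theta)
    return gx * dx + gy * dy

grid = [k / 100 for k in range(101)]
max_abs_derivative = max(abs(directional_derivative(t)) for t in grid)
endpoint_gap = f(*rho(1.0)) - f(*rho(0.0))
```

Here the derivative along the path vanishes at every grid point and the endpoint values of `f` coincide, which is the pattern the indifference principle transfers from functions to rankings of tax policies.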

In the previous section we saw a special case in which welfare weights are rationalizable: namely, when the welfare weights are utilitarian (Proposition 4). In this section, we present another, perhaps less obvious, case in which welfare weights can be rationalized: libertarian welfare weights with fixed incomes. As we shall see below, the fixed incomes are crucial. This special case of libertarian weights with fixed incomes is discussed in Section II.A of Saez and Stantcheva (2016).

Recall that libertarian welfare weights are weights of the form $g_i(c_i, z_i) = \tilde g_i(z_i - c_i) = \tilde g_i(t_i)$, where $t_i = z_i - c_i$ is the tax paid and $\tilde g_i(t_i)$ is increasing in $t_i$. Assume moreover that for all $i$ there exists an income $z_i^*$ such that
\[
v_i(z_i) =
\begin{cases}
0, & \text{if } z_i \le z_i^*,\\
\infty, & \text{if } z_i > z_i^*.
\end{cases}
\tag{22}
\]
Strictly speaking, this cost function does not fit into the general assumptions laid out in Section 2, because the cost function in (22) is discontinuous at $z_i^*$ and hence not differentiable.

In this section I restrict attention to differentiable tax policies: $T_i$ is differentiable and $T_i'(z) < 1$ for all $z$. Let $\hat{\mathcal{T}}' = \{T \in \hat{\mathcal{T}} : T'(z) < 1,\ \forall z \in Z\}$. In this section we say that a binary relation $\precsim$ rationalizes $g$ on $\hat{\mathcal{T}}'$ if the conditions of Definition 2 are satisfied when we restrict attention to tax policies in $\hat{\mathcal{T}}'$. For all such tax policies $T \in \hat{\mathcal{T}}'$, $z_i(T) = z_i^*$. So the assumptions in this section essentially amount to the assumption that incomes are fixed and there are no behavioral responses to taxes.

Now define the function
\[
H_i(t_i) = -\int_0^{t_i} g_i(t_i')\,dt_i'. \tag{23}
\]
Note that
\[
H_i'(t_i) = -g_i(t_i). \tag{24}
\]
Now consider the social welfare function
\[
W(T) = \int H_i\big(T(z_i^*)\big)\,di. \tag{25}
\]
Define $\precsim$ by
\[
\big(T_1 \precsim T_2 \;\Leftrightarrow\; [W(T_1) \le W(T_2) \text{ and } R(T_1) \le R(T_2)]\big),\quad \forall\, T_1, T_2 \in \hat{\mathcal{T}}'. \tag{26}
\]

Proposition 5 $\precsim$ rationalizes $g$ on $\hat{\mathcal{T}}'$.

Proof. Choose $\rho \in \mathcal{P}$.
Observe that for any $\theta' \in [a,b]$,
\[
\begin{aligned}
\frac{d}{d\theta}\Big|_{\theta=\theta'} W\big(T^{\rho,\theta}\big)
&= \int H_i'\big(T^{\rho,\theta'}(z_i^*)\big)\,\frac{\partial}{\partial\theta}\Big|_{\theta=\theta'} T^{\rho,\theta}(z_i^*)\,di\\
&= -\int g_i\big(T^{\rho,\theta'}\big)\,\frac{\partial}{\partial\theta}\Big|_{\theta=\theta'} T^{\rho,\theta}\big(z^{\rho}_i(\theta')\big)\,di,
\end{aligned}
\tag{27}
\]
where the second equality follows from (24) and the fact that $z^{\rho}_i(\theta') = z_i^*$.

(Footnote: It is not necessary to assume that the cost of earning income is literally infinite above $z_i^*$; it is sufficient to assume that this cost is very large.)

(Footnote: Recall that in this section we assume that for all $\rho \in \mathcal{P}$ and $\theta \in [a,b]$, $T^{\rho,\theta} \in \hat{\mathcal{T}}'$.)

First consider the local improvement principle, and assume that (12) holds. It follows from (12) and (27) that $\frac{d}{d\theta}\big|_{\theta=0} W\big(T^{\rho,\theta}\big) > 0$. It follows that there exists $\bar\theta \in (0,1]$ such that for all $\theta \in (0,\bar\theta)$, $W\big(T^{\rho,0}\big)$

1. Under the above assumptions, there does not exist a preorder $\precsim$ that rationalizes $g$. Any binary relation that rationalizes $g$ contains a cycle of the form
\[
T_1 \prec T_2 \sim T_3 \prec T_4 \sim T_1. \tag{29}
\]

2. Moreover, there does not exist a preorder $\precsim$ that rationalizes $g$ with Pareto indifference. Any binary relation that rationalizes $g$ with Pareto indifference creates a cycle of the form (29).

So under these assumptions it is impossible to rationalize the welfare weights, and the impossibility is robust to the precise definition of rationalization: rationalization in either the sense of Definition 2 or Definition 3 works in the result. Observe that in this example, we can condition taxes on observable characteristics and not just income. Allowing taxes to be conditioned on observable characteristics makes it easier to generate an example. In the next section, I will present a general result that shows that it is generally possible to construct such examples when taxes depend only on income. The general result will assume that agents are heterogeneous in the senses that (i) they may face different costs of earning income and (ii) their welfare weight may depend on their characteristics as well as their consumption and income.

Let us now see why the result is true. Let us consider linear taxes of the form $T(z) = \alpha + \beta z$. If an agent with cost function $v(z) = \tfrac{1}{2} z^2$ faces such a tax schedule, the agent will solve the following problem:
\[
\max_z \Big[z - (\alpha + \beta z) - \tfrac{1}{2} z^2\Big].
\]
The first order condition is
\[
(1 - \beta) - z = 0.
\]
So if we define $z(\beta)$ as the optimal solution to the agent's problem when the agent faces a marginal tax rate of $\beta$, we have
\[
z(\beta) = 1 - \beta.
\]
For any marginal tax rate $\beta$ and utility level $u$, we can select a transfer $\alpha(\beta,u)$ so that the agent's utility when facing taxes $\alpha(\beta,u) + \beta z$ is $u$. Formally, $\alpha(\beta,u)$ solves
\[
z(\beta) - \big(\alpha(\beta,u) + \beta z(\beta)\big) - \tfrac{1}{2}[z(\beta)]^2 = u.
\]

In particular,
\[
\alpha(\beta,u) = \tfrac{1}{2}(1-\beta)^2 - u. \tag{30}
\]
(Footnote: To see that (30) holds, observe that if we assume (30), we get
\[
\begin{aligned}
z(\beta) - \big(\alpha(\beta,u) + \beta z(\beta)\big) - \tfrac{1}{2}[z(\beta)]^2
&= (1-\beta) - \Big(\big[\tfrac{1}{2}(1-\beta)^2 - u\big] + \beta(1-\beta)\Big) - \tfrac{1}{2}(1-\beta)^2\\
&= \Big[1 - \tfrac{1}{2}(1-\beta) - \beta - \tfrac{1}{2}(1-\beta)\Big](1-\beta) + u = u.)
\end{aligned}
\]

Let
\[
T^u_\beta(z) = \alpha(\beta,u) + \beta z.
\]
That is, $T^u_\beta$ is the unique tax policy that gives the agent a utility level of $u$ at marginal tax rate $\beta$, given that the agent responds optimally to the policy. Note that for any pair of marginal tax rates, $\beta_1$ and $\beta_2$, agents are indifferent between facing the tax policies $T^u_{\beta_1}$ and $T^u_{\beta_2}$ because both generate utility $u$. However, these different tax policies do not generate the same revenue. While we can adjust the lump-sum tax $\alpha(\beta,u)$ so that the agent's utility is held fixed as we raise $\beta$ and correspondingly lower $\alpha$, the total taxes raised from the agent fall: total tax paid is $\alpha(\beta,u) + \beta z(\beta) = \tfrac{1}{2}(1-\beta^2) - u$, which is decreasing in $\beta$. This is because when marginal tax rates are lower, the agent works more and must be compensated for their efforts to keep their utility constant. For example, to achieve a utility level of zero, it is possible to (i) set the marginal tax rate equal to 1 and collect no transfer, raising no taxes (tax policy $T^0_1$), or (ii) set the marginal tax rate to 0 and collect a lump-sum tax of $\tfrac{1}{2}$ (tax policy $T^0_0$). While agents are indifferent between $T^0_0$ and $T^0_1$, there is a sense in which $T^0_0$ is clearly better because it raises more revenue, which can be used for beneficial purposes (which are not modeled formally).

The proof of Proposition 6 does require that we hold revenue fixed, because the definition of rationalization concerns comparisons among tax policies raising equal revenue. Now consider the following family of tax policies $\tilde T_\beta$, where $\beta \in [0,1]$:
\[
\tilde T_\beta(z_i; x_i) =
\begin{cases}
\alpha(\beta,0) + \beta z_i, & \text{if } x_i = \text{East Coast},\\
\alpha\big(\sqrt{1-\beta^2},0\big) + \sqrt{1-\beta^2}\,z_i, & \text{if } x_i = \text{Interior},\\
\tfrac{1}{4}, & \text{if } x_i = \text{West Coast}.
\end{cases}
\]
This family of tax policies is as follows. Agents on the West Coast pay a tax of $\tfrac{1}{4}$ independently of their income. Agents on the East Coast pay a marginal tax rate of $\beta$ and those in the Interior pay a marginal tax rate of $\sqrt{1-\beta^2}$, and in both cases the lump-sum transfer is adjusted to keep agents' utility at 0. So all agents are indifferent among all tax policies of the form $\tilde T_\beta$, as $\beta$ varies between 0 and 1. Observe that
\[
\begin{aligned}
R\big(\tilde T_\beta\big)
&= \tfrac{1}{3}\Big[\tfrac{1}{2}(1-\beta)^2 + \beta(1-\beta)\Big] + \tfrac{1}{3}\Big[\tfrac{1}{2}\big(1-\sqrt{1-\beta^2}\big)^2 + \sqrt{1-\beta^2}\big(1-\sqrt{1-\beta^2}\big)\Big] + \tfrac{1}{3}\cdot\tfrac{1}{4}\\
&= \tfrac{1}{3}\Big[\tfrac{1}{2}(1+\beta)(1-\beta)\Big] + \tfrac{1}{3}\Big[\tfrac{1}{2}\big(1+\sqrt{1-\beta^2}\big)\big(1-\sqrt{1-\beta^2}\big)\Big] + \tfrac{1}{3}\cdot\tfrac{1}{4}\\
&= \tfrac{2}{12}\Big[\big(1-\beta^2\big) + \big(1-(1-\beta^2)\big)\Big] + \tfrac{1}{12} = \tfrac{3}{12} = \tfrac{1}{4}.
\end{aligned}
\]
Observe that the revenue of $\tilde T_\beta$ is $\tfrac{1}{4}$ independently of the value of $\beta$. This is because as we raise $\beta$, the revenue from taxpayers on the East Coast falls, but the revenue in the Interior rises exactly so as to offset it.

Let $\precsim$ be a binary relation that rationalizes $g$. Consider the parameterized collection of tax policies $\big(\tilde T_\beta\big)_{\beta \in [0,1]}$. It follows from the construction of taxes (the fact that for all agents utility is unchanged as $\beta$ varies) and the envelope theorem that for all $\beta_0 \in [0,1]$,
\[
0 = \frac{d}{d\beta}\Big|_{\beta=\beta_0} \tilde U_i\big(\tilde T_\beta\big) = -\frac{d}{d\beta}\Big|_{\beta=\beta_0} \tilde T_\beta\big(z(\beta_0)\big).
\]
This can also be confirmed by direct calculation. It follows that for all $\beta_0 \in [0,1]$,
\[
\int g_i\big(\tilde T_{\beta_0}\big)\,\frac{d}{d\beta}\Big|_{\beta=\beta_0} \tilde T_\beta\big(z(\beta_0)\big)\,di = 0.
\]
It follows from the indifference principle (10) that
\[
\tilde T_0 \sim \tilde T_1. \tag{31}
\]
Alternatively, we can derive (31) from Pareto indifference if we assume instead that $\precsim$ rationalizes $g$ with Pareto indifference.

Now define the tax reform
\[
\Delta T_i(z) =
\begin{cases}
1, & \text{if } i \in \text{East Coast},\\
0, & \text{if } i \in \text{Interior},\\
-1, & \text{if } i \in \text{West Coast}.
\end{cases}
\]
This tax reform amounts to a transfer of \$1 from the East Coast to the West Coast. Then for any $\varepsilon > 0$, the tax policy $\tilde T_\beta + \varepsilon\Delta T$ is the tax policy that results from $\tilde T_\beta$ by transferring \$$\varepsilon$ lump-sum from agents on the East Coast to agents on the West Coast. It is easy to see that for fixed $\varepsilon$, all agents are indifferent among all tax policies of the form $\tilde T_\beta + \varepsilon\Delta T$ as $\beta$ varies; the reasoning is essentially the same as for $\tilde T_\beta$. Moreover, since $\Delta T$ is just a lump-sum transfer that does not affect behavior, we have
\[
R\big(\tilde T_\beta + \varepsilon\Delta T\big) = R\big(\tilde T_\beta\big) = \tfrac{1}{4},\quad \forall \beta \in [0,1],\ \forall \varepsilon \in \mathbb{R}.
\]
It then follows again from the indifference principle or, alternatively, from Pareto indifference, that
\[
\tilde T_0 + \varepsilon\Delta T \sim \tilde T_1 + \varepsilon\Delta T. \tag{32}
\]
Observe that an agent $i$ with $x_i = \text{East Coast}$ pays a tax of \$$\tfrac{1}{2}$ under $\tilde T_0$ and a tax of \$0 under $\tilde T_1$. The respective calculations are:
\[
\begin{aligned}
\tilde T_0\big(z_i(\tilde T_0); E\big) &= \alpha(0,0) + 0 \times z(0) = \tfrac{1}{2}(1-0)^2 = \tfrac{1}{2},\\
\tilde T_1\big(z_i(\tilde T_1); E\big) &= \alpha(1,0) + 1 \times z(1) = \tfrac{1}{2}(1-1)^2 + 1 \times (1-1) = 0.
\end{aligned}
\]
Observe that the tax paid by $i$ with $x_i = \text{West Coast}$ is \$$\tfrac{1}{4}$ under both $\tilde T_0$ and $\tilde T_1$. Since weights $g(t)$ are increasing in taxes paid, we have
\[
g(0) < g\big(\tfrac{1}{4}\big) < g\big(\tfrac{1}{2}\big).
\]
Now consider the two collections of tax policies parameterized by $\varepsilon$, $\big(\tilde T_0 + \varepsilon\Delta T\big)_{\varepsilon \in [0,1]}$ and $\big(\tilde T_1 + \varepsilon\Delta T\big)_{\varepsilon \in [0,1]}$. We have
\[
\begin{aligned}
\int g\big(\tilde T_0\big)\,\Delta T\big(z_i(\tilde T_0)\big)\,di &= \tfrac{1}{3}\,g\big(\tfrac{1}{2}\big) - \tfrac{1}{3}\,g\big(\tfrac{1}{4}\big) > 0,\\
\int g\big(\tilde T_1\big)\,\Delta T\big(z_i(\tilde T_1)\big)\,di &= \tfrac{1}{3}\,g(0) - \tfrac{1}{3}\,g\big(\tfrac{1}{4}\big) < 0.
\end{aligned}
\]
It follows from the local improvement principle (9) that for sufficiently small $\varepsilon > 0$, respectively,
\[
\tilde T_0 \succ \tilde T_0 + \varepsilon\Delta T, \qquad \tilde T_1 \prec \tilde T_1 + \varepsilon\Delta T. \tag{33}
\]
In other words, transferring a dollar from the East Coast to the West Coast is
• bad at $\tilde T_0$, because at $\tilde T_0$ agents on the East Coast pay more in tax than those on the West Coast; and
• good at $\tilde T_1$, because at $\tilde T_1$ agents on the East Coast pay less in tax than those on the West Coast.

Putting together (31), (32), and (33), we get
\[
\tilde T_1 \prec \big[\tilde T_1 + \varepsilon\Delta T\big] \sim \big[\tilde T_0 + \varepsilon\Delta T\big] \prec \tilde T_0 \sim \tilde T_1.
\]
This establishes the desired cycle and completes the proof of Proposition 6.

This shows that in general GSMWW do not provide a coherent way of ranking tax policies. Perhaps the proponent of such weights would reply that the aim is not to provide a full ranking but just to find an optimum. However, I think that if the implied ranking is incoherent, then there is no basis for trusting GSMWW's verdict on an optimum. After all, an optimum amounts to a series of pairwise comparisons of rank with other tax policies.

In the next section, I present a much more general version of the result.
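The arithmetic behind the three-region example can be replayed numerically. The sketch below is illustrative only: it assumes the cost function $v(z) = z^2/2$ and the lump-sum formula (30) from the text, and it uses `math.exp` as a hypothetical stand-in for an increasing libertarian weight $g(t)$ (the argument needs only monotonicity, not this specific form).

```python
import math

def z_opt(beta):
    # Optimal income under T(z) = alpha + beta*z with cost v(z) = z**2 / 2.
    return 1.0 - beta

def alpha(beta, u):
    # Lump-sum component from (30) that pins the agent's utility at u.
    return 0.5 * (1.0 - beta) ** 2 - u

def tax_paid(beta, u=0.0):
    return alpha(beta, u) + beta * z_opt(beta)

def utility(beta, u=0.0):
    z = z_opt(beta)
    return z - tax_paid(beta, u) - 0.5 * z ** 2

def regional_taxes(beta):
    # Taxes paid by East Coast, Interior and West Coast agents under T~_beta.
    east = tax_paid(beta)
    interior = tax_paid(math.sqrt(1.0 - beta ** 2))
    west = 0.25
    return east, interior, west

def revenue(beta):
    # Each region holds one third of the population.
    return sum(regional_taxes(beta)) / 3.0

def g(t):
    # Hypothetical libertarian weight: any increasing function of tax paid.
    return math.exp(t)

def weighted_reform_effect(beta):
    # Integral of g * Delta T, with Delta T = +1 (East), 0 (Interior), -1 (West).
    east, _, west = regional_taxes(beta)
    return (g(east) - g(west)) / 3.0

betas = [k / 20 for k in range(21)]
revenues = [revenue(b) for b in betas]
utilities = [utility(b) for b in betas]
effect_at_0 = weighted_reform_effect(0.0)
effect_at_1 = weighted_reform_effect(1.0)
```

Under these assumptions revenue stays at 1/4 and East Coast utility stays at 0 along the whole family, while the weighted effect of the East-to-West transfer is positive at $\beta = 0$ and negative at $\beta = 1$. Those are the sign facts that, combined with the two indifference steps, generate the cycle.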

This section presents my main result, which is a much more general version of the example presented in the previous section. What I show is that whenever welfare weights do not take an essentially utilitarian form, they are not rationalizable and it is possible to construct a cycle. Thus it is precisely in those cases in which the GSMWW approach is more general than the standard utilitarian approach that it delivers inconsistent rankings. In order to prove this result I will make some additional assumptions about the form of the agents' utility functions, but the model will remain quite general.

Suppose that we can parameterize the cost of effort in terms of a one dimensional parameter $\phi \in [\underline\phi, \bar\phi] \subseteq \mathbb{R}$ with $\underline\phi < \bar\phi$. A higher value of $\phi$ corresponds to a lower cost of effort. Rather than expressing the agent's cost of earning income as $v(z; x, y)$, we assume that we can express it as $v(z; \phi)$, where $v : \mathbb{R} \times [\underline\phi, \bar\phi] \to \mathbb{R}$ is a three times continuously differentiable function. I make the following assumptions on $v$: for all $z \in Z$ and all $\phi \in [\underline\phi, \bar\phi]$,
\[
\underbrace{\frac{\partial}{\partial z} v(z;\phi) > 0}_{(i)},\quad
\underbrace{\frac{\partial^2}{\partial\phi\,\partial z} v(z;\phi) < 0}_{(ii)},\quad
\underbrace{\frac{\partial^2}{\partial z^2} v(z;\phi) > 0}_{(iii)},\quad
\underbrace{\frac{\partial}{\partial z} v(0;\underline\phi) < 1}_{(iv)},\quad
\underbrace{\frac{\partial}{\partial z} v(\bar z;\bar\phi) > 1}_{(v)}. \tag{34}
\]
Condition (i) says that earning income is costly; (ii) says that the marginal cost of earning income is lower for higher types; (iii) says that the cost function is convex for each type; (iv) says that even the highest cost type will find it worthwhile to earn positive income if there are no taxes; and similarly (v) says that even the lowest cost type will only find it worthwhile to earn less income than the technically highest possible level $\bar z$ if there are no taxes. In what follows, I sometimes write $v_\phi(z)$ instead of $v(z;\phi)$.
I assume a probability density $f$ on types $[\underline\phi, \bar\phi]$ such that $f(\phi) > 0$ for all $\phi \in [\underline\phi, \bar\phi]$. Similarly, welfare weights are now assumed to depend on $\phi$ rather than $(x, y)$, so that I write welfare weights as $g(c, z, \phi)$ rather than $g(c, z; x, y)$. I also write $g_\phi(c, z) = g(c, z, \phi)$. I assume that $g(c, z, \phi)$ is three times continuously differentiable.

In this section, I assume that taxes depend only on income $z$ and not on personal characteristics, so that we can simply write $T(z)$ rather than $T(z, x)$ or $T(z, \phi)$. This makes it harder to prove my main result; it would be easier to construct a cycle if I could condition taxes on characteristics. In this section, I also find it convenient to restrict the class of tax policies a bit further. For each $\phi \in [\underline\phi, \bar\phi]$, let $z_\phi$ solve
\[
v_\phi'(z_\phi) = 1. \tag{35}
\]
This $z_\phi$ is the income that type $\phi$ would earn in the absence of taxes. In particular, $z_{\underline\phi}$ is the income that the lowest type $\underline\phi$ would choose to earn in the absence of taxes. I restrict attention to tax policies for which the marginal tax rate below $z_{\underline\phi}$ is 0. I also restrict attention to tax policies for which the marginal tax rate is never more than 1, and I assume that the set of tax policies contains all convex tax policies with these properties. Formally, if we let $\mathcal{T}$ be the set of three times continuously differentiable functions from $Z$ to $\mathbb{R}$, then in this section I assume that the set of admissible tax policies $\hat{\mathcal{T}}$ has the following properties:
\[
S \cap \Big\{T \in \mathcal{T} : \frac{d^2}{dz^2} T(z') \ge 0,\ \forall z' \in Z\Big\} \subseteq \hat{\mathcal{T}} \subseteq S,
\]
where
\[
S = \Big\{T \in \mathcal{T} : \frac{d}{dz} T(z') = 0,\ \forall z' \in [0, z_{\underline\phi}];\ \ 0 \le \frac{d}{dz} T(z') \le 1,\ \forall z' \in [z_{\underline\phi}, \bar z]\Big\}. \tag{36}
\]
Moreover, I assume that for all $T \in \hat{\mathcal{T}}$, the agent's problem has a unique maximizer $z_\phi(T)$ for all types $\phi$; this condition holds in $S$.
Subject to these adjustments, the assumptions on the set $\hat{\mathcal{T}}$ of tax policies are the same as in Section 2, and likewise the definition of the set of smooth paths $\mathcal{P}$ holding revenue fixed is the same as in Section 4. For each $\phi \in [\underline\phi, \bar\phi]$, I define $Z_\phi = [z_{\underline\phi}, z_\phi]$. Thus $Z_\phi$ is the set of income levels that type $\phi$ might earn given some tax policy in $\hat{\mathcal{T}}$.

Now, analogously to the definitions in Section 2, define
\[
\begin{aligned}
z_\phi(T) &\in \arg\max_{z \in Z}\ z - T(z) - v_\phi(z),\\
U_\phi(T) &= u\big(z_\phi(T) - T(z_\phi(T)) - v_\phi(z_\phi(T))\big),\\
g_\phi(T) &= g_\phi\big(z_\phi(T) - T(z_\phi(T)),\ z_\phi(T)\big).
\end{aligned}
\]
Similarly, for any $\rho \in \mathcal{P}$ and $\theta \in [a,b]$, define
\[
z^{\rho}_\phi(\theta) = z_\phi\big(T^{\rho,\theta}\big).
\]
I now repeat the definition of rationalizability in this new setting, which is essentially identical to Definition 2, except that it incorporates types $\phi$ and the density $f$ over types.

Definition 4 Let $g$ be a system of welfare weights. Say that a binary relation $\precsim$ rationalizes $g$ if for all $\rho \in \mathcal{P}$, the following two conditions hold.

Local improvement principle:
\[
\begin{aligned}
\int_{\underline\phi}^{\bar\phi} g_\phi\big(T^{\rho,0}\big)\,\frac{\partial}{\partial\theta}\Big|_{\theta=0} T^{\rho}\big(z^{\rho}_\phi(0),\theta\big)\,f(\phi)\,d\phi < 0 \;&\Rightarrow\; \big[\exists\,\bar\theta \in (0,1],\ \forall \theta \in (0,\bar\theta),\ T^{\rho,0} \prec T^{\rho,\theta}\big],\\
\int_{\underline\phi}^{\bar\phi} g_\phi\big(T^{\rho,0}\big)\,\frac{\partial}{\partial\theta}\Big|_{\theta=0} T^{\rho}\big(z^{\rho}_\phi(0),\theta\big)\,f(\phi)\,d\phi > 0 \;&\Rightarrow\; \big[\exists\,\bar\theta \in (0,1],\ \forall \theta \in (0,\bar\theta),\ T^{\rho,0} \succ T^{\rho,\theta}\big],
\end{aligned}
\tag{37}
\]
and indifference principle:
\[
\Big(\forall \theta' \in [0,1],\ \int_{\underline\phi}^{\bar\phi} g_\phi\big(T^{\rho,\theta'}\big)\,\frac{\partial}{\partial\theta}\Big|_{\theta=\theta'} T^{\rho}\big(z^{\rho}_\phi(\theta'),\theta\big)\,f(\phi)\,d\phi = 0\Big) \;\Rightarrow\; T^{\rho,0} \sim T^{\rho,1}. \tag{38}
\]
Say that the system of welfare weights $g$ is rationalizable if there exists a preorder that rationalizes $g$.

For each $\phi \in [\underline\phi, \bar\phi]$, let $z_\phi \in Z$ solve $v_\phi'(z_\phi) = 1$. Thus $z_\phi$ is the income level that type $\phi$ would choose in the absence of taxes.

Definition 5

Say that $g$ depends only on utility if and only if
\[
\forall \phi \in [\underline\phi, \bar\phi],\ \forall z, z' \in Z_\phi,\ \forall c, c' \in C,\quad c - v(z;\phi) = c' - v(z';\phi) \;\Rightarrow\; g_\phi(c, z) = g_\phi(c', z'). \tag{39}
\]
To understand this definition, observe that $c - v(z;\phi)$ is a representation of the agent's utility. So (39) says that the welfare weight of an agent of type $\phi$ depends only on the utility of type $\phi$; it means that in two allocations $(z, c)$ and $(z', c')$ in which type $\phi$ has the same utility, type $\phi$ gets the same welfare weight. It also means that in two allocations $(z, c)$ and $(z', c')$ in which type $\phi$ has the same marginal utility for consumption for every possible choice of the function $u$, type $\phi$ has the same welfare weight. (Footnote: If we restrict attention to strictly convex $u$, then we can change the word "every" to "any" in the above statement.) Observe that (39) is satisfied in the standard utilitarian case, where $g_\phi(c, z) = u'(c - v(z, \phi))$.

It might be justified to call welfare weights $g$ that depend only on utility utilitarian, because whenever weights depend only on utility, there exist type-dependent utility functions $u_\phi : \mathbb{R} \to \mathbb{R}$ and $w_\phi : \mathbb{R} \times Z \to \mathbb{R}$ such that
\[
w_\phi(c, z) = u_\phi\big(c - v_\phi(z)\big), \qquad g_\phi = \frac{\partial w_\phi}{\partial c};
\]
and, moreover, the weights $g_\phi$ are those that are induced by the social welfare function $W = \int w_\phi\, f(\phi)\,d\phi$. So such weights can be thought of as coming from a utilitarian social welfare function.
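Condition (39) can be probed numerically for particular weights. The following is a minimal sketch under loudly hypothetical choices: the cost function `v`, the utilitarian weight built from $u(x) = -e^{-x}$, and the exponential libertarian weight are all assumptions made for illustration, and the check ignores the restriction $z, z' \in Z_\phi$. A `False` result exhibits a violation of (39); a `True` result at one pair of allocations is only consistent with (39), not a proof of it.

```python
import math

def v(z, phi):
    # Hypothetical cost of earning income; marginal cost falls as phi rises.
    return z ** 2 / (2.0 * phi)

def g_utilitarian(c, z, phi):
    # u'(c - v(z; phi)) for the hypothetical choice u(x) = -exp(-x).
    return math.exp(-(c - v(z, phi)))

def g_libertarian(c, z, phi):
    # Hypothetical libertarian weight, increasing in tax paid t = z - c.
    return math.exp(z - c)

def same_utility_allocations(phi, z1, z2, utility_level):
    # Two (c, z) pairs that give type phi the same value of c - v(z; phi).
    c1 = utility_level + v(z1, phi)
    c2 = utility_level + v(z2, phi)
    return (c1, z1), (c2, z2)

def consistent_with_39(g, phi, z1, z2, utility_level, tol=1e-12):
    # Checks the implication in (39) at a single pair of allocations.
    (c1, _), (c2, _) = same_utility_allocations(phi, z1, z2, utility_level)
    return abs(g(c1, z1, phi) - g(c2, z2, phi)) <= tol

utilitarian_ok = consistent_with_39(g_utilitarian, 1.0, 0.4, 0.8, 0.1)
libertarian_ok = consistent_with_39(g_libertarian, 1.0, 0.4, 0.8, 0.1)
```

With these choices the utilitarian weight passes the check while the libertarian weight fails it, because the two allocations give the same utility but involve different tax payments once income is allowed to vary.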
However, one might instead prefer to call weights that depend only on utility in the sense of Definition 5 quasi-utilitarian, rather than utilitarian, in that Definition 5 does not preclude the possibility that the implicit choice of the function $u_\phi$ can depend on non-utilitarian ethical considerations.

This terminological and interpretive choice resonates with two comments from Fleurbaey and Maniquet (2018), which concern their social welfare function approach (rather than the local social welfare weight approach). Fleurbaey and Maniquet write, "Our defense of the social welfare function could even be understood as a defense of the utilitarian approach, for an ecumenical notion of utilitarianism that is flexible about the degree of inequality aversion and the definition of individual utility." (p. 1031) This ecumenical sense of utilitarianism supports the terminology "utilitarian" for welfare weights that satisfy (39). However, Fleurbaey and Maniquet also write "... we highlight another way in which at least some fairness principles can remain compatible with the Pareto principle. Fairness principles can indeed guide the selection of the individual utility functions that serve to measure well-being and perform interpersonal comparisons." (p. 1040) That is, non-utilitarian "fairness" considerations can guide the choice of weights $g_\phi$ compatible with (39). Fleurbaey and Maniquet provide a number of examples; other examples of this principle include Piacquadio (2017) and Berg and Piacquadio (2020). The "utilitarian" social welfare function constructed in Section 5 also has this flavor. From this perspective, the term "quasi-utilitarian" may be better (although note that Piacquadio (2017) refers to his approach as "a fairness justification of utilitarianism").

Whatever the appropriate way of interpreting condition (39), it is the critical condition for the rationalizability of welfare weights.

Theorem 1

Under the assumptions of this section, $g$ depends only on utility if and only if $g$ is rationalizable.

So the theorem shows that, outside of the essentially utilitarian case, welfare weights are not rationalizable. Outside of the utilitarian case, the proof of Theorem 1 shows that if the binary relation $\precsim$ rationalizes $g$, it is possible to construct a sequence of tax policies $T_1, T_2, T_3, T_4$, which generate the same revenue, such that they generate a cycle:
\[
T_1 \sim T_2 \prec T_3 \sim T_4 \prec T_1.
\]
(Footnote: One might wonder how we can always attain a separable social welfare function whenever welfare weights are rationalizable; the answer is that type $\phi$'s welfare weight is assumed to depend only on type $\phi$'s consumption, income, and characteristics, and not on the distribution of these in society.)

Remark 2

The specific condition that welfare weights depend only on utility depends on the fact that we have assumed that utility takes the quasilinear form $u(c - v(z))$, so that there are no income effects. In a more general model in which utility took the form $u(c, z)$, the statement of the theorem would have to be modified, and the necessary and sufficient condition for rationalizability would no longer be (39). I believe that in that case an analogous result would hold: that outside of an essentially "utilitarian case", welfare weights would not be rationalizable. However, this is not something I have yet proven.

I conclude this paper with some thoughts about how broader values ought to be represented.

Consider libertarianism as an example. Suppose that one thinks that people are entitled to their pre-tax incomes and that in some way taxation is like theft. This view is not faithfully rendered as saying that additional income to people who have been taxed more should be given additional weight in comparison to those who have been taxed less; rather it is the view that it is wrong to tax, or at least, if not absolutely wrong, that it is bad to tax, and that this bad is tolerated, to the extent that it is, because of the other important purposes of taxation. On a rights-based version of libertarianism, taxing people is bad not because it reduces their utility but because it violates their entitlements. Imagine there is a function $c_i(t_i)$ for each agent $i$ that measures how bad it is to violate $i$'s entitlements. Then the social objective is given by
\[
W(T) = -\int_i c_i\big(T_i(z_i(T))\big)\,di. \tag{40}
\]
Note that $c_i(t_i)$ is not derived from agents' preferences but rather measures a social judgement about how bad it is to violate the person's entitlements. One could also consider a more pluralist value function, which incorporates both utilitarian and libertarian considerations:
\[
W(T) = \alpha\int_i U_i(T)\,di - (1-\alpha)\int_i c_i\big(T_i(z_i(T))\big)\,di, \tag{41}
\]
where $\alpha$ measures the weight put on utilitarianism rather than libertarianism: $\alpha = 0$ corresponds to pure libertarianism (as in (40)), and $\alpha = 1$ to pure utilitarianism. If Ann and Bob respectively use weights $\alpha_A$ and $\alpha_B$ with $\alpha_A < \alpha_B$, that means that Ann places more weight on libertarian considerations than Bob (relative to utilitarian considerations). One could then maximize (41) subject to a revenue requirement.

(Footnote: For different approaches to libertarian taxation, see Nozick (1974), Feldstein (1976), Young (1987), Weinzierl (2014), and Vallentyne (2018).)

The analog to the local desirability of a tax reform $\Delta T$ (13) with objective (41) is
\[
\int_i \Big\{\big[\alpha\,g_i(T) + (1-\alpha)\,c_i'\big(T_i(z_i(T))\big)\big]\,\Delta T_i\big(z_i(T)\big) + (1-\alpha)\,c_i'\big(T_i(z_i(T))\big)\,T_i'\big(z_i(T)\big)\,\frac{d}{d\varepsilon}\Big|_{\varepsilon=0} z_i(T + \varepsilon\Delta T)\Big\}\,di < 0,
\]
where
\[
g_i(T) = u_i'\big(z_i(T) - T(z_i(T)) - v_i(z_i(T))\big)
\]
is the utilitarian welfare weight. It follows that under this system, changes in tax policies are not evaluated by just putting a weight on changes in taxes $\Delta T$: the second term depends on the behavioral response of income to the reform. There is no system of welfare weights that is equivalent to the social welfare function (41). At the same time, judgments rendered by (41) must be consistent and rationalizable because they come from a global objective.

One potential criticism of the objective (41) is that it will lead to Pareto inefficient optima. However, the reply to this is that this is what is involved in taking rights and other values seriously: these other values will compete with preference satisfaction, and one may want to sacrifice some measure of preference satisfaction to realize these other values. This point was made by Sen (1979). Now some economists may view such violations of Pareto as unacceptable, and argue that any change that improves everyone's preferences must be deemed a social improvement.
That may be true (or not), but in making such a claim, innocuous as it may sound, economists become moral philosophers, and it is then incumbent on them to provide a philosophical defense. Papers that criticize the Pareto principle include Mongin (1997/2016) and Sher (2020).

There is another way that broader values may enter into economic analysis, suggested by the above analysis. Theorem 1 shows that welfare weights are rationalizable provided they take the form $g_\phi(c, z) = \tilde g_\phi(c - v_\phi(z))$. $g_\phi$ may be marginal utility of consumption but it need not be. Perhaps the dependence of welfare weights of this form on characteristics $\phi$ can allow us to incorporate some broader ethical values while preserving Pareto. This possibility was discussed above in Sections 3, 5, and 7, and it is a possible way of incorporating broader values, but I do not think that we should expect it to succeed in capturing all values. Again thinking of the libertarian case, if we are thinking about violations of people's entitlements, it does not intuitively seem that what is at stake is just how much weight we should put on different people's income; likewise, if we think about merit, fairness, or procedural values, it is not clear why their representation should be of this form. To fully address these issues would take us beyond the scope of the current paper and would entail considering other literatures. However, the consideration of what can and cannot be done with welfare weights puts these issues in sharper focus.

References

Berg, K. and Piacquadio, P. G. (2020), 'The equal-sacrifice social welfare function with an application to optimal income taxation'.
Bergson, A. (1938), 'A reformulation of certain aspects of welfare economics', The Quarterly Journal of Economics (2), 310–334.
Bossert, W. and Weymark, J. A. (2004), Utility in social choice, in 'Handbook of Utility Theory', Kluwer Academic Publishers, pp. 1099–1177.
d'Aspremont, C. and Gevers, L. (1977), 'Equity and the informational basis of collective choice', The Review of Economic Studies (2), 199–209.
Feldstein, M. (1976), 'On the theory of tax reform', Journal of Public Economics (1-2), 77–104.
Fleurbaey, M. and Maniquet, F. (2011), A Theory of Fairness and Social Welfare, Vol. 48, Cambridge University Press.
Fleurbaey, M. and Maniquet, F. (2018), 'Optimal income taxation theory and principles of fairness', Journal of Economic Literature (3), 1029–79.
Fleurbaey, M., Tungodden, B. and Chang, H. F. (2003), 'Any non-welfarist method of policy assessment violates the Pareto principle: A comment', Journal of Political Economy (6), 1382–1385.
Hammond, P. J. (1979), 'Equity in two person situations: some consequences', Econometrica pp. 1127–1135.
Hartman, P. (1987), Ordinary Differential Equations, SIAM.
Kaplow, L. and Shavell, S. (2001), 'Any non-welfarist method of policy assessment violates the Pareto principle', Journal of Political Economy (2), 281–286.
Lang, S. (2012), Real and Functional Analysis, Springer.
Mongin, P. (1997/2016), 'Spurious unanimity and the Pareto principle', Economics & Philosophy (3), 511–532.
Nozick, R. (1974), Anarchy, State, and Utopia, Vol. 5038, New York: Basic Books.
Piacquadio, P. G. (2017), 'A fairness justification of utilitarianism', Econometrica (4), 1261–1276.
Roberts, K. W. (1980), 'Interpersonal comparability and social choice theory', The Review of Economic Studies pp. 421–439.
Saez, E. (2001), 'Using elasticities to derive optimal income tax rates', The Review of Economic Studies (1), 205–229.
Saez, E. and Stantcheva, S. (2016), 'Generalized social marginal welfare weights for optimal tax theory', American Economic Review (1), 24–45.
Samuelson, P. A. (1947), 'Foundations of economic analysis'.
Sen, A. (1970), 'The impossibility of a Paretian liberal', The Journal of Political Economy (1), pp. 152–157.
Sen, A. (1977), 'On weights and measures: informational constraints in social welfare analysis', Econometrica pp. 1539–1572.
Sen, A. (1979), 'Utilitarianism and welfarism', The Journal of Philosophy (9), 463–489.
Sen, A. (1986), 'Social choice theory', Handbook of Mathematical Economics, 1073–1181.
Sher, I. (2020), 'How perspective-based aggregation undermines the Pareto principle', Politics, Philosophy & Economics (2), 182–205.
Vallentyne, P. (2018), Libertarianism and taxation, in M. O'Neill and S. Orr, eds, 'Taxation: Philosophical Perspectives', Oxford University Press, pp. 98–110.
Weinzierl, M. (2014), 'The promise of positive optimal taxation: normative diversity and a role for equal sacrifice', Journal of Public Economics, 128–142.
Weymark, J. A. (2016), Social welfare functions, in M. D. Adler and M. Fleurbaey, eds, 'The Oxford Handbook of Well-Being and Public Policy', Oxford University Press, pp. 126–159.
Weymark, J. A. (2017), 'Conundrums for nonconsequentialists', Social Choice and Welfare (2), 269–294.
Young, H. P. (1987), 'Progressive taxation and the equal sacrifice principle', Journal of Public Economics (2), 203–214.

Appendix

A.1 Preliminaries

Consider welfare weights of the form $g_\phi(c, z)$. Define the variable $u$ by $u = c - v_\phi(z)$, so that $c = u + v_\phi(z)$. Then we can re-express welfare weights in terms of $u$ and $z$, rather than in terms of $c$ and $z$, as follows:

$$\hat g_\phi(u, z) = g_\phi(u + v_\phi(z), z), \quad \forall u \in \mathbb{R},\ \forall z \in Z. \tag{A.1}$$

Formally, welfare weights can then be represented as a function $\hat g : \mathbb{R} \times Z \times [\underline{\phi}, \overline{\phi}] \to \mathbb{R}_+$, where I often write $\hat g_\phi(u, z)$ instead of $\hat g(u, z, \phi)$.

Notice that the two formulations, in terms of $u$ and $z$ and in terms of $c$ and $z$, are equivalent because $u$ is recoverable from $c$, $z$, and $\phi$, and $c$ is recoverable from $u$, $z$, and $\phi$.

Proposition A.1

Let $g$ and $\hat g$ be related as in (A.1). Then the following conditions are equivalent:

1. $g$ depends only on utility.
2. $\forall \phi \in [\underline{\phi}, \overline{\phi}],\ \forall u \in \mathbb{R},\ \forall z, z' \in Z_\phi$: $\hat g_\phi(u, z) = \hat g_\phi(u, z')$.
3. $\forall \phi \in [\underline{\phi}, \overline{\phi}],\ \forall u \in \mathbb{R},\ \forall z \in Z_\phi$: $\frac{\partial}{\partial z} \hat g_\phi(u, z) = 0$.

Proof. First I argue that condition 1 of the proposition implies condition 2. Assume that $g$ depends only on utility. Now choose $\phi \in [\underline{\phi}, \overline{\phi}]$, $z, z' \in Z_\phi$, and $u \in \mathbb{R}$. Define $c = u + v_\phi(z)$ and $c' = u + v_\phi(z')$. Then observe that $c - v_\phi(z) = u = c' - v_\phi(z')$. So by (A.1), $\hat g_\phi(u, z) = g_\phi(c, z) = g_\phi(c', z') = \hat g_\phi(u, z')$, where the middle equality follows from the assumption that $g$ depends only on utility. It follows that condition 2 of the proposition holds.

Next I argue that condition 2 implies condition 1. So assume condition 2. Choose $\phi \in [\underline{\phi}, \overline{\phi}]$, $c, c' \in \mathbb{R}$, and $z, z' \in Z_\phi$, and assume that $u = c - v_\phi(z) = c' - v_\phi(z') = u'$. It follows that $g_\phi(c, z) = \hat g_\phi(u, z) = \hat g_\phi(u, z') = \hat g_\phi(u', z') = g_\phi(c', z')$, where the second equality follows from condition 2. This establishes condition 1.

Finally, consider the equivalence of conditions 2 and 3. First observe that, by continuous differentiability, condition 2 implies condition 3: $\forall \phi \in [\underline{\phi}, \overline{\phi}],\ \forall u \in \mathbb{R},\ \forall z \in Z_\phi$, $\frac{\partial}{\partial z} \hat g_\phi(u, z) = 0$. The converse, and hence the equivalence, follows from the fundamental theorem of calculus. □
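The equivalence in Proposition A.1 can be illustrated numerically. The sketch below is only an illustration under assumed functional forms that do not come from the paper: the effort cost `v`, the two weight functions `g_utility_only` and `g_consumption_only`, and the helper `varies_with_z` are all hypothetical. It re-expresses each weight via (A.1) and checks whether the result varies with $z$ at fixed $u$ (that is, whether condition 2 fails).

```python
import math

def v(z, phi):
    """Hypothetical effort cost v_phi(z) = z^2 / (2 phi); an assumption for illustration."""
    return z * z / (2.0 * phi)

def g_utility_only(c, z, phi):
    """A weight that depends on (c, z) only through utility u = c - v_phi(z)."""
    return math.exp(-(c - v(z, phi)))

def g_consumption_only(c, z, phi):
    """A weight that depends on consumption alone, hence not only on utility."""
    return math.exp(-c)

def g_hat(g, u, z, phi):
    """Re-expression (A.1): g_hat_phi(u, z) = g_phi(u + v_phi(z), z)."""
    return g(u + v(z, phi), z, phi)

def varies_with_z(g, u, phi, z_grid, tol=1e-9):
    """Return True if g_hat_phi(u, .) is not constant on z_grid (condition 2 fails)."""
    values = [g_hat(g, u, z, phi) for z in z_grid]
    return max(values) - min(values) > tol

z_grid = [0.1 * k for k in range(1, 21)]
print(varies_with_z(g_utility_only, 0.5, 1.5, z_grid))      # False: condition 2 holds
print(varies_with_z(g_consumption_only, 0.5, 1.5, z_grid))  # True: condition 2 fails
```

The second weight is the kind of case the corollary that follows exploits: once $\hat g_\phi(u, z)$ moves with $z$ at fixed $u$, its $z$-derivative is nonzero somewhere.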

Given the proposition, observe that

$$g_\phi(T) = \hat g_\phi(U_\phi(T), z_\phi(T)). \tag{A.2}$$

The Proposition has the following corollary:

Corollary A.1 If $g$ does not depend only on utility, then there exist $\phi_1 < \phi_2$ and a three times continuously differentiable tax policy $T(z)$ in $\hat{\mathcal{T}}$, which is strictly increasing and strictly convex on $[z_{\underline{\phi}}, \bar z]$, and is such that either

$$\forall \phi \in (\phi_1, \phi_2), \quad \frac{\partial}{\partial z} \hat g_\phi(U_\phi(T), z_\phi(T)) < 0 \tag{A.3}$$

or

$$\forall \phi \in (\phi_1, \phi_2), \quad \frac{\partial}{\partial z} \hat g_\phi(U_\phi(T), z_\phi(T)) > 0. \tag{A.4}$$

Footnote: Recall that we have assumed that $g(c, z; \phi)$ is continuously differentiable in $(c, z; \phi)$.

Proof. Assume that $g$ does not depend only on utility. It follows from Proposition A.1 that there exist $\phi^\circ \in (\underline{\phi}, \overline{\phi})$, $z^\circ \in (z_{\underline{\phi}}, z_{\phi^\circ})$, and $u^\circ \in \mathbb{R}$ such that

$$\frac{\partial}{\partial z} \hat g_{\phi^\circ}(u^\circ, z^\circ) \neq 0. \tag{A.5}$$

It follows from (34) that $1 - v_{\phi^\circ}'(z^\circ) > 0$. Let $r^\circ = 1 - v_{\phi^\circ}'(z^\circ)$. Then observe that $z^\circ$ is the unique maximizer of $z(1 - r^\circ) - v_{\phi^\circ}(z)$ over $Z$, where this follows from the fact that the objective in this optimization is strictly concave. It follows that if we choose any three times continuously differentiable tax policy $T$ in $\hat{\mathcal{T}}$ that is strictly convex on $[z_{\underline{\phi}}, \bar z]$ and such that $T'(z^\circ) = r^\circ$, then $z_{\phi^\circ}(T) = z^\circ$. Observe that strict convexity of $T$ on $[z_{\underline{\phi}}, \bar z]$, together with the fact that, since $T \in \hat{\mathcal{T}}$, $\frac{\partial}{\partial z} T(z_{\underline{\phi}}) = 0$ (see (36)), implies that $T$ is strictly increasing on $[z_{\underline{\phi}}, \bar z]$. By adding a lump-sum transfer, we can ensure that $U_{\phi^\circ}(T) = u^\circ$. (A.5) now implies that either (A.3) or (A.4) holds. □

Let $\overline{\mathcal{P}}$ be the set of all paths $\rho : [a, b] \to \hat{\mathcal{T}}$ (where recall $a < 0 < b$) that satisfy condition i at the beginning of Section 4 but not necessarily condition iii; note that because, in Section 7, we assume that there are no observable characteristics, condition ii is vacuously satisfied. Observe that the difference between $\mathcal{P}$ and $\overline{\mathcal{P}}$ is that paths $\rho$ in $\mathcal{P}$ hold revenue fixed, whereas paths in $\overline{\mathcal{P}}$ need not satisfy this requirement. Thus $\mathcal{P} \subset \overline{\mathcal{P}}$.

Definition 6

Say that a binary relation $\precsim$ on $\hat{\mathcal{T}}$ strongly rationalizes $g$ if the local improvement principle (37) and the indifference principle (38) are satisfied for all paths $\rho$ in $\overline{\mathcal{P}}$ (the larger set of paths defined above, on which revenue need not be held constant), and $g$ is strongly rationalizable if there exists a preorder $\precsim$ that strongly rationalizes $g$.

Strong rationalizability differs from rationalizability (as in Definition 4) only in that $\overline{\mathcal{P}}$ is substituted for $\mathcal{P}$; in other words, conditions (37) and (38) are required to hold on all smooth paths, not just those on which revenue is held constant. It follows that if $g$ is strongly rationalizable, then $g$ is rationalizable, but the converse does not hold in general. By contraposition, if $g$ is not rationalizable, then $g$ is not strongly rationalizable.

A.2 The core of the proof of the main theorem

I ﬁrst prove a weaker version of one direction of the theorem.

Proposition A.2

Under the assumptions of Section 7, if $g$ does not depend only on utility then $g$ is not strongly rationalizable.

This proposition is weaker than the main theorem in two ways: first, it is only one direction (the harder direction); second, since it employs strong rationalizability rather than rationalizability, it does not require that revenue is held constant along a path. Below, in Proposition A.3, I will strengthen the result to hold revenue constant. However, I separate that from the rest of the argument because it introduces complexity but does not really engage with the main ideas in the proof. In the example of Section 6, holding revenue constant was accomplished simply by varying the revenue in one part of the tax schedule to offset changes in another part; the same sort of idea works here.

So to prove Proposition A.2, I assume that $g$ does not depend only on utility, and I argue that $g$ is not strongly rationalizable. Appealing to Corollary A.1, I consider the case in which there exist $\phi_1 < \phi_2$ and a three times continuously differentiable tax policy $T$ in $\hat{\mathcal{T}}$, which is strictly increasing and strictly convex on $[z_{\underline{\phi}}, \bar z]$, and such that (A.3) holds; the case in which (A.4) instead of (A.3) holds is similar.

Footnote (to the definition of the set of paths above): Also, in condition (i), $T_i^\rho(z_i, \theta) = T^\rho(z_i, \theta)$, $\forall i$.

Let $z_1 = z_{\phi_1}(T)$, $z_2 = z_{\phi_2}(T)$, so that $z_1 < z_2$. Now consider a three times continuously differentiable real-valued function $\nu(z)$ with domain $[0, \bar z]$ and support $[z_1, \bar z]$ and such that its derivative $\nu'$ has support $[z_1, z_2]$ and $\nu'(z) > 0$, $\forall z \in (z_1, z_2)$. For each $\gamma \in \mathbb{R}$, define the tax policy

$$T^\gamma = T + \gamma \nu.$$

Observe that $T^\gamma$ is three times continuously differentiable and, if $|\gamma|$ is sufficiently small, then $T^\gamma$ is strictly increasing and strictly convex on $[z_{\underline{\phi}}, \bar z]$.

Next choose $\hat\phi_1, \hat\phi_2 \in (\underline{\phi}, \overline{\phi})$ with $\hat\phi_1 < \hat\phi_2 < \phi_1$ and let $\hat z_1 = z_{\hat\phi_1}(T)$, $\hat z_2 = z_{\hat\phi_2}(T)$.
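It follows that $\hat z_1 < \hat z_2 < z_1$. Let $\mu(z)$ be a three times continuously differentiable function with domain $[0, \bar z]$ and support $[\hat z_1, \bar z]$ and such that $\mu'$ has support $[\hat z_1, \hat z_2]$ with $\mu'(z) > 0$ on $(\hat z_1, \hat z_2)$ and $\mu(z) = 1$ on $[\hat z_2, \bar z]$.

It follows from the Picard-Lindelöf theorem that there exist an interval $[-\gamma^*, \gamma^*]$ (with $\gamma^* > 0$) and a unique function $\zeta(\gamma) : [-\gamma^*, \gamma^*] \to \mathbb{R}$ such that $\zeta(0) = 0$ and

$$\int_{\underline{\phi}}^{\overline{\phi}} g_\phi(T^\gamma - \zeta(\gamma)\mu) \left[ -\frac{d}{d\gamma}\zeta(\gamma)\, \mu(z_\phi(T^\gamma - \zeta(\gamma)\mu)) + \nu(z_\phi(T^\gamma - \zeta(\gamma)\mu)) \right] f(\phi)\, d\phi = 0. \tag{A.6}$$

To interpret this expression, note that $T^\gamma - \zeta(\gamma)\mu$ is the tax policy defined by $[T^\gamma - \zeta(\gamma)\mu](z) = T^\gamma(z) - \zeta(\gamma)\mu(z)$, $\forall z \in [0, \infty)$. Note that for sufficiently small $\gamma$, $T^\gamma - \zeta(\gamma)\mu$ is strictly increasing and strictly convex on $[z_{\underline{\phi}}, \bar z]$. We may assume that $\gamma^*$ is sufficiently small that for all $\gamma \in [-\gamma^*, \gamma^*]$, $T^\gamma$ and $T^\gamma - \zeta(\gamma)\mu$ are strictly increasing and strictly convex on $[z_{\underline{\phi}}, \bar z]$.

Footnote 21: We can rewrite (A.6) as

$$\frac{d}{d\gamma}\zeta(\gamma) = \frac{\int_{\underline{\phi}}^{\overline{\phi}} g_\phi(T^\gamma - \zeta(\gamma)\mu)\, \nu(z_\phi(T^\gamma - \zeta(\gamma)\mu))\, f(\phi)\, d\phi}{\int_{\underline{\phi}}^{\overline{\phi}} g_\phi(T^\gamma - \zeta(\gamma)\mu)\, \mu(z_\phi(T^\gamma - \zeta(\gamma)\mu))\, f(\phi)\, d\phi}.$$

In particular, observe that if we define $F(\gamma, \zeta) : \mathbb{R}^2 \to \mathbb{R}$ by

$$F(\gamma, \zeta) = \frac{\int_{\underline{\phi}}^{\overline{\phi}} g_\phi(T^\gamma - \zeta\mu)\, \nu(z_\phi(T^\gamma - \zeta\mu))\, f(\phi)\, d\phi}{\int_{\underline{\phi}}^{\overline{\phi}} g_\phi(T^\gamma - \zeta\mu)\, \mu(z_\phi(T^\gamma - \zeta\mu))\, f(\phi)\, d\phi},$$

then $F$ is continuously differentiable in $(\gamma, \zeta)$ in a neighborhood $N$ of $(\gamma, \zeta) = (0, 0)$. So, on $N$, $F$ is Lipschitz continuous.

Using the fact that when $\gamma = 0$, $T^\gamma - \zeta(\gamma)\mu = T$, we have

$$\frac{d}{d\gamma}\Big|_{\gamma=0} \zeta(\gamma) = \frac{\int_{\underline{\phi}}^{\overline{\phi}} g_\phi(T)\, \nu(z_\phi(T))\, f(\phi)\, d\phi}{\int_{\underline{\phi}}^{\overline{\phi}} g_\phi(T)\, \mu(z_\phi(T))\, f(\phi)\, d\phi}$$

$$= \frac{\int_{\underline{\phi}}^{\phi_1} g_\phi(T)\, \nu(z_1)\, f(\phi)\, d\phi + \int_{\phi_1}^{\phi_2} g_\phi(T)\, \nu(z_\phi(T))\, f(\phi)\, d\phi + \int_{\phi_2}^{\overline{\phi}} g_\phi(T)\, \nu(z_2)\, f(\phi)\, d\phi}{\int_{\underline{\phi}}^{\overline{\phi}} g_\phi(T)\, \mu(z_\phi(T))\, f(\phi)\, d\phi}$$

$$= \frac{\int_{\underline{\phi}}^{\phi_1} g_\phi(T)\, \nu(z_1)\, \mu(z_\phi(T))\, f(\phi)\, d\phi + \int_{\phi_1}^{\phi_2} g_\phi(T)\, \nu(z_\phi(T))\, \mu(z_\phi(T))\, f(\phi)\, d\phi + \int_{\phi_2}^{\overline{\phi}} g_\phi(T)\, \nu(z_2)\, \mu(z_\phi(T))\, f(\phi)\, d\phi}{\int_{\underline{\phi}}^{\overline{\phi}} g_\phi(T)\, \mu(z_\phi(T))\, f(\phi)\, d\phi}. \tag{A.7}$$

To understand the second equality, observe that by our assumptions on $\nu$, $\nu(z) = 0$, $\forall z \in [0, z_1]$. This explains the first integral in the numerator of the right-hand side.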
Similarly, our assumptions imply that for all $z \in [z_2, \bar z]$, $\nu(z) = \nu(z_2)$, from which it follows that for all $\phi \in [\phi_2, \overline{\phi}]$, $\nu(z_\phi(T^\gamma)) = \nu(z_2)$; this explains the third integral of the numerator of the right-hand side of the second equality. To understand the third equality, observe that $\nu(z_1) = 0$, so adding $\mu(z_\phi(T))$ to the first integral in the numerator has no effect; also when $\phi > \phi_1$, $z_\phi(T) > z_1 > \hat z_2$, so $\mu(z_\phi(T)) = 1$, which justifies adding $\mu(z_\phi(T))$ to the second and third integrals in the numerator in the third equality.

Observe moreover that it follows from the facts that (i) $\{\nu(z_\phi(T)) : \phi \in (\phi_1, \phi_2)\} = (\nu(z_1), \nu(z_2))$, (ii) $\nu$ is strictly increasing on $[z_1, z_2]$, and (iii) $g_\phi(T)\, \mu(z_\phi(T))\, f(\phi) \geq 0$ for all $\phi \in (\underline{\phi}, \overline{\phi})$ and $\exists \phi' < \overline{\phi}$ such that for all $\phi \in (\phi', \overline{\phi})$, $g_\phi(T)\, \mu(z_\phi(T))\, f(\phi) > 0$, and (A.7) that there exists $\hat\phi \in (\phi_1, \phi_2)$ such that

$$\frac{d}{d\gamma}\Big|_{\gamma=0} \zeta(\gamma) = \nu(z_{\hat\phi}(T)). \tag{A.8}$$

Let $\hat z = z_{\hat\phi}(T)$.

Now consider two three times continuously differentiable functions $\eta_1$ and $\eta_2$ such that $\eta_1$ has support $[z_1', z_2']$, $\eta_2$ has support $[z_1'', z_2'']$, where $z_1', z_2', z_1'', z_2'' \in \mathbb{R}_+$ and $z_1' < z_2' < \hat z_1 < z_1 < z_1'' < \hat z < z_2'' < z_2$, and there exist $\phi_1', \phi_2', \phi_1'', \phi_2'' \in (\underline{\phi}, \overline{\phi})$ such that $z_1' = z_{\phi_1'}(T)$, $z_2' = z_{\phi_2'}(T)$, $z_1'' = z_{\phi_1''}(T)$, and $z_2'' = z_{\phi_2''}(T)$; it follows that $z_{\underline{\phi}} < z_1'$. I also assume that $\eta_2(z) < 0$, $\forall z \in (z_1'', z_2'')$, $\eta_2'(z) < 0$, $\forall z \in (z_1'', \hat z)$, and $\eta_2'(z) > 0$, $\forall z \in (\hat z, z_2'')$, and

$$\int_{\phi_1'}^{\phi_2'} g_\phi(T)\, \eta_1(z_\phi(T))\, f(\phi)\, d\phi = -\int_{\phi_1''}^{\phi_2''} g_\phi(T)\, \eta_2(z_\phi(T))\, f(\phi)\, d\phi. \tag{A.9}$$

Now consider the parameterized collection of tax policies

$$T^{\gamma,\varepsilon,\lambda} = T + \gamma\nu + \varepsilon(\lambda\eta_1 + \eta_2). \tag{A.10}$$

Next observe that by the Picard-Lindelöf theorem, there exist $\delta^*, \varepsilon^* > 0$ and a function $\zeta(\gamma, \varepsilon, \lambda) : [-\gamma^*, \gamma^*] \times [-\varepsilon^*, \varepsilon^*] \times [1 - \delta^*, 1 + \delta^*] \to \mathbb{R}$ that solves the equations

$$\zeta(0, \varepsilon, \lambda) = 0, \quad \forall \varepsilon,\ \forall \lambda, \tag{A.11}$$

$$\int_{\underline{\phi}}^{\overline{\phi}} g_\phi(T^{\gamma,\varepsilon,\lambda} - \zeta(\gamma,\varepsilon,\lambda)\mu) \times \left[ -\frac{\partial}{\partial\gamma}\zeta(\gamma,\varepsilon,\lambda)\, \mu(z_\phi(T^{\gamma,\varepsilon,\lambda} - \zeta(\gamma,\varepsilon,\lambda)\mu)) + \nu(z_\phi(T^{\gamma,\varepsilon,\lambda} - \zeta(\gamma,\varepsilon,\lambda)\mu)) \right] f(\phi)\, d\phi = 0. \tag{A.12}$$

Footnote: Extending the construction in footnote 21, we can write

$$F_{\varepsilon,\lambda}(\gamma, \zeta) = \frac{\int_{\underline{\phi}}^{\overline{\phi}} g_\phi(T^{\gamma,\varepsilon,\lambda} - \zeta\mu)\, \nu(z_\phi(T^{\gamma,\varepsilon,\lambda} - \zeta\mu))\, f(\phi)\, d\phi}{\int_{\underline{\phi}}^{\overline{\phi}} g_\phi(T^{\gamma,\varepsilon,\lambda} - \zeta\mu)\, \mu(z_\phi(T^{\gamma,\varepsilon,\lambda} - \zeta\mu))\, f(\phi)\, d\phi}.$$

It follows from the Picard-Lindelöf theorem that for each $(\varepsilon, \lambda)$ in a neighborhood $N$ of $(0, 1)$, there exists $\gamma_{\varepsilon,\lambda} > 0$ such that a unique function $\zeta_{\varepsilon,\lambda}(\gamma) : [-\gamma_{\varepsilon,\lambda}, \gamma_{\varepsilon,\lambda}] \to \mathbb{R}$ satisfies $\frac{d}{d\gamma}\zeta_{\varepsilon,\lambda}(\gamma) = F_{\varepsilon,\lambda}(\gamma, \zeta_{\varepsilon,\lambda}(\gamma))$ for all $\gamma \in [-\gamma_{\varepsilon,\lambda}, \gamma_{\varepsilon,\lambda}]$. How large we can choose $\gamma_{\varepsilon,\lambda}$ depends on the range over which $F_{\varepsilon,\lambda}$ is Lipschitz continuous in $\gamma$ for each value of $\zeta$ and the supremum of $F_{\varepsilon,\lambda}$ over the relevant range. Because $F_{\varepsilon,\lambda}$ varies smoothly in $(\varepsilon, \lambda)$, we can choose $\gamma_{\varepsilon,\lambda}$ that is independent of $\varepsilon$ and $\lambda$ within the relevant range; that is, we can choose $\gamma^*$ such that $\gamma^* = \gamma_{\varepsilon,\lambda}$, $\forall (\varepsilon, \lambda) \in [-\varepsilon^*, \varepsilon^*] \times [1 - \delta^*, 1 + \delta^*]$.

Note that because (A.12) reduces to (A.6) when $\varepsilon = 0$, the uniqueness of solutions implied by the Picard-Lindelöf theorem implies that

$$\zeta(\gamma) = \zeta(\gamma, 0, \lambda), \quad \forall \lambda \in [1 - \delta^*, 1 + \delta^*].$$

Moreover the smoothness of the primitive functions $T, \nu, \mu, \eta_1, \eta_2$ implies that $\zeta(\gamma, \varepsilon, \lambda)$ is twice continuously differentiable.

Footnote: In particular, the function $\zeta$ is implicitly defined by a differential equation of the form $\zeta' = H(\zeta; \varepsilon, \gamma, \lambda)$. It follows from Corollary 4.1 on p. 101 of Hartman (1987) that to show that $\zeta$ is twice continuously differentiable, it is sufficient to show that $H$ is twice continuously differentiable. Moreover, in our case $H$ can be written in the form

$$H(\zeta; \varepsilon, \gamma, \lambda) = \frac{\int_{\underline{\phi}}^{\overline{\phi}} h_\phi(z_\phi(\varepsilon, \gamma, \lambda, \zeta); \varepsilon, \gamma, \lambda, \zeta)\, d\phi}{\int_{\underline{\phi}}^{\overline{\phi}} k_\phi(z_\phi(\varepsilon, \gamma, \lambda, \zeta); \varepsilon, \gamma, \lambda, \zeta)\, d\phi},$$

where $z_\phi(\varepsilon, \gamma, \lambda, \zeta)$ is the optimal solution to $\max_z\ z - T^{\gamma,\varepsilon,\lambda}(z) + \zeta\mu(z) - v_\phi(z)$, and $h_\phi$ and $k_\phi$ are twice continuously differentiable functions, where this follows from the smoothness of $T, \nu, \mu, \eta_1$, and $\eta_2$. Note next that $z_\phi(\varepsilon, \gamma, \lambda, \zeta)$ is defined as an implicit function by the first order condition of the above-mentioned optimization problem, which is of the form $0 = L(z; \varepsilon, \gamma, \lambda, \zeta)$, where the smoothness of $L$, which again follows from the smoothness of $T, \nu, \mu, \eta_1$, and $\eta_2$, implies that $z_\phi(\varepsilon, \gamma, \lambda, \zeta)$ is twice continuously differentiable (see Theorem 2.1 on p. 364 of Lang (2012)). These facts together imply that $H$ is twice continuously differentiable, which, in turn, implies that $\zeta$ is twice continuously differentiable as explained above.

Define

$$\hat T^{\gamma,\varepsilon,\lambda} = T^{\gamma,\varepsilon,\lambda} - \zeta(\gamma, \varepsilon, \lambda)\mu. \tag{A.13}$$

Now consider the optimization problem $\max_z\ z - \hat T^{\gamma,\varepsilon,\lambda}(z) - v_\phi(z)$. The optimal solution $z_\phi(\hat T^{\gamma,\varepsilon,\lambda})$ satisfies the first order condition below. In (A.14)-(A.16), every function of $z$ on the right-hand side is evaluated at $z = z_\phi(\hat T^{\gamma,\varepsilon,\lambda})$:

$$0 = 1 - T'(z) - v_\phi'(z) - \gamma\nu'(z) - \varepsilon\big(\lambda\eta_1'(z) + \eta_2'(z)\big) + \zeta(\gamma, \varepsilon, \lambda)\, \mu'(z). \tag{A.14}$$

Applying the implicit function theorem to (A.14) when $\gamma = \varepsilon = 0$, we derive

$$\frac{d}{d\gamma}\Big|_{\gamma=\varepsilon=0} z_\phi(\hat T^{\gamma,\varepsilon,\lambda}) = \frac{-\nu'(z) + \left[\frac{d}{d\gamma}\zeta(\gamma, \varepsilon, \lambda)\right] \mu'(z)}{T''(z) + v_\phi''(z)} \Bigg|_{\gamma=\varepsilon=0}, \tag{A.15}$$

$$\frac{d}{d\varepsilon}\Big|_{\gamma=\varepsilon=0} z_\phi(\hat T^{\gamma,\varepsilon,\lambda}) = -\frac{\lambda\eta_1'(z) + \eta_2'(z)}{T''(z) + v_\phi''(z)} \Bigg|_{\gamma=\varepsilon=0}, \tag{A.16}$$

where (A.16) uses the fact that $\frac{d}{d\varepsilon}\zeta(\gamma, \varepsilon, \lambda)\big|_{\gamma=0} = 0$, which follows from (A.11).
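The implicit-function step behind (A.15) can be checked numerically. The sketch below is only an illustration under assumed functional forms that are not taken from the paper: $T(z) = 0.1 z^2$, $v_\phi(z) = z^2/(2\phi)$, and $\nu(z) = z^3/3$, evaluated where $\mu' = 0$ and $\varepsilon = 0$ so that the $\zeta$ and $\eta$ terms drop out of the first order condition. The chosen $\nu$ does not satisfy the support conditions imposed in the proof; only its local derivative matters here. All function names are hypothetical.

```python
def foc(z, gamma, phi):
    """First-order condition 1 - T'(z) - v_phi'(z) - gamma * nu'(z), with assumed
    T(z) = 0.1 z^2, v_phi(z) = z^2 / (2 phi), nu(z) = z^3 / 3 (so nu'(z) = z^2)."""
    return 1.0 - 0.2 * z - z / phi - gamma * z * z

def optimal_income(gamma, phi, lo=0.0, hi=2.0, iters=200):
    """Solve foc(z) = 0 by bisection; foc is decreasing in z on [lo, hi] for small gamma."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if foc(mid, gamma, phi) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def dz_dgamma_formula(phi):
    """Right-hand side of (A.15) with mu' = 0: -nu'(z) / (T''(z) + v_phi''(z)) at gamma = 0."""
    z0 = optimal_income(0.0, phi)
    return -(z0 * z0) / (0.2 + 1.0 / phi)

def dz_dgamma_finite_difference(phi, h=1e-5):
    """Central finite difference of the optimal income with respect to gamma at gamma = 0."""
    return (optimal_income(h, phi) - optimal_income(-h, phi)) / (2.0 * h)

phi = 1.0
print(dz_dgamma_formula(phi), dz_dgamma_finite_difference(phi))
```

Under these assumptions the two printed numbers should agree up to finite-difference error. The common sign is negative: raising the marginal tax rate through $\gamma\nu'$ lowers the chosen income, which is the behavioral response the rest of the proof tracks.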
Observe that when $\phi \in (\phi_1'', \phi_2'')$ and $\gamma = \varepsilon = 0$, $\mu'(z_\phi(\hat T^{\gamma,\varepsilon,\lambda})) = 0$ and $\eta_1'(z_\phi(\hat T^{\gamma,\varepsilon,\lambda})) = 0$. To lighten notation in the remaining displays of this proof, write $\hat T$ for $\hat T^{\gamma,\varepsilon,\lambda}$ and $\zeta$ for $\zeta(\gamma, \varepsilon, \lambda)$, and use primes for derivatives of $\nu$, $\mu$, $\eta_1$, $\eta_2$ with respect to $z$. In that case, (A.15)-(A.16) simplify to

$$\frac{d}{d\gamma}\Big|_{\gamma=\varepsilon=0} z_\phi(\hat T) = -\frac{\nu'(z_\phi(\hat T))}{T''(z_\phi(\hat T)) + v_\phi''(z_\phi(\hat T))}\Bigg|_{\gamma=\varepsilon=0}, \quad \forall \phi \in (\phi_1'', \phi_2''), \tag{A.17}$$

$$\frac{d}{d\varepsilon}\Big|_{\gamma=\varepsilon=0} z_\phi(\hat T) = -\frac{\eta_2'(z_\phi(\hat T))}{T''(z_\phi(\hat T)) + v_\phi''(z_\phi(\hat T))}\Bigg|_{\gamma=\varepsilon=0}, \quad \forall \phi \in (\phi_1'', \phi_2''). \tag{A.18}$$

It follows from (A.17) and (A.18) that

$$\eta_2'(z_\phi(\hat T))\, \frac{d}{d\gamma} z_\phi(\hat T)\Big|_{\gamma=\varepsilon=0} = \nu'(z_\phi(\hat T))\, \frac{d}{d\varepsilon} z_\phi(\hat T)\Big|_{\gamma=\varepsilon=0}, \quad \forall \phi \in (\phi_1'', \phi_2''). \tag{A.19}$$

Differentiating (A.12) with respect to $\varepsilon$, it follows that

$$\frac{d}{d\varepsilon} \int_{\underline{\phi}}^{\overline{\phi}} g_\phi(\hat T) \left[ -\frac{\partial\zeta}{\partial\gamma}\, \mu(z_\phi(\hat T)) + \nu(z_\phi(\hat T)) \right] f(\phi)\, d\phi = 0.$$

Applying Leibniz's integral rule and the product rule, we have

$$\int_{\underline{\phi}}^{\overline{\phi}} \frac{d}{d\varepsilon} g_\phi(\hat T) \left[ -\frac{\partial\zeta}{\partial\gamma}\, \mu(z_\phi(\hat T)) + \nu(z_\phi(\hat T)) \right] f(\phi)\, d\phi + \int_{\underline{\phi}}^{\overline{\phi}} g_\phi(\hat T) \left[ -\frac{\partial^2\zeta}{\partial\varepsilon\,\partial\gamma}\, \mu(z_\phi(\hat T)) + \left[ -\frac{\partial\zeta}{\partial\gamma}\, \mu'(z_\phi(\hat T)) + \nu'(z_\phi(\hat T)) \right] \frac{d}{d\varepsilon} z_\phi(\hat T) \right] f(\phi)\, d\phi = 0.$$

Rearranging terms,

$$\int_{\underline{\phi}}^{\overline{\phi}} g_\phi(\hat T)\, \frac{\partial^2\zeta}{\partial\varepsilon\,\partial\gamma}\, \mu(z_\phi(\hat T))\, f(\phi)\, d\phi = \int_{\underline{\phi}}^{\overline{\phi}} g_\phi(\hat T) \left[ -\frac{\partial\zeta}{\partial\gamma}\, \mu'(z_\phi(\hat T)) + \nu'(z_\phi(\hat T)) \right] \frac{d}{d\varepsilon} z_\phi(\hat T)\, f(\phi)\, d\phi + \int_{\underline{\phi}}^{\overline{\phi}} \frac{d}{d\varepsilon} g_\phi(\hat T) \left[ -\frac{\partial\zeta}{\partial\gamma}\, \mu(z_\phi(\hat T)) + \nu(z_\phi(\hat T)) \right] f(\phi)\, d\phi. \tag{A.20}$$

Write $\hat T^{\gamma,\varepsilon,\lambda}(z) = \hat T(z; \gamma, \varepsilon, \lambda)$ and observe that (explanations for the derivation are given after the final inequality):

$$\frac{\partial}{\partial\gamma} \int_{\underline{\phi}}^{\overline{\phi}} g_\phi(\hat T) \left[ \frac{\partial}{\partial\varepsilon} \hat T(\tilde z_\phi; \gamma, \varepsilon, \lambda) \right]_{\tilde z_\phi = z_\phi(\hat T)} f(\phi)\, d\phi\, \Bigg|_{\gamma=\varepsilon=0}$$

$$= \frac{\partial}{\partial\gamma} \int_{\underline{\phi}}^{\overline{\phi}} g_\phi(\hat T) \left( \lambda\eta_1(z_\phi(\hat T)) + \eta_2(z_\phi(\hat T)) - \frac{\partial\zeta}{\partial\varepsilon}\, \mu(z_\phi(\hat T)) \right) f(\phi)\, d\phi\, \Bigg|_{\gamma=\varepsilon=0} \tag{A.21}$$

$$= \int_{\underline{\phi}}^{\overline{\phi}} \frac{\partial}{\partial\gamma} g_\phi(\hat T) \left( \lambda\eta_1(z_\phi(\hat T)) + \eta_2(z_\phi(\hat T)) - \frac{\partial\zeta}{\partial\varepsilon}\, \mu(z_\phi(\hat T)) \right) f(\phi)\, d\phi\, \Bigg|_{\gamma=\varepsilon=0} + \int_{\underline{\phi}}^{\overline{\phi}} g_\phi(\hat T) \left( \left[ \lambda\eta_1'(z_\phi(\hat T)) + \eta_2'(z_\phi(\hat T)) - \frac{\partial\zeta}{\partial\varepsilon}\, \mu'(z_\phi(\hat T)) \right] \frac{d}{d\gamma} z_\phi(\hat T) - \frac{\partial^2\zeta}{\partial\varepsilon\,\partial\gamma}\, \mu(z_\phi(\hat T)) \right) f(\phi)\, d\phi\, \Bigg|_{\gamma=\varepsilon=0} \tag{A.22}$$

$$= \int_{\underline{\phi}}^{\overline{\phi}} \frac{\partial}{\partial\gamma} g_\phi(\hat T)\, \eta_2(z_\phi(\hat T))\, f(\phi)\, d\phi\, \Bigg|_{\gamma=\varepsilon=0} + \int_{\underline{\phi}}^{\overline{\phi}} g_\phi(\hat T) \left( \eta_2'(z_\phi(\hat T))\, \frac{d}{d\gamma} z_\phi(\hat T) - \frac{\partial^2\zeta}{\partial\varepsilon\,\partial\gamma}\, \mu(z_\phi(\hat T)) \right) f(\phi)\, d\phi\, \Bigg|_{\gamma=\varepsilon=0} \tag{A.23}$$

$$= \int_{\underline{\phi}}^{\overline{\phi}} \frac{\partial}{\partial\gamma} g_\phi(\hat T)\, \eta_2(z_\phi(\hat T))\, f(\phi)\, d\phi\, \Bigg|_{\gamma=\varepsilon=0} + \int_{\underline{\phi}}^{\overline{\phi}} g_\phi(\hat T)\, \eta_2'(z_\phi(\hat T))\, \frac{d}{d\gamma} z_\phi(\hat T)\, f(\phi)\, d\phi\, \Bigg|_{\gamma=\varepsilon=0} - \int_{\underline{\phi}}^{\overline{\phi}} g_\phi(\hat T)\, \nu'(z_\phi(\hat T))\, \frac{d}{d\varepsilon} z_\phi(\hat T)\, f(\phi)\, d\phi\, \Bigg|_{\gamma=\varepsilon=0} - \int_{\underline{\phi}}^{\overline{\phi}} g_\phi(\hat T)\, \frac{\partial\zeta}{\partial\gamma}\, \mu'(z_\phi(\hat T))\, \frac{d}{d\varepsilon} z_\phi(\hat T)\, f(\phi)\, d\phi\, \Bigg|_{\gamma=\varepsilon=0} - \int_{\underline{\phi}}^{\overline{\phi}} \frac{d}{d\varepsilon} g_\phi(\hat T) \left[ -\frac{\partial\zeta}{\partial\gamma}\, \mu(z_\phi(\hat T)) + \nu(z_\phi(\hat T)) \right] f(\phi)\, d\phi\, \Bigg|_{\gamma=\varepsilon=0} \tag{A.24}$$

$$= \int_{\underline{\phi}}^{\overline{\phi}} \frac{\partial}{\partial\gamma} g_\phi(\hat T)\, \eta_2(z_\phi(\hat T))\, f(\phi)\, d\phi\, \Bigg|_{\gamma=\varepsilon=0} + \int_{\phi_1''}^{\phi_2''} g_\phi(\hat T)\, \eta_2'(z_\phi(\hat T))\, \frac{d}{d\gamma} z_\phi(\hat T)\, f(\phi)\, d\phi\, \Bigg|_{\gamma=\varepsilon=0} - \int_{\phi_1''}^{\phi_2''} g_\phi(\hat T)\, \nu'(z_\phi(\hat T))\, \frac{d}{d\varepsilon} z_\phi(\hat T)\, f(\phi)\, d\phi\, \Bigg|_{\gamma=\varepsilon=0} - \int_{\underline{\phi}}^{\overline{\phi}} g_\phi(\hat T)\, \frac{\partial\zeta}{\partial\gamma}\, \mu'(z_\phi(\hat T))\, \frac{d}{d\varepsilon} z_\phi(\hat T)\, f(\phi)\, d\phi\, \Bigg|_{\gamma=\varepsilon=0} - \int_{\underline{\phi}}^{\overline{\phi}} \frac{d}{d\varepsilon} g_\phi(\hat T) \left[ -\frac{\partial\zeta}{\partial\gamma}\, \mu(z_\phi(\hat T)) + \nu(z_\phi(\hat T)) \right] f(\phi)\, d\phi\, \Bigg|_{\gamma=\varepsilon=0} \tag{A.25}$$

$$= \int_{\underline{\phi}}^{\overline{\phi}} \frac{\partial}{\partial\gamma} g_\phi(\hat T)\, \eta_2(z_\phi(\hat T))\, f(\phi)\, d\phi\, \Bigg|_{\gamma=\varepsilon=0} - \int_{\underline{\phi}}^{\overline{\phi}} g_\phi(\hat T)\, \frac{\partial\zeta}{\partial\gamma}\, \mu'(z_\phi(\hat T))\, \frac{d}{d\varepsilon} z_\phi(\hat T)\, f(\phi)\, d\phi\, \Bigg|_{\gamma=\varepsilon=0} - \int_{\underline{\phi}}^{\overline{\phi}} \frac{d}{d\varepsilon} g_\phi(\hat T) \left[ -\frac{\partial\zeta}{\partial\gamma}\, \mu(z_\phi(\hat T)) + \nu(z_\phi(\hat T)) \right] f(\phi)\, d\phi\, \Bigg|_{\gamma=\varepsilon=0} \tag{A.26}$$

$$= \int_{\underline{\phi}}^{\overline{\phi}} \frac{\partial}{\partial\gamma} g_\phi(\hat T)\, \eta_2(z_\phi(\hat T))\, f(\phi)\, d\phi\, \Bigg|_{\gamma=\varepsilon=0} - \int_{\underline{\phi}}^{\overline{\phi}} \frac{d}{d\varepsilon} g_\phi(\hat T) \left[ -\frac{\partial\zeta}{\partial\gamma}\, \mu(z_\phi(\hat T)) + \nu(z_\phi(\hat T)) \right] f(\phi)\, d\phi\, \Bigg|_{\gamma=\varepsilon=0} \tag{A.27}$$

$$= \int_{\phi_1''}^{\phi_2''} \frac{\partial}{\partial\gamma} g_\phi(\hat T)\, \eta_2(z_\phi(\hat T))\, f(\phi)\, d\phi\, \Bigg|_{\gamma=\varepsilon=0} - \int_{\phi_1''}^{\phi_2''} \frac{d}{d\varepsilon} g_\phi(\hat T) \left[ -\frac{\partial\zeta}{\partial\gamma} + \nu(z_\phi(\hat T)) \right] f(\phi)\, d\phi\, \Bigg|_{\gamma=\varepsilon=0} \tag{A.28}$$

$$= \int_{\phi_1''}^{\phi_2''} \left\{ \frac{\partial}{\partial u} \hat g_\phi(U_\phi(\hat T), z_\phi(\hat T)) \left[ \frac{\partial\zeta}{\partial\gamma} - \nu(z_\phi(\hat T)) \right] + \frac{\partial}{\partial z} \hat g_\phi(U_\phi(\hat T), z_\phi(\hat T))\, \frac{d}{d\gamma} z_\phi(\hat T) \right\} \eta_2(z_\phi(\hat T))\, f(\phi)\, d\phi\, \Bigg|_{\gamma=\varepsilon=0} - \int_{\phi_1''}^{\phi_2''} \left\{ \frac{\partial}{\partial u} \hat g_\phi(U_\phi(\hat T), z_\phi(\hat T)) \left[ \frac{d\zeta}{d\varepsilon} - \eta_2(z_\phi(\hat T)) \right] + \frac{\partial}{\partial z} \hat g_\phi(U_\phi(\hat T), z_\phi(\hat T))\, \frac{d}{d\varepsilon} z_\phi(\hat T) \right\} \left[ -\frac{\partial\zeta}{\partial\gamma} + \nu(z_\phi(\hat T)) \right] f(\phi)\, d\phi\, \Bigg|_{\gamma=\varepsilon=0} \tag{A.29}$$

$$= \int_{\phi_1''}^{\phi_2''} \frac{\partial}{\partial z} \hat g_\phi(U_\phi(\hat T), z_\phi(\hat T))\, \frac{d}{d\gamma} z_\phi(\hat T)\, \eta_2(z_\phi(\hat T))\, f(\phi)\, d\phi\, \Bigg|_{\gamma=\varepsilon=0} - \int_{\phi_1''}^{\phi_2''} \frac{\partial}{\partial z} \hat g_\phi(U_\phi(\hat T), z_\phi(\hat T))\, \frac{d}{d\varepsilon} z_\phi(\hat T) \left[ -\frac{\partial\zeta}{\partial\gamma} + \nu(z_\phi(\hat T)) \right] f(\phi)\, d\phi\, \Bigg|_{\gamma=\varepsilon=0} \tag{A.30}$$

$$= -\int_{\phi_1''}^{\phi_2''} \frac{\partial}{\partial z} \hat g_\phi(U_\phi(\hat T), z_\phi(\hat T))\, \frac{\nu'(z_\phi(\hat T))}{T''(z_\phi(\hat T)) + v_\phi''(z_\phi(\hat T))}\, \eta_2(z_\phi(\hat T))\, f(\phi)\, d\phi\, \Bigg|_{\gamma=\varepsilon=0} + \int_{\phi_1''}^{\phi_2''} \frac{\partial}{\partial z} \hat g_\phi(U_\phi(\hat T), z_\phi(\hat T))\, \frac{\eta_2'(z_\phi(\hat T))}{T''(z_\phi(\hat T)) + v_\phi''(z_\phi(\hat T))} \left[ -\frac{\partial\zeta}{\partial\gamma} + \nu(z_\phi(\hat T)) \right] f(\phi)\, d\phi\, \Bigg|_{\gamma=\varepsilon=0} \tag{A.31}$$

$$= -\int_{\phi_1''}^{\phi_2''} \underbrace{\frac{\partial}{\partial z} \hat g_\phi(U_\phi(\hat T), z_\phi(\hat T))}_{-}\ \underbrace{\frac{\nu'(z_\phi(\hat T))}{T''(z_\phi(\hat T)) + v_\phi''(z_\phi(\hat T))}}_{+}\ \underbrace{\eta_2(z_\phi(\hat T))}_{-}\ \underbrace{f(\phi)}_{+}\, d\phi\, \Bigg|_{\gamma=\varepsilon=0} + \int_{\phi_1''}^{\hat\phi} \underbrace{\frac{\partial}{\partial z} \hat g_\phi(U_\phi(\hat T), z_\phi(\hat T))}_{-}\ \underbrace{\frac{\eta_2'(z_\phi(\hat T))}{T''(z_\phi(\hat T)) + v_\phi''(z_\phi(\hat T))}}_{-}\ \underbrace{\left[ -\frac{\partial\zeta}{\partial\gamma} + \nu(z_\phi(\hat T)) \right]}_{-}\ \underbrace{f(\phi)}_{+}\, d\phi\, \Bigg|_{\gamma=\varepsilon=0} + \int_{\hat\phi}^{\phi_2''} \underbrace{\frac{\partial}{\partial z} \hat g_\phi(U_\phi(\hat T), z_\phi(\hat T))}_{-}\ \underbrace{\frac{\eta_2'(z_\phi(\hat T))}{T''(z_\phi(\hat T)) + v_\phi''(z_\phi(\hat T))}}_{+}\ \underbrace{\left[ -\frac{\partial\zeta}{\partial\gamma} + \nu(z_\phi(\hat T)) \right]}_{+}\ \underbrace{f(\phi)}_{+}\, d\phi\, \Bigg|_{\gamma=\varepsilon=0} \tag{A.32}$$

$$< 0, \tag{A.33}$$

where (A.22) uses the equality of the mixed partial derivatives of $\zeta$, which follows from the twice continuous differentiability of $\zeta$, which was established above; (A.23) follows from the fact that, by (A.11), $\frac{\partial\zeta}{\partial\varepsilon} = 0$ when $\gamma = 0$, the fact that $\frac{\partial}{\partial\gamma} g_\phi(\hat T)\big|_{\gamma=\varepsilon=0} = 0$ outside of $[\hat\phi_1, \overline{\phi}]$ because neither marginal taxes nor total taxes are changing locally outside of that interval as $\gamma$ varies, and the fact that $\frac{d}{d\gamma} z_\phi(\hat T)\big|_{\gamma=\varepsilon=0} \neq 0$ only when $\phi \in [\phi_1, \phi_2]$, $\eta_1(z_\phi(\hat T))\big|_{\gamma=\varepsilon=0} \neq 0$ only when $\phi \in [\phi_1', \phi_2']$, and $[\phi_1, \phi_2] \cap [\phi_1', \phi_2'] = \emptyset$; (A.24) follows from (A.20); (A.25) follows from the fact that $\eta_2'(z_\phi(\hat T))\big|_{\gamma=\varepsilon=0}$ and $\nu'(z_\phi(\hat T))\, \frac{d}{d\varepsilon} z_\phi(\hat T)\big|_{\gamma=\varepsilon=0}$ equal zero outside of $[\phi_1'', \phi_2'']$; (A.26) follows from (A.19); (A.27) follows from the fact that $\mu'(z_\phi(\hat T))\big|_{\gamma=\varepsilon=0} \neq 0$ only when $\phi \in (\hat\phi_1, \hat\phi_2)$, that, by (A.16), $\frac{d}{d\varepsilon} z_\phi(\hat T)\big|_{\gamma=\varepsilon=0} \neq 0$ only when $\phi \in (\phi_1', \phi_2') \cup (\phi_1'', \phi_2'')$, and that $(\hat\phi_1, \hat\phi_2)$ and $(\phi_1', \phi_2') \cup (\phi_1'', \phi_2'')$ are disjoint; (A.28) follows from the fact that $\frac{d}{d\varepsilon} g_\phi(\hat T)\big|_{\gamma=\varepsilon=0} = 0$ outside of $(\phi_1', \phi_2') \cup (\phi_1'', \phi_2'')$, $\mu(z_\phi(\hat T)) = 0$ on $(\phi_1', \phi_2')$, and $\mu(z_\phi(\hat T)) = 1$ on $(\phi_1'', \phi_2'')$; (A.29) follows from expanding the terms $\frac{\partial}{\partial\gamma} g_\phi(\hat T)$ and $\frac{d}{d\varepsilon} g_\phi(\hat T)$ using (A.2), and appealing to the envelope theorem; (A.30) follows from the fact that $\frac{\partial\zeta}{\partial\varepsilon} = 0$ when $\gamma = 0$ by (A.11), and the fact that once this term is eliminated, the remaining $\frac{\partial}{\partial u}\hat g_\phi$ terms cancel out; (A.31) follows from (A.17)-(A.18); in (A.32), I have signed terms on the basis of assumptions that I have made above, including (A.3), the assumptions on $\nu$ and $\eta_2$, the convexity of $T$ and $v_\phi$, the fact that, by (34), $z_\phi(\hat T)$ is increasing in $\phi$, and (A.8); and finally the inequality (A.33) follows from keeping track of the preceding signs.

To summarize, above we have established that for all $\lambda \in [1 - \delta^*, 1 + \delta^*]$,

$$\frac{\partial}{\partial\gamma} \int_{\underline{\phi}}^{\overline{\phi}} g_\phi(\hat T) \left[ \frac{\partial}{\partial\varepsilon} \hat T(\tilde z_\phi; \gamma, \varepsilon, \lambda) \right]_{\tilde z_\phi = z_\phi(\hat T)} f(\phi)\, d\phi\, \Bigg|_{\gamma=\varepsilon=0} < 0. \tag{A.34}$$

Next observe that it follows from (A.9) and our assumptions on $\eta_1$ and $\eta_2$ that

$$\int_{\underline{\phi}}^{\overline{\phi}} g_\phi(\hat T) \left[ \frac{\partial}{\partial\varepsilon} \hat T(\tilde z_\phi; \gamma, \varepsilon, \lambda) \right]_{\tilde z_\phi = z_\phi(\hat T)} f(\phi)\, d\phi\, \Bigg|_{\gamma=\varepsilon=0} \begin{Bmatrix} > \\ = \\ < \end{Bmatrix} 0 \iff \lambda \begin{Bmatrix} > \\ = \\ < \end{Bmatrix} 1. \tag{A.35}$$

Suppose, toward a contradiction, that there exists a preorder $\precsim$ that strongly rationalizes $g$. It follows from the local improvement principle (37) that there exists $\bar\varepsilon > 0$ such that for all $\varepsilon \in [0, \bar\varepsilon]$,

$$\lambda > 1 \Rightarrow T \succ T + \varepsilon(\lambda\eta_1 + \eta_2), \tag{A.36}$$

$$\lambda < 1 \Rightarrow T \prec T + \varepsilon(\lambda\eta_1 + \eta_2). \tag{A.37}$$

It follows from (A.34) and (A.35) that if $\gamma > 0$ is sufficiently small, then

$$\int_{\underline{\phi}}^{\overline{\phi}} g_\phi(\hat T) \left[ \frac{\partial}{\partial\varepsilon} \hat T(\tilde z_\phi; \gamma, \varepsilon, \lambda) \right]_{\tilde z_\phi = z_\phi(\hat T)} f(\phi)\, d\phi\, \Bigg|_{\varepsilon=0,\, \lambda=1} < 0.$$

By continuity of the above integral in $\lambda$, for fixed small $\gamma > 0$, there exists $\delta_\gamma$ such that for all $\lambda \in [1 - \delta_\gamma, 1 + \delta_\gamma]$,

$$\int_{\underline{\phi}}^{\overline{\phi}} g_\phi(\hat T) \left[ \frac{\partial}{\partial\varepsilon} \hat T(\tilde z_\phi; \gamma, \varepsilon, \lambda) \right]_{\tilde z_\phi = z_\phi(\hat T)} f(\phi)\, d\phi\, \Bigg|_{\varepsilon=0} < 0. \tag{A.38}$$

It follows from the local improvement principle (37) that for sufficiently small $\gamma > 0$ and $\lambda \in [1 - \delta_\gamma, 1 + \delta_\gamma]$, there exists $\varepsilon_{\gamma,\lambda} \in (0, \bar\varepsilon)$ such that

$$T + \gamma\nu - \zeta(\gamma, 0, \lambda)\mu \prec T + \gamma\nu + \varepsilon_{\gamma,\lambda}(\lambda\eta_1 + \eta_2) - \zeta(\gamma, \varepsilon_{\gamma,\lambda}, \lambda)\mu. \tag{A.39}$$

Next observe that by (A.11)-(A.12) and the indifference principle (38),

$$T \sim T + \gamma\nu - \zeta(\gamma, 0, \lambda)\mu, \tag{A.40}$$

$$T + \varepsilon_{\gamma,\lambda}(\lambda\eta_1 + \eta_2) \sim T + \gamma\nu + \varepsilon_{\gamma,\lambda}(\lambda\eta_1 + \eta_2) - \zeta(\gamma, \varepsilon_{\gamma,\lambda}, \lambda)\mu. \tag{A.41}$$

If we choose $\lambda \in (1, 1 + \delta_\gamma)$, then putting together (A.40), (A.39), (A.41), and (A.36), we have

$$T \sim T + \gamma\nu - \zeta(\gamma, 0, \lambda)\mu \prec T + \gamma\nu + \varepsilon_{\gamma,\lambda}(\lambda\eta_1 + \eta_2) - \zeta(\gamma, \varepsilon_{\gamma,\lambda}, \lambda)\mu \sim T + \varepsilon_{\gamma,\lambda}(\lambda\eta_1 + \eta_2) \prec T. \tag{A.42}$$

Since $\precsim$ is assumed to be transitive, it follows that $T \prec T$, a contradiction. So $g$ is not strongly rationalizable. This completes the proof of Proposition A.2 in the case that $T$ satisfies (A.3). The proof is similar in the case that $T$ instead satisfies (A.4). □

A.3 Holding revenue constant

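0, there exists $\delta_\gamma$ such that for all $\lambda \in [1-\delta_\gamma, 1+\delta_\gamma]$,

$$\int_{\underline{\phi}}^{\overline{\phi}} g_\phi\big(\hat T_{\gamma,\varepsilon,\lambda}\big)\left[\frac{\partial}{\partial\varepsilon}\hat T(\tilde z_\phi;\gamma,\varepsilon,\lambda)\right]_{\tilde z_\phi = z_\phi(\hat T_{\gamma,\varepsilon,\lambda})} f(\phi)\,d\phi\,\Bigg|_{\varepsilon=0} < 0. \tag{A.38}$$

It follows from the local improvement principle (37) that for sufficiently small $\gamma > 0$ and $\lambda \in [1-\delta_\gamma, 1+\delta_\gamma]$, there exists $\varepsilon_{\gamma,\lambda} \in (0,\bar\varepsilon)$ such that

$$T_0 + \gamma\nu - \zeta(\gamma,0,\lambda)\mu \;\prec\; T_0 + \gamma\nu + \varepsilon_{\gamma,\lambda}(\lambda\eta_1+\eta_2) - \zeta(\gamma,\varepsilon_{\gamma,\lambda},\lambda)\mu. \tag{A.39}$$

Next observe that by (A.11)-(A.12) and the indifference principle (38),

$$T_0 \sim T_0 + \gamma\nu - \zeta(\gamma,0,\lambda)\mu, \tag{A.40}$$

$$T_0 + \varepsilon_{\gamma,\lambda}(\lambda\eta_1+\eta_2) \sim T_0 + \gamma\nu + \varepsilon_{\gamma,\lambda}(\lambda\eta_1+\eta_2) - \zeta(\gamma,\varepsilon_{\gamma,\lambda},\lambda)\mu. \tag{A.41}$$

If we choose $\lambda \in (1, 1+\delta_\gamma)$, then putting together (A.40), (A.39), (A.41), and (A.36), we have

$$T_0 \sim T_0 + \gamma\nu - \zeta(\gamma,0,\lambda)\mu \prec T_0 + \gamma\nu + \varepsilon_{\gamma,\lambda}(\lambda\eta_1+\eta_2) - \zeta(\gamma,\varepsilon_{\gamma,\lambda},\lambda)\mu \sim T_0 + \varepsilon_{\gamma,\lambda}(\lambda\eta_1+\eta_2) \prec T_0. \tag{A.42}$$

Since $\precsim$ is assumed to be transitive, it follows that $T_0 \prec T_0$, a contradiction. So $g$ is not strongly rationalizable. This completes the proof of Proposition A.5 in the case that $T_0$ satisfies (A.3). The proof is similar in the case that $T_0$ instead satisfies (A.4). $\square$

A.3 Holding revenue constant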

In this section I strengthen Proposition A.5; in particular, I prove the following.

Proposition A.3

Under the assumptions of Section 7, if $g$ does not depend only on utility, then $g$ is not rationalizable.

The difference between Propositions A.5 and A.3 is that the latter appeals to rationalizability rather than strong rationalizability. It is harder to show that welfare weights are not rationalizable than that they are not strongly rationalizable. In particular, we have to show that the local improvement and indifference principles, even when restricted to paths along which revenue is held constant, imply the existence of a cycle. The key difference, then, is that our constructions must hold revenue constant, a feature that we did not worry about in Section A.2.

Now assume that $g$ does not depend only on utility. In this proof we are therefore entitled to all of the constructions and conditions that were derived on the basis of this assumption in the course of the proof of Proposition A.5. In particular, note that I consider the case in which (A.3) is assumed to hold, but the proof for the case in which (A.4) is assumed to hold is similar. In this spirit, let $T_0$ be the same as the tax policy $T_0$ used in the proof of Proposition A.5. Let $\phi'$ and $z'$ also be as in the proof of Proposition A.5, and observe that, by our assumptions on $\hat{\mathcal{T}}$, $z_{\underline{\phi}} = z_{\underline{\phi}}(T_0)$, where $z_{\underline{\phi}}$ is defined by (35). Then $z_{\underline{\phi}} < z'$. Let $\Psi$ be the set of all three times continuously differentiable real-valued functions $\psi$ on $Z$ whose support is a proper subset of $(z_{\underline{\phi}}, z')$ such that $\psi \not\equiv 0$ and $\psi(z) \ge 0$ for all $z \in (z_{\underline{\phi}}, z')$.

Choose $\omega, \psi \in \Psi$ with disjoint support. It follows from the Picard-Lindelöf theorem that there exist $\bar\sigma > 0$ and a function $\chi : [-\bar\sigma, \bar\sigma] \to \mathbb{R}$ satisfying:

$$\chi(0) = 0, \tag{A.43}$$

$$\int_{\underline{\phi}}^{\overline{\phi}} g_\phi\big(T_0 + \sigma\omega - \chi(\sigma)\psi\big)\left[-\frac{d}{d\sigma}\chi(\sigma)\,\psi\big(z_\phi(T_0 + \sigma\omega - \chi(\sigma)\psi)\big) + \omega\big(z_\phi(T_0 + \sigma\omega - \chi(\sigma)\psi)\big)\right] f(\phi)\,d\phi = 0. \tag{A.44}$$

For $\sigma \in [-\bar\sigma, \bar\sigma]$, define

$$\tilde T_\sigma = T_0 + \sigma\omega - \chi(\sigma)\psi. \tag{A.45}$$

Observe that $\tilde T_0 = T_0$. Observe that $\tilde T_\sigma$ (as well as $\chi$) depends on the choice of $\omega$ and $\psi$.
When I want to express this dependence, I write $\tilde T^{\omega,\psi}_\sigma$ instead of $\tilde T_\sigma$. There are two possibilities:

1. There exist $\omega, \psi \in \Psi$ with disjoint support such that $\frac{d}{d\sigma}\big|_{\sigma=0} R\big(\tilde T^{\omega,\psi}_\sigma\big) \neq 0$.
2. For all $\omega, \psi \in \Psi$ with disjoint support, $\frac{d}{d\sigma}\big|_{\sigma=0} R\big(\tilde T^{\omega,\psi}_\sigma\big) = 0$.

In this Appendix, I will assume that we are in the first case. However, in Appendix A.4, I will consider the second case and show that in that case, it is always possible to make a modification to the original tax policy $T_0$ such that the steps in the proof of Proposition A.5 still hold and such that for some $\omega, \psi \in \Psi$, $\frac{d}{d\sigma}\big|_{\sigma=0} R\big(\tilde T^{\omega,\psi}_\sigma\big) \neq 0$. Thus I will now assume that $\frac{d}{d\sigma}\big|_{\sigma=0} R\big(\tilde T_\sigma\big) \neq 0$, but I will show in Appendix A.4 that I could always have chosen the original tax policy $T_0$ so that this assumption (or a similar assumption) holds. I separate this part of the argument so as to make the structure of the proof clearer, and focus, in this section, on what I take to be the more important details.

Since $\sigma \mapsto R\big(\tilde T_\sigma\big)$ is continuously differentiable, it follows that there is some interval $[\underline{\sigma}, \overline{\sigma}]$ containing 0 on which $R\big(\tilde T_\sigma\big)$ is either increasing or decreasing in $\sigma$. We may assume that the interval is chosen so that

$$\Big|R\big(\tilde T_0\big) - R\big(\tilde T_{\underline{\sigma}}\big)\Big| = \Big|R\big(\tilde T_0\big) - R\big(\tilde T_{\overline{\sigma}}\big)\Big| =: \overline{r}.$$

It follows that for each value $r \in [-\overline{r}, \overline{r}]$, there exists a unique value $\sigma(r) \in [\underline{\sigma}, \overline{\sigma}]$ such that

$$R\big(\tilde T_0\big) - R\big(\tilde T_{\sigma(r)}\big) = r. \tag{A.46}$$

Then let $\tilde T^*_r := \tilde T_{\sigma(r)}$. Moreover, because $T_0$ is strictly convex on $[z_{\underline{\phi}}, \bar z]$, and in view of the bound on $\frac{d^2}{dz^2} T_0(z)$ on the supports of $\omega$ and $\psi$, if $\overline{r}$ is chosen to be small enough, then $\tilde T^*_r$ is strictly convex on the same interval for all $r \in [-\overline{r}, \overline{r}]$.
Let us assume that $\overline{r}$ is so chosen.

I now prove the following lemma.

Lemma A.1

For any ρ ∈ P , deﬁne r ρ : [ a, b ] → R by r ρ ( θ ) = R (cid:0) T ρ ,θ (cid:1) − R (cid:0) T ρ , (cid:1) , ∀ θ ∈ [ a, b ] . (A.47) Let r ∈ ( − r, r ) , and consider ρ ∈ P such that such that ∀ θ ∈ [ a, b ] , T ρ ,θ ∈ ˆ T is strictly convex on h z φ , ¯ z i and | r ρ ( θ ) | < r − | r | , ∀ θ ∈ [ a, b ] . (A.48) and ∀ θ ∈ [ a, b ] , T ρ ,θ = ˜ T ∗ r on [0 , z ′ ] . Then there exists ρ ∈ P such that T ρ,θ ( z ) =  ˜ T ∗ r + r ρ ( θ ) ( z ) if z < z ′ ,T ρ ,θ ( z ) if z ≥ z ′ . and Z φφ g φ (cid:0) T ρ,θ (cid:1) ∂∂θ T (cid:16) z ρφ ( θ ) , θ (cid:17) f ( φ ) d φ = Z φφ g φ (cid:0) T ρ ,θ (cid:1) ∂∂θ T (cid:16) z ρ φ ( θ ) , θ (cid:17) f ( φ ) d φ, ∀ θ ∈ [ a, b ] . (A.49)I now prove the lemma. So assume that there is ρ ∈ P with the properties given in the assumptions of thelemma. Now consider a family of tax policies (cid:0) S θ (cid:1) θ ∈ [ a,b ] such that S θ ( z ) =  ˜ T ∗ r + r ρ ( θ ) ( z ) if z < z ′ ,T ρ ,θ ( z ) if z ≥ z ′ . (A.50)Thus, we would like to establish that there exists ρ ∈ P such that T ρ,θ = S θ , for all θ ∈ [ a, b ]. Observe thatbecause the supports of ω and ψ were assumed contained in (0 , z ′ ), there is some z ′ < z ′ such that for all A.11 ∈ [ a, b ] , T ρ ,θ = ˜ T ∗ r ( θ ) on [ z ′ , z ′ ]. It now follows from the twice continuous diﬀerentiability and strict convexityof T ρ ,θ and ˜ T ∗ r ( θ ) that z S θ ( z ) is twice continuously diﬀerentiable and strictly convex on h z φ , ¯ z i for all θ ∈ [ a, b ]; moreover, ( z, θ ) S θ ( z ) is twice continuously diﬀerentiable. It follows that there exists ρ ∈ P such that T ρ,θ = S θ for all θ ∈ [ a, b ]. We want to show that ρ ∈ P .Note also that by construction and convexity of the relevant tax policies, ∀ φ ∈ (cid:2) φ, φ (cid:3) , ∀ θ ∈ [ a, b ] , z φ (cid:0) S θ (cid:1) =  z φ (cid:16) ˜ T ∗ r ( θ ) (cid:17) if φ < φ ′ z ρ φ ( θ ) if φ ≥ φ ′ . 
(A.51)Observe that R (cid:0) T ρ , (cid:1) − R (cid:0) S θ (cid:1) = Z φφ T ρ , (cid:16) z ρ φ (0) (cid:17) f ( φ ) d φ − Z φφ S θ (cid:0) z φ (cid:0) S θ (cid:1)(cid:1) f ( φ ) d φ = "Z φ ′ φ T ρ , (cid:16) z ρ φ (0) (cid:17) f ( φ ) d φ − Z φ ′ φ S θ (cid:0) z φ (cid:0) S θ (cid:1)(cid:1) f ( φ ) d φ + "Z φφ ′ T ρ , (cid:16) z ρ φ (0) (cid:17) f ( φ ) d φ − Z φφ ′ S θ (cid:0) z φ (cid:0) S θ (cid:1)(cid:1) f ( φ ) d φ = "Z φ ′ φ T ρ , (cid:16) z ρ φ (0) (cid:17) f ( φ ) d φ − Z φ ′ φ ˜ T ∗ r + r ρ ( θ ) (cid:16) z φ (cid:16) ˜ T ∗ r + r ρ ( θ ) (cid:17)(cid:17) f ( φ ) d φ + "Z φφ ′ T ρ , (cid:16) z ρ φ (0) (cid:17) f ( φ ) d φ − Z φφ ′ T ρ ,θ (cid:16) z ρ φ ( θ ) (cid:17) f ( φ ) d φ = "Z φφ ˜ T ∗ r (cid:16) z φ (cid:16) ˜ T ∗ r (cid:17)(cid:17) f ( φ ) d φ − Z φφ ˜ T ∗ r + r ρ ( θ ) (cid:16) z φ (cid:16) ˜ T ∗ r + r ρ ( θ ) (cid:17)(cid:17) f ( φ ) d φ + "Z φφ T ρ , (cid:16) z ρ φ (0) (cid:17) f ( φ ) d φ − Z φφ T ρ ,θ (cid:16) z ρ φ ( θ ) (cid:17) f ( φ ) d φ = h R (cid:16) ˜ T ∗ r (cid:17) − R (cid:16) ˜ T ∗ r + r ρ ( θ ) (cid:17)i + (cid:2) R (cid:0) T ρ , (cid:1) − R (cid:0) T ρ ,θ (cid:1)(cid:3) = (cid:16)h R (cid:16) ˜ T (cid:17) − R (cid:16) ˜ T ∗ r + r ρ ( θ ) (cid:17)i − h R (cid:16) ˜ T (cid:17) − R (cid:16) ˜ T ∗ r (cid:17)i(cid:17) + (cid:2) R (cid:0) T ρ , (cid:1) − R (cid:0) T ρ ,θ (cid:1)(cid:3) = (cid:16) r + r ρ ( θ ) (cid:17) − r + (cid:2) R (cid:0) T ρ , (cid:1) − R (cid:0) T ρ ,θ (cid:1)(cid:3) = r ρ ( θ ) + (cid:2) R (cid:0) T ρ , (cid:1) − R (cid:0) T ρ ,θ (cid:1)(cid:3) = (cid:2) R (cid:0) T ρ ,θ (cid:1) − R (cid:0) T ρ , (cid:1)(cid:3) + (cid:2) R (cid:0) T ρ , (cid:1) − R (cid:0) T ρ ,θ (cid:1)(cid:3) = 0 , where the third equality follows from (A.50) and (A.51), the fourth from the fact that T ρ , (cid:16) z ρ φ (0) (cid:17) = T ρ ,θ (cid:16) z ρ φ ( θ ) (cid:17) =˜ T ∗ r (cid:16) z φ (cid:16) ˜ T ∗ r (cid:17)(cid:17) when φ ∈ (cid:2) φ ′ , φ (cid:3) and ˜ T ∗ r (cid:16) z φ (cid:16) ˜ T ∗ r (cid:17)(cid:17) = ˜ T ∗ r + r ρ ( θ ) 
(cid:16) z φ (cid:16) ˜ T ∗ r + r ρ ( θ ) (cid:17)(cid:17) = T ( z φ ( T )) when φ ∈ (cid:2) φ, φ ′ (cid:1) ,the seventh from (A.46), and the ninth from (A.47). So I have now established that R (cid:0) T ρ , (cid:1) = R (cid:0) S θ (cid:1) , ∀ θ ∈ [ a, b ] . It follows from this and the other properties of S θ established above that ρ : θ S θ belongs to P . Thus we can The existence of such a z ′ follows from the fact that the support of a function is a closed set. This follows from the fact that we have assumed r is suﬃciently small that ˜ T ∗ r is strictly convex on (cid:2) z φ , ¯ z (cid:3) when r ∈ [ − r, r ], and that, by (A.48), for all θ ∈ [ a, b ], | r + r ρ ( θ ) | ≤ | r | + | r ρ ( θ ) | ≤ r . A.12 ake T ρ,θ = S θ .Let me write S ( z, θ ) = S θ ( z ) and T ( z, θ ) = T θ ( z ). Then we have: Z φφ g φ (cid:0) S θ (cid:1) ∂∂θ S (cid:0) z φ (cid:0) S θ (cid:1) , θ (cid:1) f ( φ ) d φ = Z φ ′ φ g φ (cid:0) S θ (cid:1) ∂∂θ S (cid:0) z φ (cid:0) S θ (cid:1) , θ (cid:1) f ( φ ) d φ + Z φφ ′ g φ (cid:0) S θ (cid:1) ∂∂θ S (cid:0) z φ (cid:0) S θ (cid:1) , θ (cid:1) f ( φ ) d φ = σ ′ ( r + r ρ ( θ )) dd θ r ρ ( θ ) × Z φ ′ φ g φ (cid:0) S θ (cid:1) (cid:20) ω (cid:16) z φ (cid:16) ˜ T σ ( r + r ρ ( θ )) (cid:17)(cid:17) − dd σ χ ( σ ( r + r ρ ( θ ))) ψ (cid:16) z φ (cid:16) ˜ T σ ( r + r ρ ( θ )) (cid:17)(cid:17)(cid:21) f ( φ ) d φ + Z φφ ′ g φ (cid:0) T ρ ,θ (cid:1) ∂∂θ T ρ (cid:16) z ρ φ ( θ ) , θ (cid:17) f ( φ ) d φ = σ ′ ( r + r ρ ( θ )) dd θ r ρ ( θ ) × Z φφ g φ (cid:0) S θ (cid:1) (cid:20) ω (cid:16) z φ (cid:16) ˜ T σ ( r + r ρ ( θ )) (cid:17)(cid:17) − dd σ χ ( σ ( r + r ρ ( θ ))) ψ (cid:16) z φ (cid:16) ˜ T σ ( r + r ρ ( θ )) (cid:17)(cid:17)(cid:21) f ( φ ) d φ + Z φφ g φ (cid:0) T ρ ,θ (cid:1) ∂∂θ T ρ (cid:16) z ρ φ ( θ ) , θ (cid:17) f ( φ ) d φ = Z φφ g φ (cid:0) T ρ ,θ (cid:1) ∂∂θ T ρ (cid:16) z ρ φ ( θ ) , θ (cid:17) f ( φ ) d φ, (A.52)where the second equality follows from (A.50), (A.51), and (A.45), the third from the fact that when φ ∈ (cid:2) φ 
′ , φ (cid:3) ,0 = ω ( z φ ( T )) = ω (cid:16) z φ (cid:16) ˜ T σ ( r + r ρ ( θ )) (cid:17)(cid:17) and 0 = ψ ( z φ ( T )) = ψ (cid:16) z φ (cid:16) ˜ T σ ( r + r ρ ( θ )) (cid:17)(cid:17) and when φ ∈ (cid:2) φ, φ ′ (cid:1) , T (cid:16) z ρ φ ( θ ) , θ (cid:17) = ˜ T ∗ r (cid:16) z φ (cid:16) ˜ T ∗ r (cid:17)(cid:17) , and the last equality by (A.44). (A.52) establishes (A.49) and completes the proofof Lemma A.1. (cid:3) Choose γ and λ for which the cycle (A.42) holds, and let γ ∗ = γ, λ ∗ = λ, ε ∗ = ε γ,λ . Now consider ρ , ρ , ρ , ρ ∈ P such that T ρ ,θ = ˜ T ∗ r + γ ∗ θν − ζ ( γ ∗ θ, , λ ∗ ) µ (A.53) T ρ ,θ = ˜ T ∗ r + ε ∗ θ ( λ ∗ η + η ) (A.54) T ρ ,θ = ˜ T ∗ r + γ ∗ θν + ε ∗ ( λ ∗ η + η ) − ζ ( γ ∗ θ, ε ∗ , λ ∗ ) µ (A.55) T ρ ,θ = ˜ T ∗ r + γ ∗ ν + ε ∗ θ ( λ ∗ η + η ) − ζ ( γ ∗ , ε ∗ θ, λ ∗ ) µ, (A.56)where r = 0 ,r = 0 ,r = R ( T + ε ∗ ( λ ∗ η + η )) − R ( T ) ,r = R ( T + γ ∗ ν − ζ ( γ ∗ , , λ ∗ ) µ ) − R ( T ) , (A.57)If ε ∗ and γ ∗ are chosen small enough–which is consistent with (A.42)–the ρ , ρ , ρ , and ρ indeed belong to P . It A.13 ollows from Lemma A.1 that for i = 1 , , ,

4, there exists ˆ ρ i ∈ P such that T ˆ ρ i ,θ ( z ) =  ˜ T ∗ r i + r ρi ( θ ) ( z ) if z < z ′ ,T ˆ ρ i ,θ ( z ) if z ≥ z ′ . (A.58)and Z φφ g φ (cid:0) T ˆ ρ i ,θ (cid:1) ∂∂θ T (cid:16) z ˆ ρ i φ ( θ ) , θ (cid:17) f ( φ ) d φ = Z φφ g φ (cid:0) T ρ i ,θ (cid:1) ∂∂θ T (cid:16) z ρ i φ ( θ ) , θ (cid:17) f ( φ ) d φ, ∀ θ ∈ [ a, b ] . (A.59)I now assume that - is a relation that rationalizes g . It follows from (A.53), (A.12), the indiﬀerence principle (38),and (A.59) that T ˆ ρ , ∼ T ˆ ρ , . (A.60)Similarly, it follows from (A.54), (A.35), the fact that λ ∗ >

1, the local improvement principle (37), and (A.59)that T ˆ ρ , ≻ T ˆ ρ , . (A.61)Next observe that for all θ ∈ [0 , Z φφ g φ (cid:16) ˆ T γ ∗ θ ,ε ∗ ,λ ∗ (cid:17) " dd θ (cid:12)(cid:12)(cid:12)(cid:12) θ = θ ˆ T γ ∗ θ,ε ∗ ,λ ∗ (cid:16) z φ (cid:16) ˆ T γ ∗ θ ,ε ∗ ,λ ∗ (cid:17)(cid:17) f ( φ ) d φ = Z φφ g φ (cid:0) T ρ ,θ (cid:1) " dd θ (cid:12)(cid:12)(cid:12)(cid:12) θ = θ T ρ ,θ (cid:0) z φ (cid:0) T ρ ,θ (cid:1)(cid:1) f ( φ ) d φ, (A.62) Z φφ g φ (cid:16) ˆ T γ ∗ ,ε ∗ θ ,λ ∗ (cid:17) " dd θ (cid:12)(cid:12)(cid:12)(cid:12) θ = θ ˆ T γ ∗ ,ε ∗ θ,λ ∗ (cid:16) z φ (cid:16) ˆ T γ ∗ ,ε ∗ θ ,λ ∗ (cid:17)(cid:17) f ( φ ) d φ = Z φφ g φ (cid:0) T ρ ,θ (cid:1) " dd θ (cid:12)(cid:12)(cid:12)(cid:12) θ = θ T ρ ,θ (cid:0) z φ (cid:0) T ρ ,θ (cid:1)(cid:1) f ( φ ) d φ. (A.63)(A.62) follows because, in both integrals, the integrand is equal to zero when φ < φ ′ and the integrands in the twointegrals are equal elsewhere. The reasoning justifying (A.63) is similar.It follows from (A.55), (A.12), the indiﬀerence principle (38), (A.59), and (A.62) that T ˆ ρ , ∼ T ˆ ρ , . (A.64)Similarly, it follows from (A.56), (A.38), the local improvement principle (37), (A.59), and (A.63) that T ˆ ρ , ≺ T ˆ ρ , . (A.65) So, in fact, the two integrands are equal everywhere.

A.14 t follows from (A.53)-(A.56) and (A.57) that r + r ρ (1) =0 + (cid:2) R (cid:0) T ρ , (cid:1) − R (cid:0) T ρ , (cid:1)(cid:3) = r + r ρ (0) r + r ρ (1) = [ R ( T + γ ∗ ν − ζ ( γ ∗ , , λ ∗ ) µ ) − R ( T )] + (cid:2) R (cid:0) T ρ , (cid:1) − R (cid:0) T ρ , (cid:1)(cid:3) = [ R ( T + γ ∗ ν − ζ ( γ ∗ , , λ ∗ ) µ ) − R ( T )]+ [ R ( T + γ ∗ ν + ε ∗ ( λ ∗ η + η ) − ζ ( γ ∗ , ε ∗ , λ ∗ ) µ ) − R ( T + γ ∗ ν − ζ ( γ ∗ , , λ ∗ ) µ )]= [ R ( T + ε ∗ ( λ ∗ η + η ) − ζ ( γ ∗ , ε ∗ , λ ∗ ) µ ) − R ( T )]+ [ R ( T + γ ∗ ν + ε ∗ ( λ ∗ η + η ) − ζ ( γ ∗ , ε ∗ , λ ∗ ) µ ) − R ( T + ε ∗ ( λ ∗ η + η ) − ζ ( γ ∗ , ε ∗ , λ ∗ ) µ )]= [ R ( T + ε ∗ ( λ ∗ η + η ) − ζ ( γ ∗ , ε ∗ , λ ∗ ) µ ) − R ( T )] + (cid:2) R (cid:0) T ρ , (cid:1) − R (cid:0) T ρ , (cid:1)(cid:3) = r + r ρ (1) r + r ρ (0) = [ R ( T + ε ∗ ( λ ∗ η + η )) − R ( T )] + 0 = r + r ρ (1) r + r ρ (0) = 0 = r + r ρ (0) (A.66)One can use (A.66) to establish that T ρ , = T ρ , , T ρ , = T ρ , , T ρ , = T ρ , , T ρ , = T ρ , . (A.67)It follows from (A.66), (A.67) and (A.58) that T ˆ ρ , = T ˆ ρ , , T ˆ ρ , = T ˆ ρ , , T ˆ ρ , = T ˆ ρ , , T ˆ ρ , = T ˆ ρ , . (A.68)It follows from (A.60), (A.61), (A.64), (A.65) and (A.68) that T ˆ ρ , ∼ T ˆ ρ , = T ˆ ρ , ≺ T ˆ ρ , = T ˆ ρ , ∼ T ˆ ρ , = T ˆ ρ , ≺ T ˆ ρ , = T ˆ ρ , (A.69)It follows that any relation - that rationalizes g is not a preorder. So g is not rationalizable. This completes theproof. (cid:3) A.4 The case in which for all ω, ψ ∈ Ψ with disjoint support, dd σ (cid:12)(cid:12) σ =0 R (cid:16) ˜ T ω,ψσ (cid:17) = 0 . During the course of the proof of Proposition A.5, I assumed that there exist ω, ψ ∈ Ψ with disjoint support suchthat dd σ (cid:12)(cid:12) σ =0 R (cid:16) ˜ T ω,ψσ (cid:17) = 0. 
As discussed during the course of the proof of Proposition A.5, in this appendix, I showthat if we happen to start with a tax T such that for all ω, ψ ∈ Ψ, dd σ (cid:12)(cid:12) σ =0 R (cid:16) ˜ T ω,ψσ (cid:17) = 0, then we can construct anew tax policy T which satisﬁes all the essential properties in Proposition A.5, but for which there exist ω, ψ in Ψwith disjoint support, such that dd σ (cid:12)(cid:12) σ =0 R (cid:16) ˜ T ω,ψσ (cid:17) = 0. Thus we can use T instead of T to establish PropositionsA.5 and A.3.Let us rename the tax policy T in Propositions A.5 and A.3 as T ∗ to free up T to be used as a variablerepresenting an arbitrary tax policy. Now deﬁne T ∗ = (cid:26) T ∈ ˆ T : T = T ∗ on [ z ′ , ¯ z ] and d d z T < (cid:16) z φ , z ′ i(cid:27) . Note that any tax policy T ∈ T ∗ would have been suﬃcient for the proofs of Propositions A.5 and A.3, since theproofs did not depend on the precise values of T ∗ beneath z ′ did not matter for the proof. A.15 or any ψ ∈ Ψ, let supp ( ψ ) be the support of ψ . Choose any T ∈ T ∗ and ω, ψ ∈ Ψ. Then there exists afunction χ = χ ω,ψT for which (A.43)-(A.44) are satisﬁed when T ∗ is replaced by T . For any T ∈ T ∗ , ω, ψ ∈ Ψ withsupp ( ω ) ∩ supp ( ψ ) = ∅ and σ suﬃciently close to 0 to be in the domain of χ ω,ψT , deﬁne the tax policy S T,ω,ψσ = T + σω − χ ω,ψT ( σ ) ψ. To deal with the case covered by this section it is suﬃcient to prove the following proposition.

Proposition A.4

There exist $T \in \mathcal{T}^*$ and $\omega, \psi \in \Psi$ with $\operatorname{supp}(\omega) \cap \operatorname{supp}(\psi) = \emptyset$ such that $\frac{d}{d\sigma}\big|_{\sigma=0} R\big(S^{T,\omega,\psi}_\sigma\big) \neq 0$.

This proposition is sufficient because, once it is established, we can simply substitute the tax policy $T$ guaranteed by the proposition for the policy $T^*$ that we originally used in Propositions A.5 and A.3, and the arguments in both those propositions go through for $T$.

Proof of Proposition A.4. Assume for contradiction that

$$\text{for all } T \in \mathcal{T}^*,\ \omega, \psi \in \Psi \text{ with } \operatorname{supp}(\omega) \cap \operatorname{supp}(\psi) = \emptyset, \quad \frac{d}{d\sigma}\Big|_{\sigma=0} R\big(S^{T,\omega,\psi}_\sigma\big) = 0. \tag{A.70}$$

It follows from applying the implicit function theorem to the first-order condition for the agent's optimization problem

$$\max_z \; z - S^{T,\omega,\psi}_\sigma(z) - v_\phi(z)$$

that

$$\frac{d}{d\sigma}\Big|_{\sigma=0} z_\phi\big(S^{T,\omega,\psi}_\sigma\big) = -\frac{\omega'(z_\phi(T)) - \Big[\frac{d}{d\sigma}\big|_{\sigma=0}\chi^{\omega,\psi}_T(\sigma)\Big]\psi'(z_\phi(T))}{T''(z_\phi(T)) + v''_\phi(z_\phi(T))}. \tag{A.71}$$

Next observe that

$$R\big(S^{T,\omega,\psi}_\sigma\big) = \int_{\underline{\phi}}^{\overline{\phi}} \Big[T\big(z_\phi(S^{T,\omega,\psi}_\sigma)\big) + \sigma\omega\big(z_\phi(S^{T,\omega,\psi}_\sigma)\big) - \chi^{\omega,\psi}_T(\sigma)\,\psi\big(z_\phi(S^{T,\omega,\psi}_\sigma)\big)\Big] f(\phi)\,d\phi.$$
So the marginal effect on revenue of changing $\sigma$ is

$$\begin{aligned}
\frac{d}{d\sigma}\Big|_{\sigma=0} R\big(S^{T,\omega,\psi}_\sigma\big)
&= \int_{\underline{\phi}}^{\overline{\phi}} \left[T'(z_\phi(T))\,\frac{d}{d\sigma}\Big|_{\sigma=0} z_\phi\big(S^{T,\omega,\psi}_\sigma\big) + \omega(z_\phi(T)) - \left[\frac{d}{d\sigma}\Big|_{\sigma=0}\chi^{\omega,\psi}_T(\sigma)\right]\psi(z_\phi(T))\right] f(\phi)\,d\phi \\
&= \int_{\underline{\phi}}^{\overline{\phi}} \left[-T'(z_\phi(T))\,\frac{\omega'(z_\phi(T)) - \Big[\frac{d}{d\sigma}\big|_{\sigma=0}\chi^{\omega,\psi}_T(\sigma)\Big]\psi'(z_\phi(T))}{T''(z_\phi(T)) + v''_\phi(z_\phi(T))} + \omega(z_\phi(T)) - \left[\frac{d}{d\sigma}\Big|_{\sigma=0}\chi^{\omega,\psi}_T(\sigma)\right]\psi(z_\phi(T))\right] f(\phi)\,d\phi \\
&= \int_{\underline{\phi}}^{\overline{\phi}} \left[-\frac{T'(z_\phi(T))\,\omega'(z_\phi(T))}{T''(z_\phi(T)) + v''_\phi(z_\phi(T))} + \omega(z_\phi(T))\right] f(\phi)\,d\phi \\
&\quad - \left[\frac{d}{d\sigma}\Big|_{\sigma=0}\chi^{\omega,\psi}_T(\sigma)\right] \int_{\underline{\phi}}^{\overline{\phi}} \left[-\frac{T'(z_\phi(T))\,\psi'(z_\phi(T))}{T''(z_\phi(T)) + v''_\phi(z_\phi(T))} + \psi(z_\phi(T))\right] f(\phi)\,d\phi,
\end{aligned} \tag{A.72}$$

where the second equality follows from (A.71), and the terms multiplied by $\sigma$ or by $\chi^{\omega,\psi}_T(\sigma)$ vanish at $\sigma = 0$ because $\chi^{\omega,\psi}_T(0) = 0$.

Lemma A.2

Assume (A.70). Let T ∈ T ∗ . Suppose that there exists ω ∈ Ψ such that Z φφ " − T ′ ( z φ ( T )) ω ′ ( z φ ( T )) T ′′ ( z φ ( T )) + v ′′ φ ( z φ ( T )) + ω ( z φ ( T )) f ( φ ) d φ = 0 . (A.73) Then for all ψ ∈ Ψ , Z φφ " − T ′ ( z φ ( T )) ψ ′ ( z φ ( T )) T ′′ ( z φ ( T )) + v ′′ φ ( z φ ( T )) + ψ ( z φ ( T )) f ( φ ) d φ = 0 . (A.74)Proof. Choose T ∈ T ∗ , and consider ω ∈ Ψ satisfying (A.73). Then for any ψ ∈ Ψ, there exists a sequence offunctions ψ , ψ , . . . , ψ n in Ψ such that ψ = ω, ψ n = ψ and supp ( ψ j − ) ∩ supp ( ψ j ) = ∅ for j = 1 , . . . , n . It nowfollows from (A.70) and (A.72) that0 = dd σ (cid:12)(cid:12)(cid:12)(cid:12) σ =0 R (cid:0) S T,ψ ,ωσ (cid:1) = Z φφ " − T ′ ( z φ ( T )) ψ ′ ( z φ ( T )) T ′′ ( z φ ( T )) + v ′′ φ ( z φ ( T )) + ψ ( z φ ( T )) f ( φ ) d φ − (cid:20) dd σ (cid:12)(cid:12)(cid:12)(cid:12) σ =0 χ ψ ,ω ( σ ) (cid:21) Z φφ " − T ′ ( z φ ( T )) ω ′ ( z φ ( T )) T ′′ ( z φ ( T )) + v ′′ φ ( z φ ( T )) + ω ( z φ ( T )) f ( φ ) d φ = Z φφ " − T ′ ( z φ ( T )) ψ ′ ( z φ ( T )) T ′′ ( z φ ( T )) + v ′′ φ ( z φ ( T )) + ψ ( z φ ( T )) f ( φ ) d φ So ψ satisﬁes (A.74). A similar argument shows that if ψ j satisﬁes (A.74), then so does ψ j +1 . (cid:3) Deﬁne T ∗ = ( T ∈ T ∗ : ∀ ψ ∈ Ψ , Z φφ " − T ′ ( z φ ( T )) ψ ′ ( z φ ( T )) T ′′ ( z φ ( T )) + v ′′ φ ( z φ ( T )) + ψ ( z φ ( T )) f ( φ ) d φ = 0 ) , T ∗ = ( T ∈ T ∗ : ∀ ψ ∈ Ψ , Z φφ " − T ′ ( z φ ( T )) ψ ′ ( z φ ( T )) T ′′ ( z φ ( T )) + v ′′ φ ( z φ ( T )) + ψ ( z φ ( T )) f ( φ ) d φ = 0 ) , T ∗ = ( T ∈ T ∗ : ∃ ψ ∈ Ψ , Z φφ " − T ′ ( z φ ( T )) ψ ′ ( z φ ( T )) T ′′ ( z φ ( T )) + v ′′ φ ( z φ ( T )) + ψ ( z φ ( T )) f ( φ ) d φ = 0 ) . Corollary A.2

Assume (A.70). Then T ∗ ∪ T ∗ = T ∗ and T ∗ = T ∗ . This corollary is an immediate consequence of Lemma A.2 and (A.70).
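
The earnings response (A.71), which drives the revenue derivative (A.72) used in Lemma A.2 and Corollary A.2, is a direct application of the implicit function theorem, so it can be sanity-checked numerically. The sketch below is only an illustration under assumptions that are not in the paper: a quadratic tax $T(z) = 0.1z^2$, disutility $v_\phi(z) = z^2/(2\phi)$ with $\phi = 2$, perturbations $\omega = \sin$ and $\psi(z) = z^2/2$, and a linear $\chi(\sigma) = 0.5\sigma$. These stand-ins ignore the support restrictions that define $\Psi$ and the condition (A.44) that pins down $\chi$; since (A.71) is a local statement, they suffice to compare the formula with a finite-difference derivative of the agent's optimal earnings.

```python
import math

# Hypothetical primitives (not from the paper), chosen so the agent's
# problem is strictly concave and the optimum is interior.
PHI = 2.0   # agent type phi
T2 = 0.2    # T''(z) for the assumed tax T(z) = 0.1 * z**2
C = 0.5     # assumed slope chi'(0), with chi(sigma) = C * sigma


def foc(z, sigma):
    """First-order condition of max_z z - S_sigma(z) - v_phi(z), where
    S_sigma = T + sigma*omega - chi(sigma)*psi, omega = sin, psi(z) = z**2/2,
    and v_phi(z) = z**2 / (2*phi)."""
    return 1.0 - T2 * z - sigma * math.cos(z) + C * sigma * z - z / PHI


def earnings(sigma, lo=0.0, hi=5.0):
    """Optimal earnings z_phi(S_sigma), found by bisection on the FOC.
    For small |sigma| the FOC is positive at lo and negative at hi."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if foc(mid, sigma) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def response_formula():
    """Right-hand side of (A.71) at sigma = 0:
    -(omega' - chi'(0) * psi') / (T'' + v'')."""
    z0 = earnings(0.0)
    omega_prime = math.cos(z0)
    psi_prime = z0
    v2 = 1.0 / PHI
    return -(omega_prime - C * psi_prime) / (T2 + v2)


def response_finite_difference(h=1e-4):
    """Central finite-difference estimate of d z_phi(S_sigma) / d sigma at 0."""
    return (earnings(h) - earnings(-h)) / (2.0 * h)


if __name__ == "__main__":
    print(response_formula(), response_finite_difference())
```

Under these assumed primitives the two numbers should agree up to finite-difference error. The same comparison could in principle be repeated with compactly supported bump functions in place of `sin` and `z**2/2` to respect the definition of $\Psi$.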

Lemma A.3

Assume (A.70). Then for all T ∈ T ∗ , there exists c ( T ) ∈ R \ { } , such that for all ψ ∈ Ψ , Z φφ g φ ( T ) ψ ( z φ ( T )) f ( φ ) d φ = c ( T ) Z φφ " − T ′ ( z φ ( T )) ψ ′ ( z φ ( T )) T ′′ ( z φ ( T )) + v ′′ φ ( z φ ( T )) + ψ ( z φ ( T )) f ( φ ) d φ (A.75) There are two cases to consider: (i) supp ( ω ) ∪ supp ( ψ ) = (cid:2) z φ , z ′ (cid:3) and (ii) supp ( ω ) ∪ supp ( ψ ) = (cid:2) z φ , z ′ (cid:3) . First considercase (i). Then because Ψ is deﬁned so that the support of any function in Ψ is strictly contained in (cid:2) z φ , z ′ (cid:3) , it follows thatsupp ( ω ) \ supp ( ψ ) = ∅ . So we can pick ψ = ω , ψ so that supp ( ψ ) ∩ supp ( ω ) = ∅ , ψ so that supp ( ψ ) ⊆ supp ( ω ) \ supp ( ψ )and ψ = ψ . Next consider case (ii). Then we can choose ψ = ω , ψ so that supp ( ψ ) ∩ (supp ( ω ) ∪ supp ( ψ )) = ∅ , and ψ = ψ . So we can ﬁnd an appropriate sequence in both cases. Notice that in both cases, we can take n ≤ A.17 roof. If T ∈ T ∗ , it follows from (A.70) and (A.72) that for all ω, ψ ∈ Ψ with disjoint support,dd σ (cid:12)(cid:12)(cid:12)(cid:12) σ =0 χ ω,ψT ( σ ) = R φφ h − T ′ ( z φ ( T )) ω ′ ( z φ ( T )) T ′′ ( z φ ( T ))+ v ′′ φ ( z φ ( T )) + ω ( z φ ( T )) i f ( φ ) d φ R φφ h − T ′ ( z φ ( T )) ψ ′ ( z φ ( T )) T ′′ ( z φ ( T ))+ v ′′ φ ( z φ ( T )) + ψ ( z φ ( T )) i f ( φ ) d φ (A.76)Next observe that in the case that σ = 0, (A.44) can equivalently be rewritten as:dd σ (cid:12)(cid:12)(cid:12)(cid:12) σ =0 χ ω,ψT ( σ ) = R φφ g φ ( T ) ω ( z φ ( T )) f ( φ ) d φ R φφ g φ ( T ) ψ ( z φ ( T )) f ( φ ) d φ . 
(A.77)It follows from (A.76) and (A.77) that R φφ h − T ′ ( z φ ( T )) ω ′ ( z φ ( T )) T ′′ ( z φ ( T ))+ v ′′ φ ( z φ ( T )) + ω ( z φ ( T )) i f ( φ ) d φ R φφ h − T ′ ( z φ ( T )) ψ ′ ( z φ ( T )) T ′′ ( z φ ( T ))+ v ′′ φ ( z φ ( T )) + ψ ( z φ ( T )) i f ( φ ) d φ = R φφ g φ ( T ) ω ( z φ ( T )) f ( φ ) d φ R φφ g φ ( T ) ψ ( z φ ( T )) f ( φ ) d φ , or equivalently, for all ω, ψ ∈ Ψ with disjoint support, R φφ g φ ( T ) ω ( z φ ( T )) f ( φ ) d φ R φφ h − T ′ ( z φ ( T )) ω ′ ( z φ ( T )) T ′′ ( z φ ( T ))+ v ′′ φ ( z φ ( T )) + ω ( z φ ( T )) i f ( φ ) d φ (A.78)= R φφ g φ ( T ) ψ ( z φ ( T )) f ( φ ) d φ R φφ h − T ′ ( z φ ( T )) ψ ′ ( z φ ( T )) T ′′ ( z φ ( T ))+ v ′′ φ ( z φ ( T )) + ψ ( z φ ( T )) i f ( φ ) d φ (A.79)Now choose some speciﬁc ω ∈ Ψ, and deﬁne c ( T ) = R φφ g φ ( T ) ω ( z φ ( T )) f ( φ ) d φ R φφ h − T ′ ( z φ ( T )) ω ′ ( z φ ( T )) T ′′ ( z φ ( T ))+ v ′′ φ ( z φ ( T )) + ω ( z φ ( T )) i f ( φ ) d φ . (A.80)Choose any other ψ ∗ ∈ Ψ. Then there exists a sequence ψ , ψ , . . . , ψ n in Ψ such that ψ = ω, ψ n = ψ ∗ and ψ j − and ψ j have disjoint support. (See footnote 27). Then setting ψ = ψ , in (A.78), and using (A.80) with ψ = ψ , wederive (A.75) for ψ . Next assuming that (A.75) holds for ψ j , we can derive it for ψ j +1 using a similar argument. (cid:3) Lemma A.4

Assume (A.70). For all T ∈ T ∗ and φ ∈ (cid:2) φ, φ (cid:3) , deﬁne a T ( φ ) = (cid:18) g φ ( T ) c ( T ) − (cid:19) f ( φ ) ,b T ( φ ) = − T ′ ( z φ ( T )) ∂ ∂φ∂z v ( z φ ( T ) , φ ) f ( φ ) For all T ∈ T ∗ , a T = b ′ T on (cid:2) φ, φ ′ (cid:3) . For all T ∈ T ∗ , f = b ′ T . Proof. Fix T ∈ T ∗ . We can rewrite (A.75) as A.18 ψ ∈ Ψ , Z φφ (cid:18) g φ ( T ) c ( T ) − (cid:19) ψ ( z φ ( T )) f ( φ ) d φ = Z φφ − T ′ ( z φ ( T )) ψ ′ ( z φ ( T )) T ′′ ( z φ ( T )) + v ′′ φ ( z φ ( T )) f ( φ ) d φ (A.81)If for any ψ ∈ Ψ, we deﬁne ψ T ( φ ) := ψ ( z φ ( T )), then ψ ′ T ( φ ) = ψ ′ ( z φ ( T )) dd φ z φ ( T ). Then observe also that b T ( φ ) = − T ′ ( z φ ( T )) ∂ ∂φ∂z v ( z φ ( T ) , φ ) f ( φ )= − T ′ ( z φ ( T )) T ′′ ( z φ ( T )) + v ′′ φ ( z φ ( T )) T ′′ ( z φ ( T )) + ∂ ∂z v ( z φ ( T ) , φ ) ∂ ∂φ∂z v ( z φ ( T ) , φ ) f ( φ )= T ′ ( z φ ( T )) T ′′ ( z φ ( T )) + v ′′ φ ( z φ ( T )) 1 dd φ z φ ( T ) f ( φ ) , where the last equality follows from applying the implicit function theorem to the ﬁrst-order condition for theagent’s optimization problem. It now follows that we can rewrite (A.81) as ∀ ψ ∈ Ψ , Z φ ′ φ a T ( φ ) ψ T ( φ ) d φ = − Z φ ′ φ b T ( φ ) ψ ′ T ( φ ) d φ, (A.82)where the upper bounds of integration can be taken to be φ ′ rather than φ because all ψ ∈ Ψ have support in (cid:2) φ, φ ′ (cid:3) . The proof of Lemma A.4 for the case where T ∈ T ∗ is now completed by Lemma A.5 (below). The proofin the case that T ∈ T ∗ . (cid:3) Lemma A.5 (A.82) holds if and only if holds if and only if a T = b ′ T . Proof. First suppose that a T = b ′ T , and let B T be an antiderivative of b T . Then using integration by parts, wehave that for any ψ ∈ Ψ, Z φ ′ φ b T ( φ ) ψ ′ T ( φ ) d φ = [ B T ( φ ) ψ T ( φ )] φ ′ φ − Z φ ′ φ a T ( φ ) ψ T ( φ ) d φ = − Z φ ′ φ a T ( φ ) ψ T ( φ ) d φ, where the second equality follows from the fact that for any ψ ∈ Ψ, ψ T (cid:0) φ (cid:1) = ψ T ( φ ′ ) = 0.Going in the other direction, suppose that (A.82) holds. 
It also follows from integration by parts that ∀ ψ ∈ Ψ , Z φ ′ φ b ′ T ( φ ) ψ T ( φ ) d φ = − Z φ ′ φ b T ( φ ) ψ ′ T ( φ ) d φ. (A.83)It follows from (A.82) and (A.83) that ∀ ψ ∈ Ψ , Z φ ′ φ ( b ′ T ( φ ) − a T ( φ )) ψ T ( φ ) d φ = 0 . (A.84)Since both b ′ T and a T are continuously diﬀerentiable, if b ′ T = a T , then it would be possible to ﬁnd ψ ∈ Ψ whosesupport is contained in either (cid:8) z φ ( T ) : φ ∈ (cid:2) φ, φ ′ (cid:3) , b ′ T ( φ ) > a T ( φ ) (cid:9) or (cid:8) z φ ( T ) : φ ∈ (cid:2) φ, φ ′ (cid:3) , b ′ T ( φ ) < a T ( φ ) (cid:9) andin that case the integral in (A.84) would be nonzero, a contradiction. So we must have b ′ T = a T . (cid:3) A.19 he requirement that a T = b ′ T when T ∈ T ∗ is the requirement that: (cid:18) g φ ( T ) c ( T ) − (cid:19) f ( φ ) = − dd φ " T ′ ( z φ ( T )) ∂ ∂φ∂z v ( z φ ( T ) , φ ) f ( φ ) , ∀ φ ∈ (cid:2) φ, φ ′ (cid:3) , ∀ T ∈ T ∗ . or equivalently that g φ ( T ) f ( φ ) = c ( T ) h ( T, φ ) , ∀ φ ∈ (cid:2) φ, φ ′ (cid:3) , ∀ T ∈ T ∗ (A.85)where h ( T, φ ) := − dd φ " T ′ ( z φ ( T )) ∂ ∂φ∂z v ( z φ ( T ) , φ ) f ( φ ) + f ( φ ) ! . (A.86)Similarly the requirement that − f = b ′ T when T ∈ T ∗ can be written as0 = h ( T, φ ) , ∀ φ ∈ (cid:2) φ, φ ′ (cid:3) , ∀ T ∈ T ∗ . (A.87) Lemma A.6

Assume (A.70). Let T and T be two tax policies in T ∗ satisfying (A.85) such that there is someinterval I containing both and z ∗ := z φ ∗ ( T ) > for some φ ∗ ∈ (cid:0) φ, φ ∗ (cid:1) , T = T on I . Then c ( T ) = c ( T ) . Proof. The assumptions of the lemma imply that z φ ∗ ( T ) = z φ ∗ ( T ) and U φ ∗ ( T ) = U φ ∗ ( T ). It follows that c ( T ) h ( T , φ ∗ ) = g φ ∗ ( U φ ∗ ( T ) , z φ ∗ ( T )) f ( φ ∗ ) = g φ ∗ ( U φ ∗ ( T ) , z φ ∗ ( T )) f ( φ ∗ )= c ( T ) h ( T , φ ∗ ) = c ( T ) h ( T , φ ∗ ) . where the ﬁrst and third equalities follow from (A.85) and the last equality follows from (A.86) and the fact that T and T agree on an interval containing z ∗ . It now follows from the fact that g φ ∗ ( U φ ∗ ( T ) , z φ ∗ ( T )) f ( φ ∗ ) > c ( T ) = c ( T ). (cid:3) Observe that by our assumptions min φ ∈ [ φ,φ ] ,z ∈ Z ∂ ∂z v ( z, φ ) > , min φ ∈ [ φ,φ ] ,z ∈ Z (cid:12)(cid:12)(cid:12) ∂ ∂φ∂z v ( z, φ ) (cid:12)(cid:12)(cid:12) > , andmax φ ∈ [ φ,φ ] ,z ∈ Z . (cid:12)(cid:12)(cid:12) ∂ ∂φ∂z v ( z φ ∗ ( T ) , φ ∗ ) (cid:12)(cid:12)(cid:12) < ∞ . It follows that there exists δ > ∀ φ ∈ (cid:2) φ, φ (cid:3) , ∀ T ∈ T ∗ , ∂ ∂z v ( z φ ( T ) , φ ) + ∂ ∂φ∂z v ( z φ ( T ) , φ ) ∂ ∂φ∂z v ( z φ ( T ) , φ ) δ > T ∗ imply that for all T ∈ T ∗ , T ′ (cid:16) z φ ( T ) (cid:17) = 0 , T ′ (cid:0) z φ ′ ( T ) (cid:1) = dd z T ∗ (cid:0) z φ ∗ ( T ∗ ) (cid:1) , we canﬁx some δ ∈ (cid:0) , dd z T ∗ (cid:0) z φ ∗ ( T ∗ ) (cid:1)(cid:1) which is also suﬃciently small that (A.88) is satisﬁed and for all T ∈ T ∗ , ∃ z ∈ (cid:16) z φ , z ′ (cid:17) , T ′ ( z ) = δ. Hence for all T ∈ T ∗ , ∃ φ ∗ ∈ (cid:2) φ, φ ′ (cid:3) , T ′ ( z φ ∗ ( T )) = δ (A.89)So choose some such T and φ ∗ . Observe moreover that z φ ∗ ( T ) > . Then choose exists T ∈ T ∗ such thatfor some z ◦ > φ ◦ , z ◦ = z φ ◦ ( T ) and T , T agree on [0 , z ◦ ], and such that z φ ∗ ( T ) = z φ ∗ ( T ), T ′ ( z φ ∗ ( T )) = T ′ ( z φ ∗ ( T )) = δ , T ′′ ( z φ ∗ ( T )) = T ′′ ( z φ ∗ ( T )). Deﬁne T ε := (1 − ε ) T + εT . 
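By Corollary A.2, there are two possibilities: Either (i) $\mathcal{T}^* = \emptyset$ or (ii) all $\mathcal{T}^* = \mathcal{T}^*$. In case (i), we may assume that for all $\varepsilon \in [0, 1]$, $T_\varepsilon \in \mathcal{T}^*$. To see this, observe that it follows from Corollary A.2 that if $T_0 \in \mathcal{T}^*$ and $T_1$ is sufficiently close to $T_0$ (closeness is measured relative to the norm $\rho$ defined in Section 2), then $T_\varepsilon \in \mathcal{T}^*$ for all $\varepsilon \in [0, 1]$. So if $T_0 \in \mathcal{T}^*$, we can simply choose $T_1$ sufficiently close. In case (ii), it is immediate that $T_\varepsilon \in \mathcal{T}^*$ for all $\varepsilon \in [0, 1]$.

In case (i), there exists $c \in \mathbb{R} \setminus \{0\}$ such that $c(T_\varepsilon) = c$, $\forall \varepsilon \in [0, 1]$. There exist $U \in \mathbb{R}$ and $z > 0$ such that $z_{\varphi^*}(T_\varepsilon) = z$, $U_{\varphi^*}(T_\varepsilon) = U$, and $T_\varepsilon'(z_{\varphi^*}(T_\varepsilon)) = \delta$, $\forall \varepsilon \in [0, 1]$. So there exists $r \in \mathbb{R}_+$ such that $g_{\varphi^*}(T_\varepsilon) f(\varphi^*) = r$, $\forall \varepsilon \in [0, 1]$, and hence $r = c\, h(T_\varepsilon, \varphi^*)$, $\forall \varepsilon \in [0, 1]$. Therefore $\frac{d}{d\varepsilon} h(T_\varepsilon, \varphi^*) = 0$. Define $k(T, \varphi)$ by $-k(T, \varphi) + f(\varphi) = h(T, \varphi)$. It follows that $\frac{d}{d\varepsilon} k(T_\varepsilon, \varphi^*) = 0$. In case (ii), it follows from (A.87) that $\frac{d}{d\varepsilon} k(T_\varepsilon, \varphi^*) = 0$. So, in both cases, $\frac{d}{d\varepsilon} k(T_\varepsilon, \varphi^*) = 0$.

I will now obtain a contradiction by showing that $\frac{d}{d\varepsilon} k(T_\varepsilon, \varphi^*) \neq 0$. We have

$$k(T_\varepsilon, \varphi^*) = \frac{d}{d\varphi} \left[ \frac{T_\varepsilon'(z_{\varphi^*}(T_\varepsilon))}{\frac{\partial^2}{\partial\varphi\,\partial z} v(z_{\varphi^*}(T_\varepsilon), \varphi^*)}\, f(\varphi^*) \right]$$

$$= \frac{\left( T_\varepsilon''(z_{\varphi^*}(T_\varepsilon)) \frac{d}{d\varphi}\Big|_{\varphi = \varphi^*} [z_\varphi(T_\varepsilon)]\, f(\varphi^*) + T_\varepsilon'(z_{\varphi^*}(T_\varepsilon)) f'(\varphi^*) \right) \frac{\partial^2}{\partial\varphi\,\partial z} v(z_{\varphi^*}(T_\varepsilon), \varphi^*) - \left[ \frac{\partial^3}{\partial\varphi\,\partial z^2} v(z_{\varphi^*}(T_\varepsilon), \varphi^*) \frac{d}{d\varphi}\Big|_{\varphi = \varphi^*} [z_\varphi(T_\varepsilon)] + \frac{\partial^3}{\partial\varphi^2\,\partial z} v(z_{\varphi^*}(T_\varepsilon), \varphi^*) \right] T_\varepsilon'(z_{\varphi^*}(T_\varepsilon)) f(\varphi^*)}{\left[ \frac{\partial^2}{\partial\varphi\,\partial z} v(z_{\varphi^*}(T_\varepsilon), \varphi^*) \right]^2}$$

$$= \frac{\left[ T_\varepsilon''(z_{\varphi^*}(T_\varepsilon)) \frac{\partial^2}{\partial\varphi\,\partial z} v(z_{\varphi^*}(T_\varepsilon), \varphi^*) - \frac{\partial^3}{\partial\varphi\,\partial z^2} v(z_{\varphi^*}(T_\varepsilon), \varphi^*)\, T_\varepsilon'(z_{\varphi^*}(T_\varepsilon)) \right] \frac{d}{d\varphi}\Big|_{\varphi = \varphi^*} [z_\varphi(T_\varepsilon)]\, f(\varphi^*) + \left[ \frac{\partial^2}{\partial\varphi\,\partial z} v(z_{\varphi^*}(T_\varepsilon), \varphi^*) f'(\varphi^*) - \frac{\partial^3}{\partial\varphi^2\,\partial z} v(z_{\varphi^*}(T_\varepsilon), \varphi^*) f(\varphi^*) \right] T_\varepsilon'(z_{\varphi^*}(T_\varepsilon))}{\left[ \frac{\partial^2}{\partial\varphi\,\partial z} v(z_{\varphi^*}(T_\varepsilon), \varphi^*) \right]^2}$$

$$= \frac{-\left[ T_\varepsilon''(z_{\varphi^*}(T_\varepsilon)) \frac{\partial^2}{\partial\varphi\,\partial z} v(z_{\varphi^*}(T_\varepsilon), \varphi^*) - \frac{\partial^3}{\partial\varphi\,\partial z^2} v(z_{\varphi^*}(T_\varepsilon), \varphi^*)\, T_\varepsilon'(z_{\varphi^*}(T_\varepsilon)) \right] \times \frac{\frac{\partial^2}{\partial\varphi\,\partial z} v(z_{\varphi^*}(T_\varepsilon), \varphi^*)}{T_\varepsilon''(z_{\varphi^*}(T_\varepsilon)) + \frac{\partial^2}{\partial z^2} v(z_{\varphi^*}(T_\varepsilon), \varphi^*)}\, f(\varphi^*) + \left[ \frac{\partial^2}{\partial\varphi\,\partial z} v(z_{\varphi^*}(T_\varepsilon), \varphi^*) f'(\varphi^*) - \frac{\partial^3}{\partial\varphi^2\,\partial z} v(z_{\varphi^*}(T_\varepsilon), \varphi^*) f(\varphi^*) \right] T_\varepsilon'(z_{\varphi^*}(T_\varepsilon))}{\left[ \frac{\partial^2}{\partial\varphi\,\partial z} v(z_{\varphi^*}(T_\varepsilon), \varphi^*) \right]^2}$$

$$= \frac{-\left[ T_\varepsilon''(z_{\varphi^*}(T_0)) \frac{\partial^2}{\partial\varphi\,\partial z} v(z_{\varphi^*}(T_0), \varphi^*) - \frac{\partial^3}{\partial\varphi\,\partial z^2} v(z_{\varphi^*}(T_0), \varphi^*)\, \delta \right] \times \frac{\frac{\partial^2}{\partial\varphi\,\partial z} v(z_{\varphi^*}(T_0), \varphi^*)}{T_\varepsilon''(z_{\varphi^*}(T_0)) + \frac{\partial^2}{\partial z^2} v(z_{\varphi^*}(T_0), \varphi^*)}\, f(\varphi^*) + \left[ \frac{\partial^2}{\partial\varphi\,\partial z} v(z_{\varphi^*}(T_0), \varphi^*) f'(\varphi^*) - \frac{\partial^3}{\partial\varphi^2\,\partial z} v(z_{\varphi^*}(T_0), \varphi^*) f(\varphi^*) \right] \delta}{\left[ \frac{\partial^2}{\partial\varphi\,\partial z} v(z_{\varphi^*}(T_0), \varphi^*) \right]^2}$$

$$= \frac{-T_\varepsilon''(z_{\varphi^*}(T_0)) + \frac{\frac{\partial^3}{\partial\varphi\,\partial z^2} v(z_{\varphi^*}(T_0), \varphi^*)}{\frac{\partial^2}{\partial\varphi\,\partial z} v(z_{\varphi^*}(T_0), \varphi^*)}\, \delta}{T_\varepsilon''(z_{\varphi^*}(T_0)) + \frac{\partial^2}{\partial z^2} v(z_{\varphi^*}(T_0), \varphi^*)}\, f(\varphi^*) + C$$

$$= \frac{-T_0''(z_{\varphi^*}(T_0)) - \varepsilon \bigl( T_1''(z_{\varphi^*}(T_0)) - T_0''(z_{\varphi^*}(T_0)) \bigr) + \frac{\frac{\partial^3}{\partial\varphi\,\partial z^2} v(z_{\varphi^*}(T_0), \varphi^*)}{\frac{\partial^2}{\partial\varphi\,\partial z} v(z_{\varphi^*}(T_0), \varphi^*)}\, T_0'(z_{\varphi^*}(T_0))}{T_0''(z_{\varphi^*}(T_0)) + \varepsilon \bigl( T_1''(z_{\varphi^*}(T_0)) - T_0''(z_{\varphi^*}(T_0)) \bigr) + \frac{\partial^2}{\partial z^2} v(z_{\varphi^*}(T_0), \varphi^*)}\, f(\varphi^*) + C,$$

where the third equality substitutes an expression equivalent to $\frac{d}{d\varphi}\big|_{\varphi = \varphi^*} [z_\varphi(T_\varepsilon)]$, obtained by applying the implicit function theorem to the agent's first-order condition, and $C$ is a constant that does not depend on $\varepsilon$. It follows that

$$\frac{d}{d\varepsilon}\bigg|_{\varepsilon = 0} k(T_\varepsilon, \varphi^*) = \frac{-\bigl( T_1''(z_{\varphi^*}(T_0)) - T_0''(z_{\varphi^*}(T_0)) \bigr) \left[ T_0''(z_{\varphi^*}(T_0)) + \frac{\partial^2}{\partial z^2} v(z_{\varphi^*}(T_0), \varphi^*) \right] f(\varphi^*) - \bigl( T_1''(z_{\varphi^*}(T_0)) - T_0''(z_{\varphi^*}(T_0)) \bigr) \left[ -T_0''(z_{\varphi^*}(T_0)) + \frac{\frac{\partial^3}{\partial\varphi\,\partial z^2} v(z_{\varphi^*}(T_0), \varphi^*)}{\frac{\partial^2}{\partial\varphi\,\partial z} v(z_{\varphi^*}(T_0), \varphi^*)}\, \delta \right] f(\varphi^*)}{\left[ T_0''(z_{\varphi^*}(T_0)) + \frac{\partial^2}{\partial z^2} v(z_{\varphi^*}(T_0), \varphi^*) \right]^2}$$

$$= -\frac{\bigl( T_1''(z_{\varphi^*}(T_0)) - T_0''(z_{\varphi^*}(T_0)) \bigr) \left[ \frac{\partial^2}{\partial z^2} v(z_{\varphi^*}(T_0), \varphi^*) + \frac{\frac{\partial^3}{\partial\varphi\,\partial z^2} v(z_{\varphi^*}(T_0), \varphi^*)}{\frac{\partial^2}{\partial\varphi\,\partial z} v(z_{\varphi^*}(T_0), \varphi^*)}\, \delta \right]}{\left[ T_0''(z_{\varphi^*}(T_0)) + \frac{\partial^2}{\partial z^2} v(z_{\varphi^*}(T_0), \varphi^*) \right]^2}\, f(\varphi^*) \neq 0,$$

where the last non-equality follows from (A.88) and from the fact that $T_0''$ and $T_1''$ differ at $z_{\varphi^*}(T_0)$. As we established above that $\frac{d}{d\varepsilon}\big|_{\varepsilon = 0} k(T_\varepsilon, \varphi^*) = 0$, this leads to the desired contradiction, implying that (A.70) must be false, and so completes the proof of Proposition A.4. □

A.5 The other direction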

In this section I prove the other direction of the main theorem. In particular, I prove:

Proposition A.5

Under the assumptions of Section 7, if $g$ depends only on utility, then $g$ is rationalizable.

Proof. If welfare weights depend only on utility, then we can write them as a function of a single argument $y = c - v_\varphi(z)$, that is, in the form $g_\varphi(c - v_\varphi(z))$. For each $\varphi$, define the utility function

$$u^g_\varphi(\hat{y}) = \int^{\hat{y}} g_\varphi(y)\, dy.$$

Define

$$U^g_\varphi(T) = u^g_\varphi\bigl( z_\varphi(T) - T(z_\varphi(T)) - v_\varphi(z_\varphi(T)) \bigr) \tag{A.90}$$

and consider the social welfare function

$$W(T) = \int_{\underline{\varphi}}^{\bar{\varphi}} U^g_\varphi(T)\, f(\varphi)\, d\varphi.$$

Then define $\precsim$ as in (17); the rest of the proof is essentially the same as the proof of Proposition 4. □
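The key property behind this construction is that the marginal utility of $u^g_\varphi$ equals the welfare weight $g_\varphi$. The following sketch checks that property numerically for a single type. It is an illustration under stated assumptions, not part of the proof: the weight function `g` (here $e^{-y}$) is hypothetical, since the proposition fixes no functional form, and the lower limit of integration is taken to be 0, which is also an assumption (any fixed lower limit changes $u^g_\varphi$ only by an additive constant).

```python
import math


def g(y):
    """Hypothetical welfare weight that depends only on utility y (assumed form)."""
    return math.exp(-y)


def u_g(y_hat, steps=20000):
    """u^g(y_hat): integral of g from 0 to y_hat, by the trapezoidal rule.

    The lower limit 0 is an assumption made for this sketch.
    """
    h = y_hat / steps
    total = 0.5 * (g(0.0) + g(y_hat))
    for i in range(1, steps):
        total += g(i * h)
    return total * h


def marginal_utility(y, eps=1e-4):
    """Central finite difference approximating the derivative of u^g at y."""
    return (u_g(y + eps) - u_g(y - eps)) / (2 * eps)


def max_weight_gap(points):
    """Largest absolute gap between the marginal utility of u^g and g."""
    return max(abs(marginal_utility(y) - g(y)) for y in points)


if __name__ == "__main__":
    gap = max_weight_gap([0.5, 1.0, 2.0])
    print(f"largest |du^g/dy - g| on the grid: {gap:.2e}")
```

Because the derivative of $u^g_\varphi$ recovers $g_\varphi$, weighting each person's tax change by the marginal utility under $W$ coincides with weighting it by $g_\varphi$, which is the sense in which utility-dependent weights are rationalized by a utilitarian objective.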