On Vickrey's Income Averaging

Abstract

We consider a small set of axioms for income averaging: recursivity, continuity, and the boundary condition for the present. These properties yield a unique averaging function that is the density of the reflected Brownian motion with a drift, started at the current income and moving over the past incomes. When averaging is done over the short past, the weighting function converges asymptotically to a Gaussian. When averaging is done over the long horizon, the weighting function converges to the exponential distribution. For all intermediate averaging scales, we derive an explicit solution that interpolates between the two.


Stefan Steinerberger ∗ Yale University

Aleh Tsyvinski

Yale University †

April 15, 2020

∗ S.S. is supported by the NSF (DMS-1763179) and the Alfred P. Sloan Foundation.
† We thank Hector Chade, Kjetil Storesletten, Georgii Riabov, Florian Scheuer, and Philipp Strack for useful discussions.

Introduction

How to average over the past is one of the most basic questions that arises in a variety of economic fields. Our particular focus is on a classic public economics issue, how to average income for tax purposes, but the answer is broadly applicable to many other topics. The question of income averaging is typically attributed to Vickrey (1939) and can be summarized as follows. Consider a progressive tax system and two taxpayers with the same income over a period of time. The person with the constant income pays a lower total amount of taxes than the person with the fluctuating income. Averaging income to equalize the tax burden then may be desirable.

In this paper, we want to abstract from the desirability of averaging. Our goal is to formalize the question of averaging and to propose a small set of reasonable axioms that an averaging mechanism may possess. Given this set of axioms, we are interested in what averaging functions may arise. More broadly, the question of averaging over the past, given a small set of assumptions, may be of use in a variety of other applications such as behavioral economics or dynamic contracting.

We chose two main axioms that the averaging function that weights income at different periods should satisfy. The first is recursivity: averaging should treat various scales of smoothing in a unified way. In this context, the assumption implies that averaging at some scale and then at another scale is equivalent to averaging at the combined scale. This condition ensures that all these outcomes agree: there is no difference in averaging over, say, a year, twelve units of a month or 52 weeks. Alternatively, one can think of this assumption as stating that no scale of averaging is singled out and all of them are treated equally. It is natural that a reasonable multiscale averaging method has the scales connected with this intrinsic compatibility condition.
The second is a continuity or locality assumption that requires that the very distant [...]

[Footnote: See also Simons (1938).]

[Footnote: A related issue is the choice of the reference period for taxation. Most commonly, taxes are assessed on an annual basis. However, in principle, a government may use a shorter or a longer accounting period.]

The density of such a process, which defines a weighting function, is known in a simple closed form. From the point of view of the present, the averaging function has particularly meaningful properties. Consider averaging over the very short period in the past. In this case, the boundary does not have any substantial effect and the averaging is done mainly via the normal density with the nonzero mean determined by the drift. Consider now averaging over the long horizon. A remarkable fact in probability theory is that for any constant positive drift, the weighting function converges to an exponential. For all the averaging scales in between these two, there is an explicit solution that interpolates [...]

[Footnote: In fact, we do not have to average the pre-tax income and could instead consider smoothing or averaging the tax contributions or the post-tax income directly.]

Our results nest and interpolate between the results on averaging obtained in these two approaches.

The question of Vickrey's income averaging is classic and familiar to any student of public finance. While this mechanism is not widely used in fiscal practice today (there are some provisions for income averaging for artists and farmers), it was extensively employed in the past. Great Britain applied a progressive tax schedule to the average of the individual income of the previous three years from 1799 to 1926. Between 1923 and 1938 Australia used a five-year moving average of income (Holt 1949). Gordon and Wen (2018) describe a more recent experience, which we summarize here. The United States introduced general income averaging in 1964 and repealed it in the 1986 Tax Reform Act. In Canada, a policy similar to that in the United States was introduced in 1972, together with forward averaging of the income-averaging annuity contracts, and was in effect until 1988. The impact of progressive tax rates on realized capital gains was one reason for setting low tax rates on capital gains. There are also several prominent recent proposals to reintroduce income averaging. Batchelder (2003) proposes targeted averaging for the poor in contexts such as the EITC in the US. In Canada, Mintz and Wilson (2002) for primarily retirement savings plans and [...]

[Footnote: The literature on total positivity that builds on this work (Karlin 1968) is used more extensively in economics.]

The most comprehensive is Gordon and Wen (2018), which also contains a review of the older literature on the topic. For the Canadian data, they find that while the fluctuation penalty is low on average, 10 percent of taxpayers faced annual tax penalties of 1 percentage point of their income and 1 percent of taxpayers paid 4 percentage points. That is, those in the top 1 percentile of the penalty paid 4 percent of their average annual income more in taxes per year than they would have if they had been able to average perfectly.
The highest percentile is composed largely of the self-employed or those with realized capital gains. What is more, 57 percent of taxpayers located between the 95th and 100th percentiles of the penalty are from the bottom income quintile. Similarly, in the US, Batchelder (2003) finds that families in the bottom quartile, ranked by annual income, faced an effective tax rate 2.0 percentage points higher under annual income measurement than it would be if income were fully averaged, whereas for the top quartile the rate is only 0.5 percentage points higher. Bargain, Trannoy, and Pacifico (2017) examine [...]

[Footnote: In terms of the assessment of the practical implementation of income averaging, there is an extensive literature in law (see, e.g., a summary in Buchanan 2005).]

There are two related answers in the mathematical literature to the question of averaging.

The first is given by the literature on scale-space in mathematical imaging and vision analysis (see, e.g., books by Aubert and Kornprobst 2006 and Lindeberg 2013). This literature studies image representations at various scales, from the finest scale that represents the original image to the coarser scales of the smoothed versions of the images. Smoothing is conducted at various scales which are tightly related to each other. The analysis there is mainly concerned with two-sided averages and derives a deep and substantial result: a Gaussian kernel arises as a unique averaging object based on a small set of assumptions when averaging is done over the whole line. The Gaussian kernel has many properties and appears in a variety of fields of mathematics; in the scale-space literature it is derived primarily using two main assumptions: varying forms of smoothing and the semigroup (recursivity) structure.
For example, Lindeberg (1997) writes “A notable coincidence between the different scale-space formulations that have been stated is that the Gaussian kernel arises as a unique choice for a large number of different combinations of underlying assumptions (scale-space axioms).” This is a very important point: given a variety of reasonable assumptions (there are quite a lot of natural things one could assume), one usually ends up with the Gaussian. At the same time, the one-sided question of averaging over the past is much less studied and the answer is less canonical in that literature.

[Footnote: Saez (2002) considers the question for understanding the optimal period for computing the tax liability.]

[Footnote: The literature also identified some other possible results where, under different assumptions, one may get the stable distributions from probability theory (see, e.g., Pauwels, Van Gool, Fiddelaers, and Moons 1995) or a nonlinear diffusion (see, e.g., Alvarez, Guichard, Lions, and Morel 1993) as the averaging principle; yet the Gaussian is the canonical and most widely used answer.]

[Footnote: See Fagerström (2005), Lindeberg and Fagerström (1996), Lindeberg (2011), and Salden, ter Haar Romeny, and Viergever (1998).]

[...] The second assumption that is imposed is monotonicity, where the more recent past is weighted more heavily than the more distant past. Steinerberger (2019) shows that total positivity and monotonicity on the half line lead to the unique weighting given by the exponential distribution. That is, the “exponential smoothing” classical in time series analysis (Brown 1957 and Holt 1957) arises from a small set of axioms.

Our result can generate the conclusions of both of these approaches from a reasonable set of assumptions, as well as a range of intermediate results. This leads to a weighting scheme that interpolates between the Gaussian (similar in spirit to the scale-space theories) for a short horizon of averaging and the exponential smoothing, in the positive drift case, for a long time horizon.

In this section, we define the question that we aim to study. Let a bounded measurable income function $f : \mathbb{R} \to \mathbb{R}$ be defined on time $x \in (-\infty, \infty)$. That is, $f(x)$ is income at time $x$. We are interested, at a given time, in finding the average of the income in the past. We want this process to be translation invariant: the way we average over the past should not depend on whether it is, for example, currently January or July. Moreover, we would like the process to be linear in the income: the sum of two averaged incomes should be the average of the sum of the two incomes. The canonical setting [...]

[Footnote: Karlin (1968) shows that the unique averaging kernel that is variation diminishing and is a semigroup on the whole line is the Gaussian.]

\[ g(x) = \int_0^{\infty} f(x - y)\, h(y)\, dy, \]

where $h : [0, \infty) \to \mathbb{R}$ is a (not necessarily continuous) weighting function, assigning weight $h(y)$ to the income $y$ units of time in the past. Many different weighting functions are conceivable. For example, one could simply average the income incurred over the last $a$ units of time; this would correspond to $h$ being a step function on $[0, a]$ having constant value $1/a$.

Our main question is what kind of averaging functions would arise given a small number of reasonable axioms. We emphasize that while we give one possible answer to this question under reasonable axiomatic assumptions, we believe that this question is well worth further study. In particular, it is quite conceivable that other sets of assumptions would lead to other natural functions $h(y)$. We recall that in the mathematical imaging literature, the Gaussian arises naturally from a wide variety of very different assumptions. No such analogous way of forming averages seems to be known for the case of one-sided averages; both the one-sided Gaussian and the exponential distribution appear in very different settings, but the existing literature is very sparse and not as comprehensive as in the two-sided case.
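As a rough illustration of the one-sided average $g(x) = \int_0^\infty f(x-y) h(y)\,dy$, the following Python sketch approximates the integral with a midpoint rule. The function names, the truncation `horizon`, and the two income paths are hypothetical choices made for this example only; the step kernel is the one described in the text, and the exponential kernel is included for comparison.

```python
import math

def one_sided_average(f, x, h, horizon, n=10_000):
    """Midpoint-rule approximation of g(x) = int_0^horizon f(x - y) h(y) dy.

    `horizon` truncates the infinite upper limit (an assumption of this sketch);
    it should cover the region where the weighting function h is non-negligible.
    """
    dy = horizon / n
    total = 0.0
    for k in range(n):
        y = (k + 0.5) * dy  # midpoint of the k-th cell: y units of time in the past
        total += f(x - y) * h(y)
    return total * dy

def box_kernel(a):
    """Step weighting function from the text: constant 1/a on [0, a], zero beyond."""
    return lambda y: 1.0 / a if 0.0 <= y <= a else 0.0

def exponential_kernel(rate):
    """Exponential weighting function rate * exp(-rate * y) on [0, infinity)."""
    return lambda y: rate * math.exp(-rate * y)

# Hypothetical income paths, for illustration only.
constant_income = lambda s: 100.0
trending_income = lambda s: 100.0 + 10.0 * s

avg_const = one_sided_average(constant_income, 0.0, box_kernel(3.0), horizon=3.0)
avg_trend = one_sided_average(trending_income, 0.0, box_kernel(3.0), horizon=3.0)
avg_exp = one_sided_average(constant_income, 0.0, exponential_kernel(1.0), horizon=50.0)
```

A constant income is returned unchanged by any kernel of total mass one, while the step kernel applied to a linear trend returns the income at the midpoint of the averaging window.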
We believe this to be an exciting avenue for further work.

The second question that we are interested in is the issue of scale. We would like the weighting function and averaging to apply at different scales. That is, we want the function $h$ to be in fact a family of functions parameterized by a scale parameter $t$. Intuitively, the scale parameter measures both the range of averaging (the effective length of the time period of averaging) and, as it will turn out, the degree of smoothing. The reason for considering this parameter is that income fluctuations may occur at different time scales, from the weekly earnings of a restaurant worker to the multi-year royalties of a songwriter. Having a range of scale parameters allows us to consider different [...]

[Footnote: That is, considering longer intervals allows for overall smoother averages: if I take my daily income, it may fluctuate significantly; if I average over a week, it will be smoother; and if I average over a year it will be smoother still.]

Without loss of generality, we fix the initial time to 0 and call it the present. Let $x \in (-\infty, 0]$ denote some time in the past and $f(x)$ denote income at time $x$. We further introduce another parameter, the scale $t$. We are interested in the transformations of the income function $f(x)$, $x \in (-\infty, 0]$, at different scales $t$: $u(t, x)$. The function $u(t, x)$ is the smoothed income at time $x$ where the scale of smoothing is $t$. Specifically,

\[ u(t, x) = \int_{-\infty}^{0} f(y)\, p_t(x, y)\, dy, \tag{1} \]

where $p_t(x, y) \ge 0$. The scale $t$ in what follows also determines the intensity of smoothing. For a given scale $t$, this operation takes an initial income function $f$ and transforms it into a new function $u(t, x)$, $x \in (-\infty, 0]$, by convolving with the function $p_t$. That is, for a given $x$, $u(t, x)$ is a weighted average of incomes $f(y)$ with the weights $p_t(x, y)$. We are particularly interested in the value of smoothed income at the present time $x = 0$ at various scales $t$:

\[ g(0) = u(t, 0) = \int_{-\infty}^{0} f(y)\, p_t(0, y)\, dy. \]

[Footnote: This is exactly the foundation of the scale-space theory in mathematical imaging and vision, where an image has to be represented at different scales simultaneously, from the minute details at close inspection to the outlines of the main features when viewed from a distance (see, e.g., Koenderink 1984 and Lindeberg 2013).]

[Footnote: Instead of making the assumptions on the form of (1), we could have more abstractly considered a family of linear operators $T_t$ acting on bounded and continuous functions $f(x)$, $x \in (-\infty, 0]$. The Riesz-Markov-Kakutani representation theorem implies that a (positive) continuous linear functional $f \to T_t f$ is represented by a measure, $T_t f(x) = \int_{-\infty}^{0} f(y)\, P_t(x, dy)$. Further assuming $T_t 1 = 1$, $P_t(x, dy)$ is a probability measure.]

Main Assumptions

At this stage, the weighting function $p_t(x, y)$ and the corresponding smoothed income $u(t, x)$ can be very general, and we now state further assumptions that allow us to determine them specifically.

A natural condition, one that is often not even mentioned, is that if the income function is constant, $f(x) \equiv c$, then the averaged income function should also be constant and equal to the same numerical value. We also normalize the weighting function: for all $x, t$,

\[ \int_{-\infty}^{0} p_t(x, y)\, dy = 1. \]

The natural averaging after 0 units of time have passed is to simply return the original value at the point, since nothing has yet happened. We therefore assume the initial condition $p_0(x, \cdot) = \delta_x$, where $\delta_x$ is the Dirac delta function.

Assumption 1. [Recursivity]

For any $x, y \in (-\infty, 0]$ and $t, s \ge 0$,

\[ p_{t+s}(x, y) = \int_{-\infty}^{0} p_s(x, z)\, p_t(z, y)\, dz. \]

This is a natural assumption that connects different scales of averaging. It is similar to many other recursive formulations common in economics. In this context, the assumption implies that averaging at the scale $s$ and then at the scale $t$ is equivalent to averaging at scale $t + s$. Alternatively, one can think of this assumption as stating that no scale of averaging is singled out and all of them are treated equally. One could thus interpret it as a statement about internal consistency: if such an averaging method were to be implemented, then a citizen could conceivably ask to have their income averaged twelve times over the scale of a month as well as over the scale of a year and then pick the more favorable outcome. This condition ensures that all these outcomes agree: there is no difference in averaging over a year, twelve units of a month or 52 weeks.

[Footnote: More abstractly, we could have posed the semigroup property of the operator $(T_t)$: $T_0 f = f$, $T_s \circ T_t = T_{s+t}$, for all $s, t \ge 0$.]

Assumption 2. [Locality]

For each $x \in (-\infty, 0]$ and $\varepsilon > 0$,

\[ \int_{|y - x| > \varepsilon} p_t(x, y)\, dy = o(t). \]

Furthermore, there exist the infinitesimal characteristics $a$ and $r$:

\[ \int_{|y - x| \le \varepsilon} (y - x)\, p_t(x, y)\, dy = rt + o(t), \qquad \int_{|y - x| \le \varepsilon} (y - x)^2\, p_t(x, y)\, dy = at + o(t). \]

In essence, this assumption states a form of continuity for the averaging operator: for a given $x$, only the local values $y$ matter for the resulting average (for small $t$). There is also an additional assumption built in here, time and scale independence of the coefficients, which we chose not to state separately. We could have instead assumed $a(t, x)$ and $r(t, x)$, with the results extending straightforwardly. There is also one symmetry and, without loss of generality, we can set $a = 1$.

There is also a probabilistic interpretation of this assumption. One can think of the weight $p_t(x, y)$ as the probability that a process travels from $x$ to $y$ in time $t$. The first part of the above assumption then states that the probability of non-local jumps is vanishingly small in time. This assumption together with the recursivity assumption then ensures the continuity of the paths of the stochastic process (see, e.g., Feller 1954).

Finally, we need an assumption on the behavior of the weighting function $p_t(x, y)$ at the boundary. There are two canonical ways of dealing with a [...] $x = 0$. This leaves Neumann conditions as a natural choice:

[Footnote: More broadly, it may be of interest to also incorporate some assumptions related to the time-value of money, which would determine the value of $r(x)$.]

[Footnote: Different $a$ would correspond to speeding up the time; however, since time will actually be one of the parameters in our solution formula, it can be recovered from there.]

Assumption 3. [Boundary Conditions]

For any $y < 0$,

\[ \frac{\partial}{\partial x} p_t(x, y) \Big|_{x = 0} = 0. \]

Remark.

Finally, it is important to note that there are clearly many possible choices for the assumptions on averaging. One goal of our work is to formalize the question and to open avenues for exploring other possible choices of assumptions.
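The normalization, the recursivity assumption and the Neumann boundary condition can be sanity-checked numerically. The Python sketch below does this for the closed-form kernel that the paper reports as equation (3) in the Proposition; the parameter values, the truncation of the integrals at `lower`, the midpoint rule and the finite-difference step are all assumptions of this illustration, not part of the paper.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def kernel(t, x, y, r):
    """Weighting function p_t(x, y) of equation (3), for x, y <= 0 and t > 0."""
    s = math.sqrt(t)
    a = (r * t + x + y) / s
    b = (y - x - r * t) / s
    return (2.0 * r * math.exp(2.0 * r * y) * Phi(a)
            + phi(b) / s
            + math.exp(2.0 * r * y) * phi(a) / s)

def integrate(g, lower=-40.0, n=40_000):
    """Midpoint rule on [lower, 0]; `lower` truncates the half-line."""
    h = -lower / n
    return h * sum(g(lower + (k + 0.5) * h) for k in range(n))

# Illustrative parameter values (hypothetical).
r, t, s = 0.5, 0.5, 0.5
x, y = -0.5, -1.0

# Normalization: the weights p_t(x, .) integrate to one.
mass = integrate(lambda z: kernel(t, x, z, r))

# Assumption 1 (recursivity): averaging at scale s, then t, equals scale t + s.
composed = integrate(lambda z: kernel(s, x, z, r) * kernel(t, z, y, r))
direct = kernel(t + s, x, y, r)

# Assumption 3 (Neumann condition): one-sided difference quotient in x at x = 0.
eps = 1e-4
slope_at_zero = (kernel(t, 0.0, y, r) - kernel(t, -eps, y, r)) / eps
```

In this sketch `mass` should be close to 1, `composed` close to `direct`, and `slope_at_zero` close to 0, up to discretization error.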

We now state the result characterizing the weighting function $p_t(x, y)$.

Proposition.

The evolution equation that follows from Assumptions 1-3 is

\[ \frac{\partial}{\partial t} p_t(x, y) = \frac{1}{2} \frac{\partial^2}{\partial x^2} p_t(x, y) + r \frac{\partial}{\partial x} p_t(x, y). \tag{2} \]

Furthermore, $p_t(x, y)$ has an explicit closed form given by the probability density function of the reflected Brownian motion:

\[ p_t(x, y) = 2r e^{2ry}\, \Phi\!\left( \frac{rt + x + y}{\sqrt{t}} \right) + \frac{1}{\sqrt{t}}\, \phi\!\left( \frac{-rt - x + y}{\sqrt{t}} \right) + \frac{e^{2ry}}{\sqrt{t}}\, \phi\!\left( \frac{rt + x + y}{\sqrt{t}} \right), \tag{3} \]

where $r \in \mathbb{R}$, $x, y < 0$, $\phi$ is the probability density function of the standard $N(0, 1)$ Gaussian distribution, and $\Phi$ is its cumulative distribution function.

[Footnote: Also, the Dirichlet condition contradicts the fact that the total mass is preserved.]

[Footnote: There are also Robin conditions that one could impose; these are of the form $p_t(x, y) + \alpha \frac{\partial}{\partial x} p_t(x, y) \big|_{x = 0} = 0$ for some fixed $\alpha \in \mathbb{R}$. However, since this also involves the value of the function at the boundary (the quantity of interest), it seems unnatural to force it to be of any particular form.]

The first part of the result follows from the classic paper of Kolmogorov (1931) on the connection of diffusion processes with second-order parabolic partial differential equations. The assumptions of recursivity (semigroup) and locality (continuity) ensure that the associated process is a diffusion with the density characterized by (2).

The partial differential equation (2) is actually quite simple and is easy to solve on the whole line $\mathbb{R}$. What is different in our setting is that we are working on the half-line $\mathbb{R}_{\le 0}$ and have reflecting boundary conditions, which is more challenging. For any fixed $x < 0$ and $t > 0$, we can interpret $p_t(x, \cdot)$ as a probability density function. This probability distribution describes the position of a Brownian particle started in $x$ and moving with constant drift in direction $r$ (which points towards the origin for $r > 0$ and away from it for $r < 0$). The reflected process $Z_t$ is defined as follows; this is the Skorokhod reflection problem (Skorokhod 1961, 1962). Let $(B_t)_{t \ge 0}$ be a Brownian motion, and consider the process

\[ X_t = x + rt + B_t. \]

There exists a unique increasing continuous function $L_t$ such that

\[ L_0 = 0, \qquad Z_t = X_t - L_t \le 0, \qquad L_t \text{ grows only at the points where } Z_t = 0. \]

Precisely,

\[ L_t = \sup_{0 \le s \le t} X_s^{+}. \]

The second part of the result and the explicit form of the weighting function $p_t(x, y)$ in (3) follow from the results in the queueing theory of Harrison (2013, p. 48) and Glynn and Wang (2018).

[Footnote: In fact, one would need a weaker set of assumptions to ensure that $p_t(x, y)$ is represented by a second-order differential equation. Specifically, assuming that the operator $T_t$ is a semigroup, preserves unity ($T_t 1 = 1$) [...] ($f \ge 0 \Rightarrow T_t f \ge 0$) would suffice. This can be proven by modifying Stroock (2008, Lemma 1.1.6, p. 2) and is a consequence of a more general result of Peetre (1959) that local operators are differential operators of finite order.]

[...]

\[ u(t, x) = \int_{-\infty}^{0} p_t(x, y)\, f(y)\, dy \]

is the solution $u(t, x)$ of the initial-boundary value problem

\[ \frac{\partial}{\partial t} u(t, x) = \frac{1}{2} \frac{\partial^2}{\partial x^2} u(t, x) + r \frac{\partial}{\partial x} u(t, x) \quad \text{in } (-\infty, 0), \]
\[ \frac{\partial}{\partial x} u(t, 0) = 0; \qquad u(0, x) = f(x). \]

In particular, the primary object of our interest, smoothed income at the present time ($x = 0$) at scale $t$, is given by

\[ g(0) = u(t, 0) = \int_{-\infty}^{0} p_t(0, y)\, f(y)\, dy. \]

This is the setting that we originally set out to study, and we have identified an averaging function $h(t, y) = p_t(0, y)$.

We now derive two important properties of the probability density function $p_t(x, y)$: the behavior at small and large scales.

Lemma.

We have:

(1) for any fixed $x < 0$ and $y < 0$,

\[ p_t(x, y) \sim \frac{1}{\sqrt{t}}\, \phi\!\left( \frac{y - x - rt}{\sqrt{t}} \right), \quad t \to 0; \]

(2) if $r > 0$, then, for all $x < 0$ and $y < 0$, we have

\[ \lim_{t \to \infty} p_t(x, y) = 2r e^{2ry}; \]

if $r < 0$, then $p_t(x, \cdot)$ converges to 0 on every compact interval as $t \to \infty$.

Proof. Part (1): Smoothing at small scale: $t \to 0$. For $x, y < 0$,

\[ \frac{p_t(x, y)}{\frac{1}{\sqrt{t}}\, \phi\!\left( \frac{y - x - rt}{\sqrt{t}} \right)} = 1 + 2\sqrt{t}\, r e^{2ry}\, \frac{\Phi\!\left( \frac{y + x + rt}{\sqrt{t}} \right)}{\phi\!\left( \frac{y - x - rt}{\sqrt{t}} \right)} + e^{2ry}\, \frac{\phi\!\left( \frac{y + x + rt}{\sqrt{t}} \right)}{\phi\!\left( \frac{y - x - rt}{\sqrt{t}} \right)}. \]

We observe that, as $t \to 0$,

\[ \frac{\phi\!\left( \frac{y + x + rt}{\sqrt{t}} \right)}{\phi\!\left( \frac{y - x - rt}{\sqrt{t}} \right)} = \exp\!\left( \frac{(y - x - rt)^2 - (y + x + rt)^2}{2t} \right) = \exp\!\left( -\frac{2y(x + rt)}{t} \right) \to 0, \]

since $x, y < 0$. Similarly, for $t$ small enough that $y + x + rt < 0$, as $t \to 0$,

\[ \frac{\Phi\!\left( \frac{y + x + rt}{\sqrt{t}} \right)}{\phi\!\left( \frac{y - x - rt}{\sqrt{t}} \right)} = e^{\frac{(y - x - rt)^2}{2t}} \int_{-\infty}^{\frac{y + x + rt}{\sqrt{t}}} e^{-u^2/2}\, du \le \sqrt{\frac{\pi}{2}}\, \exp\!\left( \frac{(y - x - rt)^2 - (y + x + rt)^2}{2t} \right) \to 0. \]

So, for all $x, y < 0$, as $t \to 0$,

\[ \frac{p_t(x, y)}{\frac{1}{\sqrt{t}}\, \phi\!\left( \frac{y - x - rt}{\sqrt{t}} \right)} \to 1, \]

and the equivalence

\[ p_t(x, y) \sim \frac{1}{\sqrt{t}}\, \phi\!\left( \frac{y - x - rt}{\sqrt{t}} \right), \quad t \to 0, \]

is established.

Part (2): Smoothing at large scale: $t \to \infty$. This result shows that for a positive drift (towards the origin) at large scale $t \to \infty$, the function $p_t(x, y)$ converges to a universal limiting object (not depending on the starting point $x$),

\[ p_t(x, y) \to 2r e^{2ry}, \]

which is the exponential distribution reflected onto the negative half-line. We consider the steady state for the Kolmogorov forward equation:

\[ \frac{1}{2} f''(y) = r f'(y). \]

Clearly, $f(y) = A e^{2ry} + B$. The boundary condition $f'(0) = 2r f(0)$ implies that $B = 0$. Normalization to mass equal to one gives $A = 2r$.

The lemma above has the following meaning. Part (1) considers averaging over small time, that is, over a very short effective range. The main idea is that when we average over very short windows of time, the boundary condition has no effect, the drift is still present, and we get averaging with the (non-centered) Gaussian distribution.
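The two regimes of the lemma can be observed numerically. The following Python sketch evaluates the kernel of equation (3) against the Gaussian approximation of Part (1) and the exponential limit of Part (2); the helper names and the parameter values ($r = 0.5$, $x = -1$, $y = -1.2$) are hypothetical choices for illustration.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def kernel(t, x, y, r):
    """Weighting function p_t(x, y) of equation (3), for x, y <= 0 and t > 0."""
    s = math.sqrt(t)
    a = (r * t + x + y) / s
    b = (y - x - r * t) / s
    return (2.0 * r * math.exp(2.0 * r * y) * Phi(a)
            + phi(b) / s
            + math.exp(2.0 * r * y) * phi(a) / s)

def gaussian_approx(t, x, y, r):
    """Small-scale approximation from Part (1) of the lemma."""
    s = math.sqrt(t)
    return phi((y - x - r * t) / s) / s

def exponential_limit(y, r):
    """Large-scale limit 2 r exp(2 r y) from Part (2) of the lemma (r > 0)."""
    return 2.0 * r * math.exp(2.0 * r * y)

r, x, y = 0.5, -1.0, -1.2  # illustrative values only

# Part (1): the ratio to the Gaussian approximation tends to 1 as t -> 0.
small_t_ratio = [kernel(t, x, y, r) / gaussian_approx(t, x, y, r)
                 for t in (1.0, 0.1, 0.01)]

# Part (2): the distance to the exponential limit shrinks as t grows.
large_t_gap = [abs(kernel(t, x, y, r) - exponential_limit(y, r))
               for t in (1.0, 25.0, 400.0)]
```

Printing `small_t_ratio` and `large_t_gap` shows the ratio approaching one at small scales and the gap vanishing at large scales for these parameter values.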
Part (2) is in fact a rather remarkable fact in probability theory. The dynamical situation is as follows: we have a Brownian particle on $\mathbb{R}_{\le 0}$ that is reflected at the origin. A particle such as this would slowly start exploring the space and be spread out more and more (roughly at scale $\sim \sqrt{t}$ after $t$ units of time). Without the drift $r$ (or with the drift away from zero, $r < 0$), as $t \to \infty$, the probability density of the Brownian motion goes to 0 (because the particles are spread out more and more). Here (when $r > 0$), there is a drift (of size $r$) moving the particles back to the origin. As time becomes large, there is a limiting profile resulting from the balance of two forces: the constant drift trying to move everything to the origin and the Brownian particle moving around randomly. This limiting profile is given by the exponential distribution (which we encountered previously for very different reasons). The parameter $r > 0$ [...] $x = 0$ the more recent observations receive a higher weight.

The explicit form for the density $p_t(x, y)$ in equation (3) thus gives an interpolation of the weighting function between the Gaussian and the exponential distribution. We plot it in Figure 1.

Figure 1: Weighting function $p_t(0, y)$ interpolates between Gaussian and exponential distributions ($r > 0$).

[...] $g(x)$ at different scales $t$. Higher scales imply a larger effective range of averaging and result in a smoother profile of income.

Figure 2: Smoothed income, $g(x)$, at different scales $t$.

Finally, we are interested in the smoothing properties of the weighting function that we found. Specifically, we are interested in knowing whether our equation

\[ \frac{\partial}{\partial t} u(t, x) = \frac{1}{2} \frac{\partial^2}{\partial x^2} u(t, x) + r \frac{\partial}{\partial x} u(t, x) \]

arises as a gradient flow on some space. The gradient flow is the analogue of the usual gradient descent process, but on a space of functions. That is, we evolve the whole function in the direction of the steepest decrease of some objective function. Consider the Hilbert space $L^2((-\infty, 0], \mu)$, where $\mu(dx) = e^{2rx}\, dx$.
Let $M$ be the subspace consisting of all continuously differentiable functions $u : (-\infty, 0] \to \mathbb{R}$ such that $u'(0) = 0$. On $M$ we define a functional

\[ I(u) = \frac{1}{4} \int_{-\infty}^{0} u_x^2\, d\mu, \]

where $\mu = e^{2rx}\, dx$. The corresponding gradient flow is a function $u : [0, \infty) \to M$ such that $\partial_t u = -\nabla I(u)$, where $\nabla I$ is the gradient of the functional $I$. We compute the gradient:

\[ I(u + \epsilon v) - I(u) = \frac{\epsilon}{2} \int_{-\infty}^{0} u'(x)\, v'(x)\, e^{2rx}\, dx + o(\epsilon). \]

So,

\[ (\nabla I(u), v) = \frac{1}{2} \int_{-\infty}^{0} u'(x)\, v'(x)\, e^{2rx}\, dx = -\frac{1}{2} \int_{-\infty}^{0} v(x) \left( u''(x) + 2r u'(x) \right) e^{2rx}\, dx. \]

It follows that

\[ \nabla I(u) = -\frac{1}{2} u''(x) - r u'(x), \]

and the gradient flow coincides with the equation

\[ \frac{\partial}{\partial t} u(t, x) = \frac{1}{2} \frac{\partial^2}{\partial x^2} u(t, x) + r \frac{\partial}{\partial x} u(t, x). \]

[Footnote: See Steinerberger and Tsyvinski (2019) for a detailed description of gradient flows arising in the context of optimal taxation.]

Let us consider $u(t + \varepsilon, x) = u(x) + \varepsilon v(x)$. We then construct the flow in [...] $I$. From this point of view, the PDE smoothes functions as it reduces the $L^2((-\infty, 0], \mu)$-norm of the derivative in $x$:

\[ \int_{-\infty}^{0} (u_x + \varepsilon v_x)^2\, d\mu = \int_{-\infty}^{0} (u_x + \varepsilon v_x)^2 e^{2rx}\, dx = \int_{-\infty}^{0} u_x^2 e^{2rx}\, dx + 2\varepsilon \int_{-\infty}^{0} u_x v_x e^{2rx}\, dx + O(\varepsilon^2). \]

At the same time, integration by parts (and applying Neumann conditions on the boundary) results in

\[ 2\varepsilon \int_{-\infty}^{0} u_x v_x e^{2rx}\, dx = -2\varepsilon \int_{-\infty}^{0} v\, \frac{\partial}{\partial x} \left( u_x e^{2rx} \right) dx. \]

We have

\[ \frac{\partial}{\partial x} \left( u_x e^{2rx} \right) = u_{xx} e^{2rx} + 2 u_x r e^{2rx} = (u_{xx} + 2r u_x)\, e^{2rx}. \]

Therefore,

\[ -2\varepsilon \int_{-\infty}^{0} v\, \frac{\partial}{\partial x} \left( u_x e^{2rx} \right) dx = -2\varepsilon \int_{-\infty}^{0} v\, (u_{xx} + 2r u_x)\, e^{2rx}\, dx = -4\varepsilon \int_{-\infty}^{0} v \left( \frac{1}{2} u_{xx} + r u_x \right) d\mu. \]

By $L^2$-duality (or Cauchy-Schwarz), this quantity is made as small as possible when

\[ v = \frac{1}{2} u_{xx} + r u_x. \]

That is, $\frac{1}{2} u_{xx} + r u_x$ is the gradient flow that maximally smoothes income in the sense of maximally decreasing the present value of the sum of $u_x^2$.

Conclusion

We examine a classic public finance question from a new perspective and propose an averaging rule based on a small set of assumptions.

Anticipating potential criticism, we now address some of the broad issues with this approach. First, the assumptions that we used, while reasonable, are certainly not the only ones one can use, and other assumptions would yield a different averaging and smoothing rule. A good parallel is the discussion in the mathematical imaging literature that examines how various sets of assumptions generate different smoothing mechanisms. Moreover, the focus there is exactly the one we take here: how a small set of assumptions generates reasonable results and what the consequences are of relaxing or changing some of them. In particular, we believe it could be quite desirable to have the same question addressed from various different perspectives and to see what kind of averaging methods may arise from completely different sets of axioms. A fascinating question is whether the universal appearance of the Gaussian in the two-sided case has an analogous "universal" averaging scheme. Both the half-sided Gaussian and the exponential distribution are natural candidates. Second, there is the question of the practicality of the results. While the exponential weighting, the Gaussian, and the explicit form of the density of the reflected Brownian motion for the intermediate case are simple mathematically, it is more difficult (with the exception of the exponential case) to imagine them being implemented in practice. The abstract formulation of the problem and the explicit solution that we derive allow us, however, to both precisely state and solve the question rather than rely on the perhaps more useful heuristics. At the same time, with increased digitization one can imagine some of the theoretical insights presented here being implemented in practice.
This is the main point of an excellent discussion of the broad range of practical implementation topics of the theoretical taxation literature, including income averaging, in Jacobs (2017).

References

Alvarez, Luis, Frederic Guichard, Pierre-Louis Lions, and Jean-Michel Morel. "Axioms and fundamental equations of image processing." Archive for Rational Mechanics and Analysis 123, no. 3: 199-257. 1993.
Aubert, Gilles, and Pierre Kornprobst. Mathematical problems in image processing: partial differential equations and the calculus of variations. Vol. 147. Springer Science & Business Media. 2006.
Bargain, Olivier, Alain Trannoy, and Adrien Pacifico. "The Impact of Tax Frequency: Theoretical and Empirical Investigations". Working paper. 2017.
Batchelder, Lily L. "Taxing the poor: Income averaging reconsidered." Harvard Journal on Legislation 40: 395. 2003.
Brown, Robert G. "Exponential smoothing for predicting demand." In Operations Research, vol. 5, no. 1, pp. 145-145. 1957.
Buchanan, Neil H. "The Case Against Income Averaging." Virginia Tax Review 25: 1151. 2005.
Diamond, Peter. "Taxes and pensions." Southern Economic Journal 76, no. 1: 2-15. 2009.
Fagerström, Daniel. "Temporal scale spaces." International Journal of Computer Vision 64, no. 2-3: 97-106. 2005.
Feller, William. "Diffusion processes in one dimension." Transactions of the American Mathematical Society 77, no. 1: 1-31. 1954.
Glynn, Peter W., and Rob J. Wang. "On the rate of convergence to equilibrium for reflected Brownian motion." Queueing Systems 89, no. 1-2: 165-197. 2018.
Gordon, Daniel V., and Jean-Francois Wen. "Tax penalties on fluctuating incomes: estimates from longitudinal data." International Tax and Public Finance 25, no. 2: 430-457. 2018.
Harrison, J. Michael. Brownian models of performance and control. Cambridge University Press. 2013.
Heathcote, Jonathan, Kjetil Storesletten, and Giovanni L. Violante. "Consumption and labor supply with partial insurance: An analytical framework." American Economic Review 104, no. 7: 2075-2126. 2014.
Holt, Charles C. "Averaging of income for tax purposes: Equity and fiscal-policy considerations." National Tax Journal 2, no. 4: 349-361. 1949.
Holt, Charles C. "Forecasting seasonals and trends by exponentially weighted moving averages." Office of Naval Research. Research Memorandum No. 52. 1957. Reprinted in Holt, Charles C. "Forecasting seasonals and trends by exponentially weighted moving averages." International Journal of Forecasting 20, no. 1: 5-10. 2004.
Huggett, Mark, and Juan Carlos Parra. "How well does the US social insurance system provide social insurance?" Journal of Political Economy 118, no. 1: 76-112. 2010.
Jacobs, Bas. "Digitalization and Taxation", in: Sanjeev Gupta, Michael Keen, Alpa Shah, and Genevieve Verdier (eds), Digital Revolutions in Public Finance, Washington, DC: International Monetary Fund, Ch. 2. 2017.
Karlin, Samuel. Total positivity. Vol. 1. Stanford University Press. 1968.
Kapicka, Marek. "Quantifying the Welfare Gains from History Dependent Income Taxation." ADEMU Working paper series. 2017.
Kolmogorov, Andrei. "Über die analytischen Methoden in der Wahrscheinlichkeitsrechnung." Mathematische Annalen 104, no. 1: 415-458. 1931.
Koenderink, Jan J. "The structure of images." Biological Cybernetics 50, no. 5: 363-370. 1984.
Lindeberg, Tony. "On the axiomatic foundations of linear scale-space." In Gaussian scale-space theory, pp. 75-97. Springer, Dordrecht. 1997.
Lindeberg, Tony. "Generalized Gaussian scale-space axiomatics comprising linear scale-space, affine scale-space and spatio-temporal scale-space." Journal of Mathematical Imaging and Vision 40, no. 1: 36-81. 2011.
Lindeberg, Tony. Scale-space theory in computer vision. Vol. 256. Springer Science & Business Media. 2013.
Lindeberg, Tony, and Daniel Fagerström. "Scale-space with causal time direction." In European Conference on Computer Vision, pp. 229-240. Springer, Berlin, Heidelberg. 1996.
Mintz, Jack M., and Thomas A. Wilson. "Saving the future: restoring fairness to the taxation of savings." Commentary-CD Howe Institute 176. 2002.
Mirrlees, James A., and Stuart Adam. Dimensions of tax design: the Mirrlees review. Oxford University Press. 2010.
Pauwels, Eric J., Luc J. Van Gool, Peter Fiddelaers, and Theo Moons. "An extended class of scale-invariant and recursive scale space filters." IEEE Transactions on Pattern Analysis and Machine Intelligence 17, no. 7: 691-701. 1995.
Peetre, Jaak. "Une caracterisation abstraite des operateurs differentiels." Mathematica Scandinavica: 211-218. 1959.
Salden, Alfons H., Bart M. ter Haar Romeny, and Max A. Viergever. "Linear scale-space theory from physical principles." Journal of Mathematical Imaging and Vision 9, no. 2: 103-139. 1998.
Saez, Emmanuel. "Optimal income transfer programs: intensive versus extensive labor supply responses." The Quarterly Journal of Economics 117, no. 3: 1039-1073. 2002.
Skorokhod, Anatoliy V. "Stochastic equations for diffusion processes in a bounded region." Theory of Probability & Its Applications 6, no. 3: 264-274. 1961.
Skorokhod, Anatoliy V. "Stochastic equations for diffusion processes in a bounded region. II." Theory of Probability & Its Applications 7, no. 1: 3-23. 1962.
Schönberg, Isaac Jacob. "On variation-diminishing integral operators of the convolution type." Proceedings of the National Academy of Sciences of the United States of America 34, no. 4: 164-169. 1948.
Simons, Henry C. Personal income taxation: The definition of income as a problem of fiscal policy. Chicago University, Chicago. 1938.
Steinerberger, Stefan, and Aleh Tsyvinski. "Tax mechanisms and gradient flows". National Bureau of Economic Research Working paper, No. w25821. 2019.
Stroock, Daniel W. Partial differential equations for probabilists. Cambridge Univ. Press. 2008.
Vickrey, William. "Averaging of income for income-tax purposes." Journal of Political Economy 47, no. 3: 379-397. 1939.
Weickert, Joachim, Seiji Ishikawa, and Atsushi Imiya.
" On the history ofGaussian scale-space axiomatics. ""