Non-Additive Axiologies in Large Worlds

Abstract

Is the overall value of a world just the sum of values contributed by each value-bearing entity in that world? Additively separable axiologies (like total utilitarianism, prioritarianism, and critical level views) say 'yes', but non-additive axiologies (like average utilitarianism, rank-discounted utilitarianism, and variable value views) say 'no'. This distinction is practically important: additive axiologies support 'arguments from astronomical scale' which suggest (among other things) that it is overwhelmingly important for humanity to avoid premature extinction and ensure the existence of a large future population, while non-additive axiologies need not. We show, however, that when there is a large enough 'background population' unaffected by our choices, a wide range of non-additive axiologies converge in their implications with some additive axiology: for instance, average utilitarianism converges to critical-level utilitarianism and various egalitarian theories converge to prioritarianism. We further argue that real-world background populations may be large enough to make these limit results practically significant. This means that arguments from astronomical scale, and other arguments in practical ethics that seem to presuppose additive separability, may be truth-preserving in practice whether or not we accept additive separability as a basic axiological principle.


arXiv [econ.TH]

Christian Tarsney and Teruji Thomas*
Version 1.0, Sept 2020


The world we live in is both large and populous. Our planet, for instance, is 4.5 billion years old and has borne life for roughly 4 billion of those years. [...]

[Footnote *: Global Priorities Institute, Faculty of Philosophy, University of Oxford. Comments welcome: [email protected], [email protected].]

[...] future is potentially vast: our descendants could survive for a very long time, and might someday settle a large part of the accessible universe, gaining access to a vast pool of resources that would enable the existence of astronomical numbers of beings with diverse lives and experiences.

These facts have ethical implications. Most straightforwardly, the potential future scale of our civilization suggests that it is extremely important to shape the far future for the better. This view has come to be called longtermism, and its recent proponents include Bostrom (2003, 2013), Beckstead (2013, 2019), Cowen (2018), Greaves and MacAskill (2019), and Ord (2020). There are many ways in which we might try to positively influence the far future, e.g., building better and more stable institutions, shaping cultural norms and moral values, or accelerating economic growth. But one particularly obvious concern is ensuring the long-term survival of our civilization, by avoiding civilization- or species-ending ‘existential catastrophes’ from sources like nuclear weapons, climate change, biotechnology, and artificial intelligence. Longtermism in general, and the emphasis on existential catastrophes in particular, have major revisionary practical implications if correct, e.g., suggesting the need for major reallocations of resources and collective attention (Ord, 2020, pp. 57ff).

[Footnote: The importance of avoiding existential catastrophe is especially emphasized by Bostrom (2003, 2013) and Ord (2020).]

All these recent defenses of longtermism appeal, in one way or another, to the astronomical scale of the far future. For instance, Beckstead’s central argument starts from the premises that ‘Humanity may survive for millions, billions, or trillions of years’ and ‘If humanity may survive for millions, billions, or trillions of years, then the expected value of the future is astronomically great’ (Beckstead, 2013, pp. 1–2).

Importantly for our purposes, the astronomical scale of the far future most plausibly results from the astronomical number of individuals who might exist in the far future: while the far future population might consist, say, of just a single galaxy-spanning individual, the futures that typically strike longtermists as most worth pursuing involve a very large number of individuals with lives worth living.

Arguments from Astronomical Scale

Because far more welfare subjects or value-bearing entities are affected by A than by B, we can make a much greater difference to the overall value of the world by focusing on A rather than B.

Beckstead and other longtermists take this schema and substitute, for instance, ‘the long-run trajectory of human-originating civilization’ for A and ‘the (non-trajectory-shaping) events of the next 100 years’ for B. To illustrate the scales involved, Bostrom (2013) estimates that if we manage to settle the stars, our civilization could ultimately support at least $10^{32}$ century-long human lives, or $10^{52}$ subjectively similar lives in the form of simulations. Since only a tiny fraction of those lives would exist in the next century or millennium, it seems prima facie plausible that even comparatively minuscule effects on the far future (e.g., small changes to the average welfare of the far-future population, or to its size, or to the probability that it comes to exist in the first place) would be vastly more important than any effects we can have on the more immediate future.

Should we find arguments from astronomical scale persuasive? That is, does the fact that A affects vastly more individuals than B give us strong reason to believe, in general, that A is vastly more important than B? Although there are many possible complications, the sheer numbers make these arguments quite strong if we accept an axiology (a theory of the value of possible worlds or states of affairs) according to which the overall value of the world is simply a sum of values contributed by each individual in that world, e.g., the sum of individual welfare levels. In this case, the effect that some intervention has on the overall value of the world scales linearly with the number of individuals affected (all else being equal), and so astronomical scale implies astronomical importance.

But can the overall value of the world be expressed as such a sum?
This question represents a crucial dividing line in axiology, between axiologies that are additively separable (hereafter usually abbreviated ‘additive’) and those that are not. For instance, total utilitarianism claims that the value of a world is simply the sum of the welfare of every welfare subject in that world, and is therefore additive. On the other hand, average utilitarianism, which identifies the value of a world with the average welfare of all welfare subjects, is non-additive.

[Footnote: Thus, for instance, in reference to the $10^{52}$ estimate, Bostrom claims that ‘if we give this allegedly lower bound...a mere 1 per cent chance of being correct, we find that the expected value of reducing existential risk by a mere one billionth of one billionth of one percentage point is worth a hundred billion times as much as a billion human lives’ (Bostrom, 2013, p. 19).]

When we consider non-additive axiologies, the force of arguments from astronomical scale becomes much less clear, especially in variable-population contexts (i.e. when comparing possible populations of different sizes). They therefore represent a challenge to the case for longtermism and, more particularly, to the case for the overwhelming importance of avoiding existential catastrophe. As a stylized illustration: suppose that there are 10 existing people, all with welfare 1. We can either ($O_1$) leave things unchanged, ($O_2$) improve the welfare of all the existing people from 1 to 2, or ($O_3$) create some number n of new people with welfare 1.5. Total utilitarianism, of course, tells us to choose $O_3$, as long as n is sufficiently large. But average utilitarianism, while agreeing that $O_3$ is better than $O_1$ and that the larger n is, the better, nonetheless prefers $O_2$ to $O_3$ no matter how astronomically large n may be.
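
The arithmetic behind this stylized illustration can be checked directly. The following is a minimal sketch, not part of the original paper: the helper names (`total_utility`, `average_utility`, `o3`) and the representation of an outcome as a list of (count, welfare) pairs are assumptions made for illustration. The three options are named `o1`, `o2` and `o3` in the order in which they are listed above.

```python
from fractions import Fraction

# An outcome is a list of (number_of_people, welfare_level) pairs.
# Fractions keep the arithmetic exact (1.5 is written as 3/2).

def total_utility(outcome):
    """Total utilitarian value: the sum of welfare over all individuals."""
    return sum(count * welfare for count, welfare in outcome)

def average_utility(outcome):
    """Average utilitarian value: total welfare divided by population size."""
    size = sum(count for count, _ in outcome)
    return Fraction(total_utility(outcome)) / size

EXISTING = 10  # the ten existing people in the illustration

o1 = [(EXISTING, Fraction(1))]   # leave things unchanged
o2 = [(EXISTING, Fraction(2))]   # raise everyone from welfare 1 to 2

def o3(n):
    """Add n new people at welfare 1.5 to the unchanged existing population."""
    return [(EXISTING, Fraction(1)), (n, Fraction(3, 2))]

for n in (1, 7, 10**6, 10**12):
    tu_prefers_o3 = total_utility(o3(n)) > total_utility(o2)
    au_prefers_o2 = average_utility(o3(n)) < average_utility(o2)
    print(n, tu_prefers_o3, au_prefers_o2)
```

Under these assumptions, total utilitarianism switches to preferring the third option once n reaches 7, while average utilitarianism prefers the second option for every n, since the average welfare in the third option always lies below 1.5.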
Now, additive axiologies can disagree with total utilitarianism here if they claim that adding people with welfare 1.5 makes the world worse instead of better; but the broader point is that they will almost always claim that the difference in value between $O_3$ and $O_1$ becomes astronomically large (whether positive or negative) as n increases: bigger, for example, than the difference in value between $O_2$ and $O_1$. Non-additive axiologies, on the other hand, need not regard $O_3$ as making a big difference to the value of the world, regardless of n. Again, average utilitarianism agrees with total utilitarianism that $O_3$ is an improvement over $O_1$, but regards it as a smaller improvement than $O_2$, even when it affects vastly more individuals.

Thus, the abstract question of additive separability seems to play a crucial role with respect to arguably the most important practical question in population ethics: the relative importance of (i) ensuring the long-term survival of our civilization and its ability to support a very large number of future individuals with lives worth living vs. (ii) improving the welfare of the present population.

The aim of this paper, however, is to show that under certain circumstances, a wide range of non-additive axiologies converge in their implications with some counterpart additive axiology. This convergence has a number of interesting consequences, but perhaps the most important is that non-additive axiologies can inherit the scale-sensitivity of their additive counterparts. This makes arguments from astronomical scale less reliant on the controversial assumption of additive separability. It thereby increases the robustness of the practical case for the overwhelming importance of the far future and of avoiding existential catastrophe.

Our starting place is the observation that, according to non-additive axiologies, which of two outcomes is better can depend on the welfare of the people unaffected by the choice between them.
That is, suppose we are comparing two populations X and Y. And suppose that, besides X and Y, there is some ‘background population’ Z that would exist either way. (Z might include, for instance, past human or non-human welfare subjects on Earth, faraway aliens, or present/future welfare subjects who are simply unaffected by our present choice.) Non-additive axiologies allow that whether X-and-Z is better than Y-and-Z can depend on facts about Z.

[Footnote: We follow the tradition in population ethics that ‘populations’ are individuated not only by which people they contain, but also by what their welfare levels would be. (However, in the formalism introduced in section 2, the populations we’ll consider are anonymous, i.e. the identities of the people are not specified.)]

With this in mind, our argument has two steps. First, we prove several results to the effect that, in the large-background-population limit (i.e., as the size of the background population Z tends to infinity), non-additive axiologies of various types converge with counterpart additive axiologies. Thus, these axiologies are effectively additive in the presence of sufficiently large background populations. Second, we argue that the background populations in real-world choice situations are, at a minimum, substantially larger than the present and near-future human population. This provides some prima facie reason to believe that non-additive axiologies of the types we survey will agree closely with their additive counterparts in practice. More specifically, we argue that real-world background populations are large enough to substantially increase the importance that average utilitarianism (and, more tentatively, variable value views) assign to avoiding existential catastrophe. Thus, our arguments suggest, it is not merely the potential scale of the future that has important ethical implications, but also the scale of the world as a whole, and in particular the scale of the background population.
The role of background populations in non-separable axiologies has received surprisingly little attention, but has not gone entirely unnoticed. In particular, Budolfson and Spears (ms) consider the implications of background populations for issues related to the ‘Repugnant Conclusion’ (see §10.1 below). And, as we discovered while revising this paper, an argument very much in the spirit of our own (though without our formal results) was elegantly sketched several years ago in a blog post by Carl Shulman (Shulman, 2014).

All of the population axiologies we will consider evaluate worlds based only on the number of welfare subjects at each welfare level. We will consider only worlds containing a finite total number of welfare subjects (except in §10.2, where we consider the significance of our results for infinite ethics). We will also set aside worlds that contain no welfare subjects, simply because some theories of population axiology, like average utilitarianism, do not evaluate such empty worlds.

Thus for our purposes a population is a non-zero, finitely supported function from the set $\mathcal{W}$ of all possible welfare levels to the set $\mathbb{Z}_+$ of all non-negative integers, specifying the number of welfare subjects at each level. Despite this formalism, we’ll say that a welfare level $w$ occurs in a population $X$ to mean that $X(w) \neq 0$. An axiology $\mathcal{A}$ is a strict partial order $\succ_{\mathcal{A}}$ on the set $\mathcal{P}$ of all populations, with ‘$X \succ_{\mathcal{A}} Y$’ meaning that population $X$ is better than population $Y$ according to $\mathcal{A}$. Almost all the axiologies we will consider in this paper can be represented by a value function $V_{\mathcal{A}} : \mathcal{P} \to \mathbb{R}$, meaning that $X \succ_{\mathcal{A}} Y \iff V_{\mathcal{A}}(X) > V_{\mathcal{A}}(Y)$.

To illustrate this formalism, the size $|X|$ of a population $X$ is simply the total number of welfare subjects:

$$|X| := \sum_{w \in \mathcal{W}} X(w).$$

Similarly, the total welfare is

$$\mathrm{Tot}(X) := \sum_{w \in \mathcal{W}} X(w)\, w.$$

Of course, the definition of $\mathrm{Tot}(X)$ only makes sense on the assumption that we can add together welfare levels, and in this connection we generally assume that $\mathcal{W}$ is given to us as a set of real numbers. With that in mind, the average welfare $\bar{X} := \mathrm{Tot}(X)/|X|$ is also well-defined.

We can now give a precise definition of additive separability. If $X$ and $Y$ are populations, then let $X + Y$ be the population obtained by adding together the number of welfare subjects at each welfare level in $X$ and $Y$. That is, for all $w \in \mathcal{W}$, $(X + Y)(w) = X(w) + Y(w)$. An axiology is separable if, for any populations $X$, $Y$, and $Z$,

$$X + Z \succ Y + Z \iff X \succ Y.$$

This means that in comparing $X + Z$ and $Y + Z$, one can ignore the shared sub-population $Z$. Separability is entailed by the following more concrete condition:

Additivity

An axiology $\mathcal{A}$ is additively separable (or additive for short) iff it can be represented by a value function of the form

$$V_{\mathcal{A}}(X) = \sum_{w \in \mathcal{W}} X(w) f(w)$$

with $f : \mathcal{W} \to \mathbb{R}$. Thus the value of $X$ is given by transforming the welfare of each welfare subject by the function $f$ and then adding up the results.

In the following discussion, we will sometimes want to focus on the distinction between additive and non-additive axiologies, and sometimes on the distinction between separable and non-separable axiologies. While an axiology can be separable but non-additive, none of the views we will consider below have this feature. So for our purposes, the additive/non-additive and separable/non-separable distinctions are more or less extensionally equivalent.

We will consider three categories of additive axiologies in this paper, which we now introduce in order of increasing generality. First, there is total utilitarianism, which identifies the value of a population with its total welfare.

Total Utilitarianism (TU)

$$V_{\mathrm{TU}}(X) = \mathrm{Tot}(X) = \sum_{w \in \mathcal{W}} X(w)\, w = \bar{X}\,|X|.$$

An arguable drawback of TU is that it implies the so-called ‘Repugnant Conclusion’ (Parfit, 1984), that for any two positive welfare levels $w_1 < w_2$, for any population in which everyone has welfare $w_2$, there is a better population in which everyone has welfare $w_1$. The desire to avoid the Repugnant Conclusion is one motivation for the next class of additive axiologies, critical-level theories.

Critical-Level Utilitarianism (CL)

$$V_{\mathrm{CL}}(X) = \sum_{w \in \mathcal{W}} X(w)(w - c) = \mathrm{Tot}(X) - c\,|X| = (\bar{X} - c)\,|X|$$

for some constant $c \in \mathcal{W}$ (representing the ‘critical level’ of welfare above which adding an individual to the population constitutes an improvement), generally but not necessarily taken to be positive.

We sometimes write ‘$\mathrm{CL}_c$’ rather than merely ‘CL’ to emphasize the dependence on the critical level.
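
The formalism of populations, size, total and average welfare, together with the additive value functions for TU and CL, translates almost directly into code. The sketch below is illustrative only: representing a population as a `collections.Counter` that maps welfare levels to head counts, and the function names `size`, `total`, `average`, `additive_value`, `v_tu` and `v_cl`, are assumptions of this example rather than anything specified in the paper.

```python
from collections import Counter
from fractions import Fraction

# A population maps each welfare level to the number of subjects at that level.

def size(pop):
    """|X|: the total number of welfare subjects."""
    return sum(pop.values())

def total(pop):
    """Tot(X): the sum of welfare over all subjects."""
    return sum(count * w for w, count in pop.items())

def average(pop):
    """Average welfare Tot(X) / |X| (populations are assumed non-empty)."""
    return Fraction(total(pop)) / size(pop)

def additive_value(pop, f):
    """Value under an additive axiology with per-person transform f."""
    return sum(count * f(w) for w, count in pop.items())

def v_tu(pop):
    """Total utilitarianism: f(w) = w."""
    return additive_value(pop, lambda w: w)

def v_cl(pop, c):
    """Critical-level utilitarianism: f(w) = w - c."""
    return additive_value(pop, lambda w: w - c)

X = Counter({1: 3, 4: 2})   # three subjects at welfare 1, two at welfare 4
Y = Counter({5: 1})         # one subject at welfare 5
Z = Counter({2: 7})         # a shared sub-population

# Counter addition adds head counts level by level: (X + Y)(w) = X(w) + Y(w).
assert v_tu(X) == total(X) == average(X) * size(X)
assert v_cl(X, 2) == total(X) - 2 * size(X)
# Additive value differences are unaffected by the shared sub-population Z.
assert v_cl(X + Z, 2) - v_cl(Y + Z, 2) == v_cl(X, 2) - v_cl(Y, 2)
```

The last assertion is the sense in which an additive axiology is separable: adding the same Z to both sides shifts both values by the same amount, so the comparison between X and Y is unchanged.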
TU is a special case of CL, namely, the case with $c = 0$. Note that, as long as $c$ is positive, CL avoids the Repugnant Conclusion since adding lives with very low positive welfare makes things worse rather than better.

[Footnote: For a detailed discussion of separability principles in population ethics, see Thomas (forthcoming).]

[Footnote: Total utilitarianism is arguably endorsed (with varying degrees of clarity and explicitness) by classical utilitarians like Hutcheson (1738), Bentham (1789), Mill (1863), and Sidgwick (1874), and has more recently been defended by Hudson (1987), de Lazari-Radek and Singer (2014), and Gustafsson (forthcoming), among others.]

[Footnote: Critical-level views have been defended by Blackorby et al. (1997, 2005), among others.]

[Footnote: But a positive critical level also brings its own, arguably greater drawbacks, e.g., the Strong Sadistic Conclusion (Arrhenius, 2000).]

Another arguable drawback of both TU and CL is that they give no priority to the less well off: that is, they assign the same marginal value to a given improvement in someone’s welfare, regardless of how well off they were to begin with. We might intuit, however, that a one-unit improvement in the welfare of a very badly off individual has greater moral value than the same welfare improvement for someone who is already very well off. This intuition is captured by prioritarian theories.

Prioritarianism (PR)

$$V_{\mathrm{PR}}(X) = \sum_{w \in \mathcal{W}} X(w) f(w)$$

for some function $f : \mathcal{W} \to \mathbb{R}$ (the ‘priority weighting’ function) that is concave and strictly increasing.

CL is a special case of PR where $f$ is linear, and TU is a special case where $f$ is linear and passes through the origin. Note also that our definition of the prioritarian family of axiologies is very close to our definition of additive separability, just adding the conditions that $f$ is strictly increasing and weakly concave.

In this section and the next, we consider two categories of non-additive axiologies and show that, in the presence of large enough background populations, they converge with some additive axiology. In this section, we show that average utilitarianism and related views converge with CL, where the critical level is the average welfare of the background population. In the next section, we show that various non-additive egalitarian views converge with PR.

First, though, what do we mean by converging to an additive (or any other) axiology? The claim makes sense relative to a specified type of background population, e.g., all those having a certain average level of welfare.
[Footnote: Versions of prioritarianism have been defended by Weirich (1983), Parfit (1997), Arneson (2000), and Adler (2009, 2011), among others. Sufficientarianism, which by our definition will count as a special case of prioritarianism, has been defended by Frankfurt (1987) and Crisp (2003), among others.]

Convergence

Axiology $\mathcal{A}$ converges to $\mathcal{A}'$ relative to background populations of type $T$, if and only if, for any populations $X$ and $Y$, if $Z$ is a sufficiently large population of type $T$, then

$$X + Z \succ_{\mathcal{A}'} Y + Z \implies X + Z \succ_{\mathcal{A}} Y + Z.$$

Of course, if $\mathcal{A}'$ is separable, the last implication can be replaced by

$$X \succ_{\mathcal{A}'} Y \implies X + Z \succ_{\mathcal{A}} Y + Z.$$

We can, in other words, compare $X + Z$ and $Y + Z$ with respect to $\mathcal{A}$ by comparing $X$ and $Y$ with respect to $\mathcal{A}'$, if we know that $Z$ is a sufficiently large population of the right type.

Note two ways in which this notion of convergence is fairly weak. First, what it means for $Z$ to be ‘sufficiently large’ can depend on $X$ and $Y$. Second, the displayed implication need not be a biconditional; thus, when $\mathcal{A}'$ does not have a strict preference between $X + Z$ and $Y + Z$ (e.g., when it is indifferent between them), convergence to $\mathcal{A}'$ does not imply anything about how $\mathcal{A}$ ranks those two populations. Because of this, every axiology converges to the trivial axiology according to which no population is better than any other. Of course, such a result is uninformative, and we are only interested in convergence to more discriminating axiologies. Specifically, we will only ever consider axiologies that satisfy the Pareto principle (which we discuss in §5.1).

Average utilitarianism, as the name suggests, identifies the value of a population with the average welfare level of that population.

[Footnote: Average utilitarianism is often discussed but rarely endorsed. It has its defenders, however, including Hardin (1968), Harsanyi (1977), and Pressman (2015). Mill (1863) can also be read as an average utilitarian (see fn. 2 in Gustafsson (forthcoming)), though the textual evidence for this reading is not entirely conclusive. As with all evaluative or normative theories, but perhaps more so than most, average utilitarianism confronts a number of choice points that generate a minor combinatorial explosion of possible variants. Hurka (1982a,b) identifies three such choice points which generate at least twelve different versions of averagism. The view we have labeled AU (which Hurka calls A1) strikes us as the most plausible, but our main line of argument could be applied to many other versions. Versions of averagism that only care about the future population do present us with a challenge, which we discuss in §7.]

Average Utilitarianism (AU)

$$V_{\mathrm{AU}}(X) = \bar{X} = \sum_{w \in \mathcal{W}} \frac{X(w)}{|X|}\, w.$$

Our first result describes the behavior of AU as the size of the background population tends to infinity.

Theorem 1.

Average utilitarianism converges to $\mathrm{CL}_c$, relative to background populations with average welfare $c$. In fact, for any populations $X$, $Y$, $Z$, if $\bar{Z} = c$ and

$$|Z| > \frac{|X|\, V_{\mathrm{CL}_c}(Y) - |Y|\, V_{\mathrm{CL}_c}(X)}{V_{\mathrm{CL}_c}(X) - V_{\mathrm{CL}_c}(Y)} \tag{1}$$

then $V_{\mathrm{CL}_c}(X) > V_{\mathrm{CL}_c}(Y) \implies V_{\mathrm{AU}}(X + Z) > V_{\mathrm{AU}}(Y + Z)$.

Proofs of all theorems are given in the appendix. Discussion of this and other results is deferred to §10.
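
Theorem 1 can be checked numerically on a small case. The sketch below is a hedged illustration, not the authors' proof: populations are represented as `collections.Counter` objects mapping welfare levels to head counts, `au_threshold` is a hypothetical helper that evaluates the bound on |Z| stated in (1), and the particular populations X, Y and the background welfare c = 1 are made up for the example.

```python
from collections import Counter
from fractions import Fraction

def size(pop):
    return sum(pop.values())

def total(pop):
    return sum(count * w for w, count in pop.items())

def average(pop):
    """V_AU: average welfare of a non-empty population."""
    return Fraction(total(pop)) / size(pop)

def v_cl(pop, c):
    """V_CL_c: total welfare minus c per head."""
    return total(pop) - c * size(pop)

def au_threshold(x, y, c):
    """Bound (1): (|X| V(Y) - |Y| V(X)) / (V(X) - V(Y)), with V = V_CL_c.

    Only meaningful when CL_c strictly prefers x to y.
    """
    vx, vy = v_cl(x, c), v_cl(y, c)
    if vx <= vy:
        raise ValueError("CL_c must strictly prefer x to y")
    return Fraction(size(x) * vy - size(y) * vx, vx - vy)

c = 1
X = Counter({2: 10})   # ten subjects at welfare 2
Y = Counter({3: 1})    # one subject at welfare 3

bound = au_threshold(X, Y, c)
for k in (1, 2, 100):
    Z = Counter({c: k})   # background population with average welfare c
    print(k, k > bound, average(X + Z) > average(Y + Z))
```

With these made-up numbers the bound is 5/4. On its own, AU prefers Y to X (average 3 against 2), but it sides with $\mathrm{CL}_1$ and prefers X + Z to Y + Z as soon as the background population has two or more members.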

Some philosophers have sought an intermediate position between total and average utilitarianism, acknowledging that increasing the size of a population (without changing average welfare) can count as an improvement, but holding that additional lives have diminishing marginal value. The most widely discussed version of this approach is the variable value view. It is useful to distinguish two types of this view, the second more general than the first.

Variable Value I (VV1)

$V_{\mathrm{VV1}}(X) = \bar{X}\, g(|X|)$, where $g : \mathbb{Z}_+ \to \mathbb{R}_+$ is increasing, concave, non-zero, and bounded above.

Variable Value II (VV2)

$V_{\mathrm{VV2}}(X) = f(\bar{X})\, g(|X|)$, where $f : \mathbb{R} \to \mathbb{R}$ is differentiable and strictly increasing, and $g : \mathbb{Z}_+ \to \mathbb{R}_+$ is increasing, concave, non-zero, and bounded above.

Sloganistically, variable value views can be ‘totalist for small populations’ (where $g$ may be nearly linear), but must become ‘averagist for large populations’ (as $g$ approaches its upper bound). It is therefore not entirely [...]

[Footnote: These views were introduced by Hurka (1983). Variable Value I is also discussed by Ng (1989) under the name ‘Theory X′’.]

Theorem 2.

Variable value views converge to $\mathrm{CL}_c$ relative to background populations with average welfare $c$.

For the broad class of variable value views, we cannot give the sort of threshold for $|Z|$ that we gave for AU, above which the ranking of $X + Z$ and $Y + Z$ must agree with the ranking given by $\mathrm{CL}_{\bar{Z}}$. For instance, because $g$ can be any function that is strictly increasing, strictly concave, and bounded above, variable value views can remain in arbitrarily close agreement with totalism for arbitrarily large populations, so if TU prefers one population to another, there will always be some variable value theory that agrees. In the case of VV1, we can say that if both TU and AU prefer $X$ to $Y$, then all VV1 views will as well (see Proposition 1 in the appendix), and so whenever TU and $\mathrm{CL}_{\bar{Z}}$ have the same strict preference between $X$ and $Y$, the threshold given in Theorem 1 holds for VV1 as well. For VV2, we cannot even say this much.

[Footnote: What we can say about VV2 is the following: when $\bar{X} > \bar{Y}$, $|X| \geq |Y|$, and $f(\bar{X}) \geq 0$, VV2 is guaranteed to prefer $X$ to $Y$. Similarly, when $\bar{X} > \bar{Y}$, $|Y| \geq |X|$, and $f(\bar{Y}) \leq 0$, VV2 is guaranteed to prefer $X$ to $Y$. (These claims depend only on the fact that $f$ is strictly increasing and $g$ is increasing.) So in any case where the population preferred by $\mathrm{CL}_{\bar{Z}}$ is larger and has average welfare to which VV2 assigns a non-negative value, or the population dispreferred by $\mathrm{CL}_{\bar{Z}}$ is larger and has average welfare to which VV2 assigns a non-positive value, VV2 will agree with $\mathrm{CL}_{\bar{Z}}$ whenever AU does.]

A second category of non-additive axiologies is motivated by egalitarian considerations. Whether adding some individual to a population, or increasing the welfare of an existing individual, will increase or decrease equality depends on the welfare of other individuals in the population, so it is easy to see why concern with equality might motivate separability violations.

Egalitarian views have been widely discussed in the context of distributive justice for fixed populations, but relatively little has been said about egalitarianism in a variable-population context. We are therefore somewhat in the dark as to which egalitarian views are most plausible in that context. But we will consider a few possibilities that seem especially promising, trying to consider each fork of two major choice points for variable-population egalitarianism.

The first choice point is between (i) ‘two-factor’ / ‘pluralistic’ egalitarian views, which treat the value of a population as the sum of two (or more) terms, one of which is a measure of inequality, and (ii) ‘rank-discounting’ views, which give less weight to the welfare of individuals who are better off relative to the rest of the population. These two categories of views are extensionally equivalent in the fixed-population context, but come apart in the variable-population context (Kowalczyk, ms).

Among two-factor egalitarian theories, there is another important choice point between ‘totalist’ and ‘averagist’ views.

Totalist Two-Factor Egalitarianism

$V(X) = \mathrm{Tot}(X) - I(X)\,|X|$, where $I$ is some measure of inequality in $X$.

Averagist Two-Factor Egalitarianism

$V(X) = \bar{X} - I(X)$, where $I$ is some measure of inequality in $X$.

Here, in each case, the second term of the value function can be thought of as a penalty representing the badness of inequality. Such a penalty could have any number of forms, but for the purposes of illustration we stipulate that $I(X)$ depends only on the distribution of $X$, where this can be understood formally as the function $X/|X| : \mathcal{W} \to \mathbb{R}$ giving the proportion of the population in $X$ having welfare $w$. The degree of inequality is indeed plausibly a matter of the distribution in this sense, and the badness of inequality is then plausibly a function of the degree of inequality and the size of the population. The more substantial assumption is that the badness of inequality either scales linearly with the size of the population (for the totalist version of the view) or does not depend on population size (for the averagist version).

Now, we want to know what these theories do as $|Z| \to \infty$. In the last section, we had to hold one feature of $Z$ constant as $|Z| \to \infty$, namely, $\bar{Z}$. Egalitarian theories, however, are potentially sensitive to the whole distribution of welfare levels in the population, and so to obtain limit results it is useful to hold fixed the whole distribution of welfare in the background population, i.e. $D := Z/|Z|$. We’ll state the general result, and then give some examples.

[Footnote: One could also imagine variable-value two-factor theories (and two-factor theories that incorporate critical levels, priority weighting, etc., into their value functions), but we will set these possibilities aside for simplicity.]

Theorem 3. Suppose $V$ is a value function of the form $V(X) = \mathrm{Tot}(X) - I(X)\,|X|$, or else $V(X) = \bar{X} - I(X)$, where $I$ is a differentiable function of the distribution of $X$. Then the axiology $\mathcal{A}$ represented by $V$ converges to an additive axiology relative to background populations with any given distribution $D$, with weighting function

$$f(w) = \lim_{t \to 0^+} \frac{V(D + t\,1_w) - V(D)}{t}.$$

If the Pareto principle holds with respect to $\mathcal{A}$, then $f$ is weakly increasing, and if Pigou-Dalton transfers are weak improvements, then $f$ is weakly concave.

[Footnote: Here $1_w \in \mathcal{P}$ is the population with a single welfare subject at level $w$, and we use the fact that value functions of the assumed form can be evaluated directly on any finitely supported, non-zero function $\mathcal{W} \to \mathbb{R}_+$, such as, in particular, $D$ and $D + t\,1_w$.]

A few points in the theorem require further explanation. We will explain the relevant notion of differentiability when it comes to the proof (see Remark 1 in the appendix); as usual, functions that are easy to write down tend to be differentiable, but it isn’t automatic. The Pareto principle holds that increasing anyone’s welfare increases the value of the population. This principle clearly holds for prioritarian views (because the priority-weighting $f$ is assumed to be increasing), but it need not in principle hold for egalitarian views: conceptually, increasing someone’s well-being might contribute so much to inequality as to be on net a bad thing. Still, the Pareto principle is generally held to be a desideratum for egalitarian views. Finally, a Pigou-Dalton transfer is a total-preserving transfer of welfare from a better-off person to a worse-off person that keeps the first person better-off than the second. The condition that Pigou-Dalton transfers are at least weak improvements (they do not make things worse) is often understood as a minimal requirement for egalitarianism.

To illustrate this result, let’s consider two more specific families of egalitarian axiologies that instantiate the schemata of totalist and averagist two-factor egalitarianism respectively.

For the first, we’ll use a measure of inequality based on the mean absolute difference (MD) of welfare, defined for any population $X$ as follows:

$$\mathrm{MD}(X) := \sum_{v, w \in \mathcal{W}} \frac{X(w)\, X(v)}{|X|^2}\, |w - v|.$$

$\mathrm{MD}(X)$ represents the average welfare inequality between any two individuals in $X$. $\mathrm{MD}(X)\,|X|$ can therefore be understood as measuring total pairwise inequality in $X$. Consider, then, the following totalist two-factor view:

Mean Absolute Difference Total Egalitarianism (MDT)

$$V_{\mathrm{MDT}}(X) = \mathrm{Tot}(X) - \alpha\, \mathrm{MD}(X)\,|X|$$

where $\alpha \in (0, 1/2)$ is a constant that determines the relative importance of inequality.

Second, consider the following averagist two-factor view, which identifies overall value with a quasi-arithmetic mean of welfare:

Quasi-Arithmetic Average Egalitarianism (QAA)

$$V_{\mathrm{QAA}}(X) = \mathrm{QAM}(X) = g^{-1}\!\left( \sum_{w \in \mathcal{W}} \frac{X(w)}{|X|}\, g(w) \right)$$

for some strictly increasing, concave function $g : \mathcal{W} \to \mathbb{R}$.

Implicitly, the measure of inequality in QAA is $I(X) = \bar{X} - \mathrm{QAM}(X)$, which one can show is a positive function, weakly decreasing under Pigou-Dalton transfers. In the limiting case where $g$ is linear, $\mathrm{QAM}(X) = \bar{X}$. More generally, QAA is ordinally equivalent to an averagist version of prioritarianism.

Theorem 4.

QAA converges to PR, relative to background populations with a given distribution $D$. Specifically, $\mathrm{QAA}_g$ converges to $\mathrm{PR}_f$, the prioritarian axiology whose weighting function is

$$f(w) = g(w) - g(\mathrm{QAM}(D)).$$

Footnote: For $\alpha \geq 1/2$, equality would be so important that the Pareto principle would fail, i.e., it would no longer be true in general that increasing someone’s welfare level increases the value of the population.

Footnote: See Fleurbaey (2010) and McCarthy (2015, Theorem 1) for axiomatizations of this type of egalitarianism, at least in fixed-population cases where the totalist/averagist distinction is irrelevant.

5.2 Rank discounting

Another family of population axiologies that is often taken to reflect egalitarian motivations is rank-discounted utilitarianism (RDU). The essential idea of rank-discounting is to give different weights to marginal changes in the welfare of different individuals, not based on their absolute welfare level (as prioritarianism does), but rather based on their welfare rank within the population.

One potential motivation for RDU over two-factor views is that, because we are simply applying different positive weights to the marginal welfare of each individual, we clearly avoid any charge of ‘leveling down’: unlike on two-factor views, there is nothing even pro tanto good about reducing the welfare of a better-off individual; it is simply less bad than reducing the welfare of a worse-off individual.

Versions of rank-discounted utilitarianism have been discussed and advocated under various names in both philosophy and economics, e.g. by Asheim and Zuber (2014) and Buchak (2017). In these contexts, the RDU value function is generally taken to have the following form:

$$V(X) = \sum_{k=1}^{|X|} f(k)\, X_k \qquad (2)$$

where $X_k$ denotes the welfare of the $k$th worst off welfare subject in $X$, and $f : \mathbb{N} \to \mathbb{R}$ is a positive but decreasing function.

However, these discussions often assume a context of fixed population size, and there are different ways one might extend the formula when the size is not fixed. We will consider the most obvious approach, simply taking equation (2) as a definition regardless of the size of $X$.
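As a rough illustration of equation (2), the sketch below evaluates a rank-discounted value function on a small population. The function names and the weighting $f(k) = 1/k$ are hypothetical choices made only for this example; the text requires only that $f$ be positive and decreasing.

```python
from typing import Callable, Sequence


def rdu_value(welfare: Sequence[float], weight: Callable[[int], float]) -> float:
    """Equation (2): sum of f(k) * X_k over ranks k = 1..|X|, worst-off first."""
    ranked = sorted(welfare)  # X_1 is the worst-off welfare subject
    return sum(weight(k) * x for k, x in enumerate(ranked, start=1))


def harmonic_weight(k: int) -> float:
    """Hypothetical positive, decreasing rank weight f(k) = 1/k (an assumption)."""
    return 1.0 / k


if __name__ == "__main__":
    base = [1.0, 2.0, 3.0]
    # The same welfare gain counts for more when it goes to a lower-ranked subject.
    gain_to_worst = rdu_value([1.5, 2.0, 3.0], harmonic_weight) - rdu_value(base, harmonic_weight)
    gain_to_best = rdu_value([1.0, 2.0, 3.5], harmonic_weight) - rdu_value(base, harmonic_weight)
    print(gain_to_worst, gain_to_best)
```

Because the weights attach to ranks rather than to welfare levels, the value depends only on the sorted welfare profile, which is why the function sorts before weighting.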
A view of […]

Footnote: It is important to remember, however, that two-factor views with an appropriately chosen $I$, like those we considered in the last section, can avoid all-things-considered leveling down: that is, while they may suggest that there is something good about making the best off worse off, they never claim that it would be an all-things-considered improvement.

Footnote: Using the standard notation in this paper, one can alternatively write

$$V(X) = \sum_{w \in \mathcal{W}} \left( g\Big( \sum_{v \leq w} X(v) \Big) - g\Big( \sum_{v < w} X(v) \Big) \right) w$$

for some increasing, concave function $g : \mathbb{R} \to \mathbb{R}$ with $g(0) = 0$. The two presentations are equivalent if $g(k) = \sum_{i=1}^{k} f(i)$ or conversely $f(k) = g(k) - g(k-1)$.

Footnote: An alternative approach would be to extend to variable populations the ‘veil of ignorance’ description of rank-discounting described by Buchak (see also McCarthy et al. (2020, Example 2.9)). However, on the most obvious way of doing this, the resulting view is coextensive with a two-factor egalitarian view and so falls under the purview of Theorem 3 (even if it is conceptually different in important ways).

Geometric Rank-Discounted Utilitarianism (GRD)

$$V_{\mathrm{GRD}}(X) = \sum_{k=1}^{|X|} \beta^k X_k$$

for some $\beta \in (0, 1)$.

Here, the rank-weighting function is $f(k) = \beta^k$. In general, since $f$ is assumed to be non-increasing and positive, $f(k)$ must asymptotically approach some limit $L$ as $k$ increases. For GRD, $L = 0$. But a simpler situation arises when $L > 0$ ($f$ is bounded away from zero), and this is the case we will consider first, before returning to GRD.

Bounded Rank-Discounted Utilitarianism (BRD)

$$V_{\mathrm{BRD}}(X) = \sum_{k=1}^{|X|} f(k)\, X_k$$

for some non-increasing, positive function $f : \mathbb{R} \to \mathbb{R}$ that is eventually convex with asymptote $L > 0$.

Footnote: That is, there is some $k$ such that $f$ is convex on the interval $(k, \infty)$. The assumption of eventual convexity is simply a technical assumption to be used in Theorem 6 below.

Convergence on $S$

Axiology $A$ converges to $A'$, relative to background populations of type $T$, on a set of populations $S$, if and only if, for any populations $X$ and $Y$ in $S$, if $Z$ is a sufficiently large population of type $T$, then

$$X + Z \succ_{A'} Y + Z \implies X + Z \succ_{A} Y + Z.$$

Having fixed a background distribution $D = Z / |Z|$, say that a population $X$ is moderate with respect to $D$ if the lowest welfare level in $X$ is no lower than the lowest welfare level in $D$. In other words, for any $x \in \mathcal{W}$ with $X(x) \neq 0$, there is some $z \in \mathcal{W}$ with $z \leq x$ and $D(z) \neq 0$. Then we can state the following result:

Theorem 6. BRD converges to TU relative to background populations with a given distribution $D$, on the set of populations that are moderate with respect to $D$.

When, as in GRD, the asymptote of the weighting function $f$ is at $L = 0$, […] $f$ decays. We will consider only GRD, as it is the best-motivated example in the literature.

In fact, GRD does not converge to an additive, Paretian axiology on any interesting range of populations. Roughly speaking, this is because, as the background population gets larger, the weight given to the best-off individual in $X$ becomes arbitrarily small relative to the weight given to the worst-off, smaller than the relative weight given to it by any particular additive, Paretian axiology. Nonetheless, it turns out that GRD does converge to a separable, Paretian axiology. We’ll explain this carefully, but perhaps the most important take-away of this discussion will be that, given a large background population, GRD leads to some very strange and counterintuitive results. The limiting axiology will be critical level leximin, defined by the following conditions:

Critical Level Leximin ($\mathrm{CLL}_c$)

1. If $X$ and $Y$ have the same size, then $X \succ Y$ if and only if $X \neq Y$ and the least $k$ such that $X_k \neq Y_k$ is such that $X_k \succ Y_k$.
2. If $X$ and $Y$ differ only in that $Y$ has additional individuals at welfare level $c$, then $X$ and $Y$ are equally good.

Footnote: To compare $X$ and $Y$ in general, use the second condition to find populations $X'$ and $Y'$ that are equally as good as $X$ and $Y$ respectively, but such that $|X'| = |Y'|$, and then compare them using the first condition.

In a sense, $\mathrm{CLL}_c$ is simply a limiting case of prioritarianism, where the priority given to the less-well-off is infinite. In particular, although it is not additively separable in the narrow sense defined in §2, which requires an assignment of real numbers to each individual, one can check that it is separable, and indeed one can show that it is additively separable in a more general sense, if we allow the contributory value of an individual’s welfare to be represented by a vector rather than a single real number.

Footnote: See McCarthy et al. (2020, Example 2.7) for details in the constant-population-size case.

To state the theorem, fix a set $W \subset \mathcal{W}$ of welfare levels. Say that a population $X$ is supported on $W$ if $X(w) = 0$ for all $w \notin W$. And say that $W$ is covered by a distribution $D = Z / |Z|$ if and only if there is a welfare level in $Z$ […] $W$, a welfare level in $Z$ below every element of $W$, and a welfare level in $Z$ above every element of $W$.

Theorem 7.

Let $W \subset \mathcal{W}$ be any set of welfare levels, and $D$ a population that covers $W$. GRD converges to $\mathrm{CLL}_c$ relative to background populations with distribution $D$, on the set of populations that are supported on $W$; the critical level $c$ is the highest welfare level occurring in $D$.

Critical level leximin has a number of extreme and implausible features; as the theorem suggests, these will often be displayed by GRD when there is a large background population. For example, tiny benefits to worse-off individuals will often be preferred over astronomical benefits to even slightly better-off individuals; moreover, adding an individual to the population with anything less than the maximum welfare level in the background population will often make things worse overall. In fact, according to $\mathrm{CLL}_c$, it makes things worse to add one person slightly below the critical level along with any number of people above the critical level; because of this, GRD implies what we might call the ‘Snobbish Conclusion’:

Snobbish Conclusion

Suppose $X$ consists of one person with an arbitrarily good life, at level $w$, and any number of people with even better lives. Then there is some possible background population $Z$, in which the average welfare is far worse than $w$, and in which the very best lives are only slightly better than $w$, such that $Z + X$ is worse than $Z$.

This seems crazy to us. We could just about understand the Snobbish Conclusion in the context of an anti-natalist view, according to which adding lives invariably has negative value; but, according to GRD, there are many possible background populations $Z$ such that $Z + X$ would be better than $Z$. We could also understand the view that adding good lives can make things worse if it lowers average welfare or increases inequality (e.g. as measured by mean absolute difference or standard deviation). But, again, that’s not what’s going on here. Instead, GRD implies that adding excellent lives makes things worse if the number of even slightly better lives already in existence happens to be sufficiently great, regardless of the other facts about […]

Footnote: A toy example illustrates these phenomena, which are somewhat more general than the theorem entails. Suppose the background population consists of $N$ people at level 100. Let $X$ consist of two people at level 99; let $Y$ consist of one person at level 98 and one at level 1000; and let $Z$ consist of two people at level 99 and one at 99.9. We have $V_{\mathrm{GRD}}(X) - V_{\mathrm{GRD}}(Y) = \beta - \beta^2 - 900\beta^{N+2}$, which is positive if $N$ is large enough, in which case $X \succ_{\mathrm{GRD}} Y$, illustrating the first claim. On the other hand, $V_{\mathrm{GRD}}(X) - V_{\mathrm{GRD}}(Z) = 0.1\beta^3 - 100\beta^{N+3}$, again positive for $N$ large enough; then $X \succ_{\mathrm{GRD}} Z$, illustrating the second claim.

In the rest of the paper, we investigate the implications of the preceding results, and especially their practical implications for morally significant real-world choices.
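The toy example for GRD given above (a background of $N$ people at level 100, compared against small populations near that level) can be checked numerically. The sketch below is only an illustration: the function names are hypothetical, each value is computed on the foreground population together with the background, and the closed forms are those obtained by ranking each combined population from worst-off to best-off with weights $\beta^k$, $k = 1, 2, \ldots$

```python
def grd_value(welfare, beta):
    """Geometric rank-discounted value: sum of beta**k * X_k, worst-off first."""
    ranked = sorted(welfare)
    return sum(beta ** k * x for k, x in enumerate(ranked, start=1))


def toy_differences(n_background, beta):
    """Return (V(X) - V(Y), V(X) - V(Z)), each evaluated with the background added."""
    background = [100.0] * n_background
    x = [99.0, 99.0]
    y = [98.0, 1000.0]
    z = [99.0, 99.0, 99.9]
    vx = grd_value(background + x, beta)
    vy = grd_value(background + y, beta)
    vz = grd_value(background + z, beta)
    return vx - vy, vx - vz


def closed_forms(n_background, beta):
    """Closed-form differences derived by hand from the ranked sums."""
    n = n_background
    return (beta - beta ** 2 - 900 * beta ** (n + 2),
            0.1 * beta ** 3 - 100 * beta ** (n + 3))


if __name__ == "__main__":
    for n in (10, 200):
        print(n, toy_differences(n, 0.9), closed_forms(n, 0.9))
```

With a small background both differences are negative, so the large benefit to the better-off person and the added near-maximal life are both favoured; once the background is large enough the signs flip, which is the behaviour the toy example describes.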
As we have seen, how closely a given non-additive axiology agrees with its additive counterpart in some real-world choice situation depends on the size of the population that can be treated as ‘background’ in that choice situation. And what that additive counterpart will be (i.e., which version of CL or PR) depends on the average welfare of the background population, and perhaps on its entire welfare distribution. In this section, therefore, we consider the size and (to a lesser extent) the welfare of real-world background populations.

We note that nothing in this section (or the next two) shows conclusively that the background population is large enough for our limit results to be effective, but we do establish a prima facie case for their relevance. In §9, we will seek firmer conclusions in a stylized case.

We have so far taken the separation between ‘background’ and ‘foreground’ populations as given, but it will now be helpful to make these notions more precise. Given a choice between populations $X_1, X_2, \ldots, X_n$, the population $Z$ that can be treated as background with respect to that choice is defined by $Z(w) = \min_i X_i(w)$. That is, the background population consists of the minimum number of welfare subjects at each welfare level who are guaranteed to exist regardless of the agent’s choice. For this $Z$ and for each $X_i$, there is then a population $X_i^*$ such that $X_i = X_i^* + Z$. The choice between $X_1, X_2, \ldots, X_n$ can therefore be understood as a choice between the foreground populations $X_1^*, X_2^*, \ldots, X_n^*$, in the presence of background population $Z$.

Clearly, this means that different real-world choices will involve different background populations. In particular, more consequential choices (that have far-reaching effects on the overall population) allow less of the population to be treated as background, whereas choices whose effects are tightly localized (or otherwise limited) may allow nearly the entire population to be treated as background.
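The decomposition of each option into a shared background and an option-specific foreground can be sketched as follows. This is a minimal illustration, assuming populations are represented as multisets that map welfare levels to head counts; the function names and the example numbers are hypothetical.

```python
from collections import Counter


def background(options):
    """Z(w) = min_i X_i(w): subjects at each welfare level who exist whatever is chosen."""
    levels = set().union(*options)  # every welfare level occurring in any option
    shared = {w: min(option[w] for option in options) for w in levels}
    return Counter({w: n for w, n in shared.items() if n > 0})


def foreground(option, shared_background):
    """X_i^* = X_i - Z, so that X_i = X_i^* + Z."""
    return option - shared_background  # Counter subtraction drops zero counts


if __name__ == "__main__":
    # Hypothetical options: welfare level -> number of welfare subjects.
    x1 = Counter({0: 5, 10: 2})
    x2 = Counter({0: 5, 10: 1, 20: 3})
    z = background([x1, x2])
    print(z, foreground(x1, z), foreground(x2, z))
```

Note that the background is defined by counts at each welfare level rather than by the identities of individuals, which matches the definition $Z(w) = \min_i X_i(w)$.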
But we can also deﬁne a ‘shared’ back-ground population for some set of choice situations, by considering all theoverall populations that might be brought about by any proﬁle of choicesin those situations. Thus we can speak, for instance, of the population thatis ‘background’ with respect to all the choices faced by present-day human20gents, consisting of the minimum number of individuals at each welfarelevel that the overall population will contain whatever we all collectively do(perhaps simply equal to the number of individuals at each welfare leveloutside Earth’s present future light cone). Past welfare subjects on Earth constitute the most obvious component ofreal-world background populations. Estimates of the number of humanbeings who have ever lived are on the order of 10 (Kaneda and Haub, 2018),of whom only ∼ × are alive today. But of course Homo sapiens are notthe only welfare subjects. At any given time in the recent past, for instance,there are also many billions of mammals, birds, and ﬁsh being raised by hu-mans for meat and other agricultural products. And given their very highbirth / death rates, past members of these populations greatly outnumberpresent members.But since human agriculture is a relatively recent phenomenon, farmedanimals make only a relatively small contribution to the total backgroundpopulation. Wild animals make a far greater contribution. There are to-day, conservatively, 10 mammals living in the wild, along with similaror greater numbers of birds, reptiles, and amphibians, and a signiﬁcantlylarger number of ﬁsh—conservatively 10 , and possibly far more. Thisis despite the signiﬁcant decline in wild animal populations in recent cen-turies and millennia as a result of human encroachment. Inferring thetotal number of past mammals, vertebrates, etc from the number alive at agiven time requires us to make assumptions about population birth / deathrates. 
Unfortunately, we have not been able to ﬁnd data that allow us toestimate overall birth / death rates for the wild mammal or wild vertebratepopulations as a whole with any conﬁdence. So we will simply adopt whatstrikes us as a very safely conservative assumption of 0.1 births / deaths perindividual per year in wild animal populations (roughly corresponding toan average individual lifespan of 10 years). The actual rates are almost cer-tainly much higher (especially for vertebrates), implying larger total pastpopulations. Here and below, we assume a causal decision theory, which guarantees that causallyinaccessible populations can be treated as ‘background’. How we can identify backgroundpopulations, and how their practical signiﬁcance changes, in the context of non-causaldecision theories are interesting questions for future research. For useful surveys of evidence on present animal population sizes, see Tomasik (2019)and Bar-On et al. (2018) (especially pp. 61-4 and Table S1 in the supplementary appendix). For instance, Smil (2013, p. 228) estimates that wild mammalian biomass has declinedby 50% in the period 1900–2000 alone. mammals have been alive onEarth at any given time since the K-Pg boundary event (the extinction eventthat killed the dinosaurs, ∼

66 million years ago), with a population birth / deathrate of 0.1 per individual per year. This gives us a background populationof ∼ × individuals. Being a bit less conservative (though perhapsstill objectionably conservative), we might suppose that all and only ver-tebrates are welfare subjects and that 10 vertebrates have been alive onEarth at any time in the last 500 million years (since shortly after the Cam-brian explosion), with the same population birth / death rate of 0.1 per in-dividual per year. This gives us a background population of ∼ × indi-viduals. Anything we say about the distribution of welfare levels in the backgroundpopulation will of course be enormously speculative. So although the ques-tion has important implications, we will limit ourselves to a few brief re-marks.With respect to average welfare in the background population, two hypo-theses seem particularly plausible.

Hypothesis 1

The background population consists mainly of small an-imals (whether terrestrial or extraterrestrial). Most of these animalshave short natural lifespans, so the average welfare level of the back-ground population is very close to zero. If the capacity for positive / negativewelfare scales with brain size (or related features like cortical neuroncount), this would reinforce the same conclusion. It seems likely thataverage welfare in these populations will be negative, at least on a he-donic view of welfare (Ng, 1995; Horta, 2010). These assumptions to-gether would imply, for instance, that AU, VV1 and VV2 converge to a In the name of conservatism, we are setting aside various hypotheses that might gen-erate much larger background populations. First, of course, even the restriction to verte-brates excludes potential welfare subjects like crustaceans and insects. Second, we Earth-lings may not be the only welfare subjects. The observable universe contains roughly 2trillion galaxies (Conselice et al., 2016), and the universe as a whole is likely to be manytimes larger (Vardanyan et al., 2011). The universe could therefore contain many other bio-spheres like Earth’s. It might also contain advanced, spacefaring civilizations, which couldsupport enormous populations on the order of 10 individuals or more (Bostrom, 2003,2011). So the extraterrestrial background population could be many—indeed, indeﬁnitelymany—orders of magnitude larger than the populations of past mammals or vertebrateson Earth. Hypothesis 2

The background population mainly consists of the mem-bers of advanced alien civilizations. If, for instance, the average bio-sphere produces 10 wild animals over its lifetime, but one in a millionbiospheres gives rise to an interstellar civilization that produces 10 in-dividuals on average over its lifetime, then the denizens of these inter-stellar civilizations would greatly outnumber wild animals in the uni-verse as a whole. Under this hypothesis, given the limits of our presentknowledge, all bets are off: average welfare of the background popula-tion could be very high (Ord, 2020, pp. 235–9), very low (Sotala and Gloor,2017), or anything in between.With respect to the distribution of welfare more generally, we have evenless to say. There is clearly a non-trivial degree of welfare inequality in thebackground population—compare, for instance, the lives of a well-cared-for pet dog and a factory-farmed layer hen. Self-reported welfare levels inthe contemporary human population indicate substantial inequality (seefor instance Helliwell et al. (2019), Ch. 2), and while contemporary humansneed not belong to the background population with respect to present-daychoice situations, it seems safe to infer that there has been substantial wel-fare inequality in human populations in at least the recent past. For non-human animals, of course, we do not even have self-reports to rely on, andso any claims about the distribution of welfare are still more tentative. Butthere is, for instance, some literature on farm animal welfare that suggestssigniﬁcant inter-species welfare inequalities (e.g. Norwood and Lusk (2011,pp. 
224–9), Browning (2020)).

That said, it could still turn out that the background population is dominated by welfare subjects who lead fairly uniform lives, e.g., by small animals who almost always experience lifetime welfare slightly below 0, or by members of alien civilizations that converge reliably on some set of values, social organization, etc., that produce enormous numbers of individuals with near-equal welfare.

We have shown that various non-additive axiologies converge to additive axiologies in the large-background-population limit. But proponents of non-additive views might wish to avoid drawing practical conclusions from these results. After all, much of the point of being, say, an average utilitarian rather than a critical-level utilitarian is to reach the right practical conclusions in cases where AU seems more plausible than CL. That point is defeated if, in practice, AU is nearly indistinguishable from CL.

The simplest way to avoid the implications of our limit results is to claim that, for decision-making purposes, agents should simply ignore most or all of the background population. This idea can be spelled out in various ways, but it seems to us that the most principled and plausible precisification is a causal domain restriction (Bostrom, 2011), according to which an agent should evaluate the potential outcomes of her actions by applying the correct axiology only to those populations that might exist in her causal future (presumably, her future light cone). Since background populations of the sort described in the last section will mostly lie outside an agent’s future light cone, a causal domain restriction may drastically reduce the size of the population that can be treated as background, and hence the practical significance of our limit results.

Here are three replies to this suggestion.
First, to adopt a causal do-main restriction is to abandon a central and deeply appealing feature ofconsequentialism, namely, the idea that we have reason to make the worlda better place , from an impartial and universal point of view. That someact would make the world a better place, full stop , is a straightforward andcompelling reason to do it. It is much harder to explain why the fact that anact would make your future light cone a better place (e.g., by maximizingthe average welfare of its population), while making the world as a wholeworse, should count in its favor. Second, the combination of a causal domain restriction with a non-separable axiology can generate counterintuitive inconsistencies betweenagents (and agent-stages) located at different times and places, with result-ing inefﬁciencies. As a simple example, suppose that A and B are bothagents who evaluate their options using causal-domain-restricted average A causal domain restriction might be motivated by the temporal value asymmetry , ourtendency to attach greater affective and evaluative weight to future events than to other-wise equivalent past events (Prior, 1959; Parﬁt, 1984, Ch. 8). It is sometimes claimed thatthis asymmetry characterizes only our self-regarding (and not our other-regarding) pref-erences (see e.g. Parﬁt, 1984, p. 181; Brink, 2011, pp. 378–9; Greene and Sullivan, 2015, p.968; Dougherty, 2015, p. 3), but recent empirical studies appear to contradict this claim(Caruso et al., 2008; Greene et al., forthcoming). However, though the temporal valueasymmetry is a clear and robust psychological phenomenon, it has proven notoriously dif-ﬁcult to come up with any normative justiﬁcation for asymmetric evaluation of past andfuture events (see for instance Moller (2002), Hare (2013)). This point goes back to Broad (1914); see Carlson (1995) for a detailed discussion ofthis area. 
t , A must choose between a population of one individ-ual with welfare 0 who will live from t to t (population X ) or a populationof one individual with welfare − t to t (population Y ). At t , B must choose between a population of three individuals withwelfare 5 (population Z ) or a population of one individual with welfare 6(population W ), both of which will live from t to t . If A chooses X , then B will choose W (yielding an average welfare of 6 in B ’s future light cone), butif A chooses Y , then B will choose Z (since Y + Z yields average welfare 3.5in B ’s future light cone, while Y + W yields only 2.5). Since A prefers Y + Z to X + W (which yield averages of 3.5 and 3 respectively in A ’s future lightcone), A will choose Y . Thus we get Y + Z , even though X + Z would havebeen better from both A ’s and B ’s perspectives. That two agents who ac-cept exactly the same normative theory and have exactly the same, per-fect information can ﬁnd themselves in such pointless squabbles is surelyan unwelcome feature of that normative theory, though we leave it to thereader to decide just how unwelcome. Third, a causal domain restriction might not be enough to avoid thelimit behaviors described in §§4–5, if there are large populations inside ourfuture light cones that are background (at least, to a good approximation)with respect to most real-world choice situations. For instance, it seemslikely that most choices we face will have little effect on wild animal popula-tions over the next 100 years. More precisely, our choices might be identity-affecting with respect to many or most wild animals born in the next cen-tury (in the standard ways in which our choices are generally supposed tobe identity-affecting with respect to most of the future population—see,e.g., Parﬁt (1984, Ch. 16)), but will have little if any affect on the number ofindividuals at each welfare level in that population. 
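The two-agent case described above, in which A and B both apply causal-domain-restricted average utilitarianism, can be checked by backward induction. The sketch below is a reconstruction under stated assumptions: the welfare of the individual in Y is taken to be −1, a value inferred from the averages quoted in the example (3.5 for Y + Z and 2.5 for Y + W) rather than read directly from the text, and, as the example states, X falls outside B’s future light cone while Y falls inside it. All names are hypothetical.

```python
def average(welfare):
    """Average welfare of a non-empty list of individual welfare levels."""
    return sum(welfare) / len(welfare)


POPS = {
    "X": [0.0],
    "Y": [-1.0],            # inferred from the stated averages (assumption)
    "Z": [5.0, 5.0, 5.0],
    "W": [6.0],
}


def b_choice(a_option):
    """B maximizes average welfare within B's own future light cone."""
    visible = POPS["Y"] if a_option == "Y" else []  # X is outside B's light cone
    return max(("Z", "W"), key=lambda b: average(visible + POPS[b]))


def a_choice():
    """A anticipates B's response; A's light cone contains everything."""
    return max(("X", "Y"), key=lambda a: average(POPS[a] + POPS[b_choice(a)]))


if __name__ == "__main__":
    a = a_choice()
    print(a, b_choice(a))                    # the outcome the two agents reach
    print(average(POPS["X"] + POPS["Z"]))    # A's evaluation of X + Z
```

On these numbers the agents end up at Y + Z, even though X + Z has a higher average from A’s perspective (3.75 against 3.5) and from B’s (5 against 3.5).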
And this alone suppliesquite a large background population—perhaps 10 mammals and 10 vertebrates. Indeed, it is plausible that with respect to most choices (evencomparatively major, impactful choices), the vast majority of the presentand near-future human population can be treated as background. For in-stance, if we are choosing between spending $1 million on anti-malarial One general lesson of this example is that, when a group of timelike-related agents oragent-stages accept the same causal-domain-restricted non-separable axiology, an earlieragent in the group will have an incentive (i.e., will pay some welfare cost) to push axiolog-ically signiﬁcant events forward in time, into the future light cones of later agents, so thattheir evaluations of their options will more closely agree with hers. The argument is essentially due to Rabinowicz (1989); see also the cases of intertem-poral conﬂict for future-biased average utilitarianism in Hurka (1982b, pp. 118–9).Of course, cases like these also create potential time-inconsistencies for individualagents, as well as conﬂict between multiple agents. But these inconsistencies might beavoidable by standard tools of diachronic rationality like ‘resolute choice’. Another way one might try to avoid the limit behaviors described in §§4–5is to claim that not all welfare subjects make the same contribution to the‘size’ of a population, as it should be measured for axiological purposes.Roughly speaking: although we should not deny tout court that ﬁsh arewelfare subjects, perhaps, when evaluating outcomes, a typical ﬁsh shouldeffectively count as only (say) one tenth of a welfare subject, given its cogni-tive and physiological simplicity. If, in a typical choice situation, the back-ground population is predominantly made up of such simple creatures,then it might be dramatically smaller (in the relevant sense) than it wouldﬁrst appear. 
A bit more formally, we can understand this strategy as assigning a real-valued axiological weight to each individual in a population, and turningpopulations from integer-valued to real-valued functions, where X ( w ) nowrepresents not the number of welfare subjects in X with welfare w , but the sum of the axiological weights of all the welfare subjects in X with welfare w . Axiological weights might be determined by factors like brain size, neu-ron count, lifespan, or by a combination of ‘spatial’ and ‘temporal’ factors(e.g., lifespan times neuron count). Weighting by lifespan seems partic-ularly natural if we think that our ultimate objects of moral concern are stages , rather than complete, temporally extended individuals. Weightingby brain size or neuron count may seem natural if we believe that, in somesense, morally signiﬁcant properties like sentience ‘scale with’ these mea-sures of size.Here are three replies to this suggestion: First, of course, one mightlodge straightforward ethical objections to axiological weights. They seemto contradict the ideals of impartiality and equal consideration that are of-ten seen as central to ethics in general and axiology in particular (and forthis reason, may be especially hard to reconcile with egalitarian views in ax-iology). It’s also hard to imagine a plausible principle that assigns reduced For further discussion of, and objections to, causal domain restrictions in the contextof inﬁnite ethics, see Bostrom (2011) and Arntzenius (2014). Thanks to Tomi Francis and Toby Ord, who each separately suggested this objection. × − , which would cut our estimate of the size of the mam-malian background population from ∼ × down to ∼ × . Ifwe also weight by lifespan, and generously assume that present-day hu-mans have an average lifespan of 100 years, then the effective mammalianbackground population is reduced to ∼ × . 
Thus, even after making a host of conservative assumptions (only counting mammals as welfare subjects, taking a conservative estimate of the number of mammals alive at a time, ignoring times before the K-Pg boundary event, weighting by cortical neuron count and lifespan, and taking mice as a stand-in for all non-human mammals), we are still left with a background population more than three orders of magnitude larger than the present human population.

Footnote: When we weight by lifespan, we can derive population size simply from the number of individuals alive at a time multiplied by time, without needing to make any assumptions about birth or death rates.

Third and finally, as we have already argued, even if we entirely ignore non-humans we may still find that background populations are large relative to foreground populations in most present-day choice situations. To begin with, past humans outnumber present humans by more than an order of magnitude (as we saw in §6). And it seems plausible that the large majority even of the present and near-future human population is approximately background in most choice situations (as we argued at the end of §7). Thus, even if we both severely deprioritize or ignore non-humans and adopt a causal domain restriction, we might still find that background populations are usually large relative to foreground populations.

9 The value of avoiding existential catastrophe

Taking stock: in §§4–5, we showed that various non-additive axiologies converge to additive axiologies in the presence of large enough background populations. In §6, we argued that the background populations in real-world choice situations are very large: at least multiple orders of magnitude larger than the affectable portion of the present and near-future population. And in §§7–8, we resisted two strategies for deflating the size of real-world background populations.

If we are right about the size of real-world background populations, this provides a weak prima facie reason to believe that our limit results are practically significant, i.e., that what is true in the limit will be true in practice, for the most plausible versions of the various families of non-additive axiologies we have considered. That is, the absolute and relative size of real-world background populations weakly suggests that we should expect plausible non-additive axiologies to agree closely with their additive counterparts in real-world choice situations. More generally, it suggests that even if we don’t accept (additive) separability as a fundamental axiological principle, it may nevertheless be a useful heuristic for real-world decision-making purposes, i.e., that arguments in practical ethics that rely on separability assumptions are likely to be truth-preserving in practice.

But we will focus on a particular issue in practical ethics where we can say something a bit more concrete and definite. As we suggested in §1, perhaps the most important practical implication of our results concerns the importance of existential catastrophes: more specifically, the extent to which the potentially astronomical scale of the far future makes it astronomically important to avoid existential catastrophe. An ‘existential catastrophe’, for our purposes, is any near-future event that would drastically reduce the future population size of human-originating civilization (e.g., human extinction). To keep the discussion manageable, we will focus on AU and, secondarily, VV1/VV2. This lets us isolate the central relevant feature of insensitivity to scale (or asymptotic insensitivity to scale) in the absence of background populations, without the essentially orthogonal feature of inequality aversion. We will also focus on the case where the future generations that will exist if we avoid existential catastrophe have higher average welfare than the background population, so that AU assigns positive value to avoiding existential catastrophe, at least in the large-background-population limit. (But much of what we say about the value of avoiding existential catastrophe on this assumption also applies, mutatis mutandis, to the disvalue of avoiding existential catastrophe on the opposite assumption that the potential future population has lower average welfare than the background population.)

Footnote: This is a broader category of events than ‘premature human extinction’; for instance, an event that prevented humanity from ever settling the stars, while allowing us to survive for a very long time on Earth, could be an existential catastrophe in our sense. It is also importantly distinct from the usual concept of ‘existential catastrophe’ in the philosophical literature, which is roughly ‘any event that would permanently curtail humanity’s long-term potential for value’ (see for instance Bostrom, 2013, p. 15; Ord, 2020, p. 37).

The importance of avoiding existential catastrophe can be measured by comparing the value of avoiding existential catastrophe with the value of improving the welfare of the affectable pre-catastrophe population (which, for simplicity, we will hereafter call ‘the current generation’). We would like to know how the answer to this question depends on the welfare and (especially) the size of the background population.

To formalize the question, let $C$ represent the current generation as it will be if we prioritize its welfare at the expense of allowing an existential catastrophe. Let $C'$ denote the current generation as it will be if we instead prioritize avoiding an existential catastrophe. Thus $C > C'$, but we assume that $|C| = |C'|$. (This is mostly harmless: it just means that we designate as the members of $C'$ the first $|C|$ individuals in the affectable population in the world where we avoid existential catastrophe.) Let $F$ denote the future population that will exist only if we avoid existential catastrophe. And suppose there is a background population $Z$, which includes past terrestrial welfare subjects, perhaps distant aliens, and perhaps unaffectable present/future welfare subjects like wild animals.

In short, we consider a choice between $Z + C$ and $Z + C' + F$.
In terms of this choice, the importance of avoiding existential catastrophe can be made precise in several different ways. We will consider the following three:

Maximum incurred cost. Holding fixed the average welfare $\overline{C}$ of the current generation in the world where existential catastrophe occurs, what is the greatest reduction in welfare for the current generation that is worth accepting to avoid existential catastrophe?

Maximum opportunity cost. Holding fixed the average welfare $\overline{C'}$ of the current generation in the world where existential catastrophe does not occur, how large a welfare gain for the current generation should we be willing to forgo to avoid existential catastrophe?

Value difference ratio. Holding fixed both $\overline{C}$ and $\overline{C'}$, and thinking of $Z + C'$ as the status quo, what is the ratio between the changes in value that would result from (i) avoiding existential catastrophe by adding $F$, versus (ii) raising the welfare of the current generation from $\overline{C'}$ to $\overline{C}$?

(Footnote: For example, while totalist two-factor egalitarianism is not additive, it is relatively clear that it can give great value to avoiding existential catastrophe, since the value of a population scales with its size.)

Broadly, we want to know how the presence of $Z$ affects these measures of importance. We know they depend, for one thing, on the size of $F$; we want particularly to know how this dependence is mediated by the size of $Z$. In the extreme case, as $|Z| \to \infty$, we know from our results in §4 that AU, VV1, and VV2 all converge to $\mathrm{CL}_{\overline{Z}}$. And according to $\mathrm{CL}_{\overline{Z}}$, the value of adding $F$ to the population scales with $|F|$, so that when $|F|$ is astronomically large, the importance of avoiding existential catastrophe, by any of these measures, will be astronomically great. We should therefore expect, a bit roughly, that AU will give great importance to avoiding existential catastrophe when both $|Z|$ and $|F|$ are large, and more precisely that its measures of importance will agree with those of $\mathrm{CL}_{\overline{Z}}$. The task is to say more about how this works at a qualitative level, and then (in §9.5) to give some indicative numerical results.

First, we hold fixed the welfare of the current generation in the catastrophe world (where $F$ does not exist), and consider the greatest welfare cost we are willing to impose on the current generation to avoid catastrophe and thereby add $F$ to the population.

According to $\mathrm{CL}_{\overline{Z}}$, the axiology to which AU, VV1, and VV2 converge in the limit, this is simply the critical-level sum of welfare in $F$, given by $|F|(\overline{F} - \overline{Z})$. That is, when $\mathrm{Tot}(C) - \mathrm{Tot}(C') = |F|(\overline{F} - \overline{Z})$, $\mathrm{CL}_{\overline{Z}}$ is indifferent between $Z + C' + F$ and $Z + C$. According to AU, analogously, the maximum cost we are willing to impose on the current generation is the cost at which $\overline{Z + C' + F} = \overline{Z + C}$. We solve for it, therefore, by rearranging this equation into an equation for $\mathrm{Tot}(C) - \mathrm{Tot}(C')$ in terms of $Z$, $C$, and $F$.
This rearranged equation turns out to be:

$$\mathrm{Tot}(C) - \mathrm{Tot}(C') = \frac{|Z||F|(\overline{F} - \overline{Z}) + |C||F|(\overline{F} - \overline{C})}{|Z + C|}. \tag{3}$$

(Footnote: If we instead wanted to focus on the average (per capita) welfare cost imposed on the current generation, we could just divide both sides of equation (3) by $|C|$.)

By this measure, the importance of avoiding existential catastrophe scales with $|F|$, with or without a background population. As we will see, this is not the case for the other two measures of the importance of avoiding existential catastrophe we consider. The right way to interpret this fact is as follows: if $\overline{F} > \overline{C}$ and $|F| \gg |C|$, then AU is willing to impose enormous costs on the current generation to enable the existence of $F$, since if $F$ exists, $C'$ will be only a very small part of the resulting population and must have extremely low average welfare to reduce $\overline{C' + F}$ below $\overline{C}$. And on the other hand, if $\overline{C} > \overline{F}$ and $|F| \gg |C|$, then AU will require an enormous increase in the welfare of the current generation (i.e., that $\overline{C'} \gg \overline{C}$) to compensate for the reduction in average welfare created by $F$.

Nevertheless, even by this measure, the size of the background population makes a difference because it determines the ‘effective critical level’ to which $\overline{F}$ is compared: the average welfare level above which adding $F$ to the population has positive value, and below which it has negative value. When $|C| \gg |Z|$, the right-hand side of (3) is approximately $|F|(\overline{F} - \overline{C})$; thus AU agrees closely with $\mathrm{CL}_{\overline{C}}$ rather than $\mathrm{CL}_{\overline{Z}}$ and is only willing to impose any positive cost at all on the current generation to avoid existential catastrophe when (with some approximation) $\overline{F} > \overline{C}$. But when $|Z| \gg |C|$, the right-hand side of (3) is approximately $|F|(\overline{F} - \overline{Z})$, i.e., the value given by $\mathrm{CL}_{\overline{Z}}$. This shift could either increase or decrease the value of avoiding existential catastrophe (depending on whether $\overline{Z}$ is greater than or less than $\overline{C}$), and could reverse the sign of the value of avoiding existential catastrophe if $\overline{F}$ is between $\overline{Z}$ and $\overline{C}$. Most notably for our purposes, the effective critical level will be closer to $\overline{Z}$ than to $\overline{C}$ if $|Z| > |C|$, and will be very close to $\overline{Z}$ if $|Z| \gg |C|$ (since $|Z||F|(\overline{F} - \overline{Z})$ rather than $|F||C|(\overline{F} - \overline{C})$ will dominate the numerator in (3)). So by this measure, AU closely agrees with its corresponding additive limit theory as long as the background population is substantially larger than the current generation, i.e., $|Z| \gg |C|$.

(Footnote: Formally, ‘if $a \gg b$ then $x$ is approximately $y$’ means that $\lim_{a/b \to \infty} x/y = 1$. In this case, the limit converges uniformly in $|F|$.)

Now let’s ask the converse question: holding fixed the welfare of the current generation in the world without existential catastrophe (i.e., holding fixed $\overline{C'}$), how large a welfare gain for the current generation should we be willing to forgo to avoid existential catastrophe?

According to $\mathrm{CL}_{\overline{Z}}$, the answer is again $|F|(\overline{F} - \overline{Z})$. To find AU’s answer, we rearrange $\overline{Z + C' + F} = \overline{Z + C}$ into an equation for $\mathrm{Tot}(C) - \mathrm{Tot}(C')$, this time in terms of $Z$, $F$, and $C'$. This gives us:

$$\mathrm{Tot}(C) - \mathrm{Tot}(C') = \frac{|Z||F|(\overline{F} - \overline{Z}) + |C'||F|(\overline{F} - \overline{C'})}{|Z + C' + F|}. \tag{4}$$

Now the size of the background population takes on greater significance. Consider three cases:

Case 1: $|F| \gg |C'| \gg |Z|$. In this case, the right-hand side of (4) is approximately $|C'|(\overline{F} - \overline{C'})$, and the value of avoiding existential catastrophe as measured by maximum opportunity cost is therefore approximately independent of $|F|$.

Case 2: $|F| \gg |Z| \gg |C'|$. In this case, the right-hand side of (4) is approximately $|Z|(\overline{F} - \overline{Z})$. Thus the value of avoiding existential catastrophe as measured by maximum opportunity cost is approximately proportional to $|Z|$, which may be astronomically large but is also (we are supposing) much less than $|F|$. Note also that the effective critical level is now close to $\overline{Z}$ rather than $\overline{C'}$ as in Case 1.

Case 3: $|Z| \gg |F| \gg |C'|$. In this case, the right-hand side of (4) is approximately $|F|(\overline{F} - \overline{Z})$, in agreement with $\mathrm{CL}_{\overline{Z}}$. Thus the value of avoiding existential catastrophe as measured by maximum opportunity cost is approximately proportional to $|F|$, and will be astronomically large if $|F|$ is astronomically large and $(\overline{F} - \overline{Z})$ is non-trivial.

While there are a number of points of interest in this analysis, the quick takeaway is that the maximum opportunity cost increases without bound as we increase both $|F|$ and $|Z|$ (while holding all else equal), a situation reflected in Cases 2 and 3 but not Case 1. So, qualitatively, arguments from astronomical scale can go through if we attend to the potentially astronomical scale of both the future population and the background population.
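The rearrangements behind equations (3) and (4) can be checked with exact rational arithmetic. The following is a minimal sketch, not part of the original analysis: the function names and all population sizes and welfare levels are hypothetical, chosen only for illustration. It plugs the closed-form value of Tot(C) − Tot(C′) back into the AU indifference condition (equal average welfare of Z + C′ + F and Z + C) and confirms that the condition holds exactly.

```python
from fractions import Fraction as Fr

def average(groups):
    """Average welfare of a population given as (size, mean welfare) groups."""
    total = sum(size * mean for size, mean in groups)
    return total / sum(size for size, _ in groups)

def mic_rhs(nZ, mZ, nC, mC, nF, mF):
    """Right-hand side of (3), with the mean welfare of C held fixed."""
    return (nZ * nF * (mF - mZ) + nC * nF * (mF - mC)) / (nZ + nC)

def moc_rhs(nZ, mZ, nC, mCp, nF, mF):
    """Right-hand side of (4), with the mean welfare of C' held fixed."""
    return (nZ * nF * (mF - mZ) + nC * nF * (mF - mCp)) / (nZ + nC + nF)

# Hypothetical sizes and mean welfare levels, chosen only for illustration.
nZ, mZ = 1000, Fr(0)
nC = 10
nF, mF = 500, Fr(2)

# (3): fix mean(C) = 1 and derive mean(C') from the closed-form cost.
mC = Fr(1)
mCp = mC - mic_rhs(nZ, mZ, nC, mC, nF, mF) / nC
au_indifferent_3 = (
    average([(nZ, mZ), (nC, mCp), (nF, mF)]) == average([(nZ, mZ), (nC, mC)])
)

# (4): fix mean(C') = 1 and derive mean(C) from the closed-form gain.
mCp_fixed = Fr(1)
mC_derived = mCp_fixed + moc_rhs(nZ, mZ, nC, mCp_fixed, nF, mF) / nC
au_indifferent_4 = (
    average([(nZ, mZ), (nC, mCp_fixed), (nF, mF)])
    == average([(nZ, mZ), (nC, mC_derived)])
)

print(au_indifferent_3, au_indifferent_4)
```

Exact fractions are used rather than floats so that the indifference condition can be tested with strict equality.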
Finally, we treat $Z + C'$ as a baseline, and ask whether it is better to avoid existential catastrophe by adding $F$ or to improve $C'$ to $C$. More precisely, we consider the ratio of the value of these improvements:

$$R = \frac{V(Z + C' + F) - V(Z + C')}{V(Z + C) - V(Z + C')}.$$

According to $\mathrm{CL}_{\overline{Z}}$, $R$ is equal to $\frac{|F|(\overline{F} - \overline{Z})}{|C|(\overline{C} - \overline{C'})}$. According to AU, of course, $R$ is equal to $\frac{\overline{Z + C' + F} - \overline{Z + C'}}{\overline{Z + C} - \overline{Z + C'}}$. But again, we need to do some rearranging to make clear how this ratio is affected by the sizes of $Z$, $C$, and $F$. Specifically, in the case of AU, the formula for $R$ rearranges to

$$\frac{1}{\overline{C} - \overline{C'}} \left( \overline{F}\,\frac{|F||Z + C|}{|C||Z + C + F|} - \overline{C'}\,\frac{|F|}{|Z + C + F|} - \overline{Z}\,\frac{|Z||F|}{|C||Z + C + F|} \right). \tag{5}$$

This expression is unattractive, but informative. Again, let’s consider three cases:

Case 1: $|F| \gg |C| \gg |Z|$. In this case, (5) is approximately $\frac{\overline{F} - \overline{C'}}{\overline{C} - \overline{C'}}$, and the importance of avoiding existential catastrophe by the value difference ratio measure is therefore approximately independent of $|F|$.

Case 2: $|F| \gg |Z| \gg |C|$. In this case, (5) is approximately $\frac{|Z|}{|C|} \times \frac{\overline{F} - \overline{Z}}{\overline{C} - \overline{C'}}$. Thus the importance of avoiding existential catastrophe by the value difference ratio measure is approximately proportional to $\frac{|Z|}{|C|}$. And again, note that when $|Z| \gg |C|$, the effective critical level is close to $\overline{Z}$ rather than $\overline{C'}$.

Case 3: $|Z| \gg |F| \gg |C|$. In this case, (5) is approximately $\frac{|F|}{|C|} \times \frac{\overline{F} - \overline{Z}}{\overline{C} - \overline{C'}}$, in agreement with $\mathrm{CL}_{\overline{Z}}$. Thus the importance of avoiding existential catastrophe by the value difference ratio measure is now approximately proportional to $\frac{|F|}{|C|}$, and will be astronomically large if $\frac{|F|}{|C|}$ is astronomically large and $\frac{\overline{F} - \overline{Z}}{\overline{C} - \overline{C'}}$ is non-trivial.

(Footnote: Formally, a claim to the effect of ‘if $a \gg b \gg c$ then $x$ is approximately $y$’ means that $x/y \to 1$ as $a/b$ and $b/c \to \infty$; more precisely, for any $\varepsilon > 0$, there exists $n > 0$ such that if $a/b$ and $b/c$ are bigger than $n$, then $x/y \in (1 - \varepsilon, 1 + \varepsilon)$.)

As with the maximum opportunity cost, the most basic qualitative point is that the value difference ratio $R$ increases without bound as we increase both $|F|$ and $|Z|$.
The fact that possible future and actual background populations are both likely to be extremely large suggests that the value difference ratio will be greater than 1 (thus favouring extinction-avoidance) for a robust range of the other parameters.

So far, our analysis has remained qualitative; we’ll now put in some numbers, with the purpose of illustrating two things: first, the practical point that even AU will give great weight to avoiding existential catastrophes, for some reasonable and even conservative estimates of the background population and other parameters; second, the more theoretical point that AU converges to CL with high precision, given these same estimates.

Table 1: The importance of avoiding existential catastrophe, as measured by maximum incurred cost (MIC), maximum opportunity cost (MOC), and value difference ratio (VDR), according to AU for different background population sizes and $\mathrm{CL}_{\overline{Z}}$, with $\overline{Z} = 0$ and $|Z|$ as specified in each row. (Columns: Axiology, $|Z|$, MIC, MOC, VDR; rows: AU at three values of $|Z|$, then CL. The numerical entries and the remaining parameter values in the caption did not survive extraction.)

For the sizes of the foreground populations, let’s suppose that $|C| = |C'| = $ […] (a realistic estimate of the size of the present and near-future human population) and $|F| = $ […] (a fairly conservative estimate of the potential size of the future human population, if we avoid existential catastrophe, arrived at by assuming […] individuals per century for the next billion years). For $|Z|$, we will consider three values: $|Z| = $ […], $|Z| = $ […] (a rounding-down of our most conservative estimate of the number of past mammals, weighted by lifespan and cortical neuron count, from §8), and $|Z| = 1000 \times |F|$ (arrived at by assuming that the universe contains 1000 other advanced civilizations, of the same scale that our civilization will achieve if we avoid existential catastrophe).

In terms of average welfare, we have much less to go on. For simplicity let’s assume that $\overline{F} = $ […] and $\overline{Z} = 0$ (an assumption more plausible for the case where $Z$ consists mainly of wild animals, somewhat less plausible for the case where it consists mainly of the members of other advanced civilizations). And let’s assume that $\overline{C'} = $ […] (except for maximum incurred cost, where $\overline{C'}$ is a dependent variable) and $\overline{C} = $ […] (except for maximum opportunity cost, where $\overline{C}$ is a dependent variable).

Table 1 gives the importance of avoiding existential catastrophe according to AU and $\mathrm{CL}_{\overline{Z}}$, under these assumptions, for all three measures of importance and all three background population sizes. In general, we see that with three- or four-order-of-magnitude differences in the population sizes of $C$, $F$, and $Z$, the approximations arrived at above are accurate to at least the third or fourth significant figure. And more specifically, in the case where $|Z| \gg |F| \gg |C|$, AU agrees with $\mathrm{CL}_{\overline{Z}}$ on all three measures to at least the fourth significant figure.
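The qualitative regimes identified for equations (3), (4), and (5) can also be explored numerically. The sketch below is illustrative only: the function names and every parameter value are hypothetical stand-ins, not the estimates used for Table 1. It evaluates the three measures under AU from the closed forms and reports each as a fraction of the corresponding critical-level value, for an empty background population, one with |F| ≫ |Z| ≫ |C|, and one with |Z| ≫ |F| ≫ |C|.

```python
from fractions import Fraction as Fr

def au_measures(nZ, mZ, nC, mC, mCp, nF, mF):
    """MIC, MOC and VDR under AU, using closed forms (3), (4) and (5)."""
    mic = (nZ * nF * (mF - mZ) + nC * nF * (mF - mC)) / (nZ + nC)
    moc = (nZ * nF * (mF - mZ) + nC * nF * (mF - mCp)) / (nZ + nC + nF)
    n_all = nZ + nC + nF
    vdr = (
        mF * nF * (nZ + nC) / (nC * n_all)
        - mCp * nF / n_all
        - mZ * nZ * nF / (nC * n_all)
    ) / (mC - mCp)
    return mic, moc, vdr

def cl_measures(mZ, nC, mC, mCp, nF, mF):
    """The same three measures under critical-level utilitarianism with level mZ."""
    cost = nF * (mF - mZ)
    return cost, cost, cost / (nC * (mC - mCp))

# Hypothetical parameters (not the paper's estimates).
nC, nF = 10**3, 10**9
mZ, mF, mCp, mC = Fr(0), Fr(2), Fr(1), Fr(3, 2)

cl = cl_measures(mZ, nC, mC, mCp, nF, mF)
results = {
    nZ: au_measures(nZ, mZ, nC, mC, mCp, nF, mF) for nZ in (0, 10**6, 10**12)
}
for nZ, au in results.items():
    ratios = ", ".join(f"{float(a / c):.4g}" for a, c in zip(au, cl))
    print(f"|Z| = {nZ:.0e}: AU/CL ratios for (MIC, MOC, VDR) = {ratios}")
```

With these stand-in values, only the largest background population brings all three AU measures close to their critical-level counterparts, which mirrors the pattern described for Cases 1 to 3.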
In summary: when the background population is small or non-existent, the importance of avoiding existential catastrophe according to AU is approximately proportional to $\overline{F} - \overline{C'}$ or $\overline{F} - \overline{C}$ (depending on which measure we consider), and approximately independent of population size, and is therefore unlikely to be astronomically large. When the background population is much larger than the current generation, but still much smaller than the potential future population, the importance of avoiding existential catastrophe according to AU approximately scales with $|Z|$, and may therefore be astronomically large, while still falling well short of its importance according to $\mathrm{CL}_{\overline{Z}}$. Finally, if the background population is much larger even than the potential future population (as it would be, for instance, if it includes many advanced civilizations elsewhere in the universe), AU agrees closely with $\mathrm{CL}_{\overline{Z}}$ about the importance of avoiding existential catastrophe, treating it as approximately linear in $|F|$, by all three of the measures we considered. The exception to this pattern is the ‘maximum incurred cost’ measure, by which the importance of avoiding existential catastrophe scales with $|F|$ regardless of the size of the background population.

In this very specific context, therefore, we can now say how large the background population needs to be for large-background-population limiting behavior to ‘kick in’: AU closely approximates $\mathrm{CL}_{\overline{Z}}$ in every respect we have considered only when $|Z| \gg |F|$ (or at any rate, only when $|Z| > |F|$). But it behaves in important ways like $\mathrm{CL}_{\overline{Z}}$ as long as $|Z| \gg |C|$, both in that it is disposed to assign astronomical importance to avoiding existential catastrophes, and in that the effective critical level that determines whether that importance is positive or negative is approximately $\overline{Z}$.
This lends significance to our conclusion in §6 that real-world background populations are much larger than the current generation (i.e., the affectable present and near-future population), whether or not they are large relative to the potential future population as a whole. The former fact alone is enough to have a significant effect on how AU evaluates existential catastrophes in practice.

Our conclusions about AU also partially generalize to VV1 and VV2. In the case of VV1: for any two populations $X$ and $Y$, if $\overline{X} > \overline{Y}$, $|X| \geq |Y|$, and $\overline{X} \geq 0$, then clearly any VV1 axiology will prefer $X$ to $Y$. For our purposes, this means that any VV1 axiology, so long as it assigns non-negative value to the non-catastrophe population $Z + C' + F$ (i.e., so long as $\overline{Z + C' + F} \geq 0$), will prefer it to the catastrophe population $Z + C$ whenever AU does. Analogously, in the case of VV2 (which, recall, applies an increasing transformation $f$ to the average welfare of a population): for any two populations $X$ and $Y$, if $\overline{X} > \overline{Y}$, $|X| \geq |Y|$, and $f(\overline{X}) \geq 0$, then clearly any VV2 axiology will prefer $X$ to $Y$. For our purposes, this means that any VV2 axiology, so long as it assigns non-negative value to the non-catastrophe population $Z + C' + F$ (i.e., so long as $f(\overline{Z + C' + F}) \geq 0$), will prefer it to the catastrophe population $Z + C$ whenever AU does.

Putting these observations together, any VV axiology, as long as it assigns positive value to the non-catastrophe population, will prefer it to the catastrophe population whenever AU does. This means, among other things, that under this condition, the importance of avoiding existential catastrophe as measured by maximum incurred cost or maximum opportunity cost will be at least as great according to VV as according to AU. With respect to value difference ratio, things are a bit more complicated: when $\overline{Z + C' + F} \geq 0$ and $\overline{Z + C' + F} \geq \overline{Z + C'}$, VV1 is guaranteed to assign more importance than AU to avoiding existential catastrophe by this measure. But we cannot say anything analogous about VV2 in this case, since the transformation $f$ it applies to average welfare can be arbitrarily concave or convex.

(Footnote: Consider VV2, of which VV1 is a special case (where $f(\overline{X}) = \overline{X}$). If $f(\overline{Z + C' + F}) = f(\overline{Z + C})$, and is positive, then $g(|Z + C' + F|)\, f(\overline{Z + C' + F}) > g(|Z + C|)\, f(\overline{Z + C})$, since $g$ is increasing. Thus, all else being equal, VV2 axiologies will require either a larger value of $\overline{C}$ or a smaller value of $\overline{C'}$ to equalize the value of the populations, meaning that the maximum incurred cost / maximum opportunity cost that it will accept to avoid existential catastrophe is greater. This does not necessarily mean that VV will converge with $\mathrm{CL}_{\overline{Z}}$ faster than AU, with respect to these measures, as the size of the background population increases. After all, $g$ may be arbitrarily close to linear up to arbitrarily large population sizes, allowing VV to remain in close agreement with TU rather than $\mathrm{CL}_{\overline{Z}}$ for arbitrarily large populations. But it does mean that VV will converge with $\mathrm{CL}_{\overline{Z}}$ faster than AU if it is converging from below.)

(Footnote: If $\overline{Z + C' + F}$ and $\overline{Z + C' + F} - \overline{Z + C'}$ are both non-negative, then
$$\frac{g(|Z + C + F|)\,\overline{Z + C' + F} - g(|Z + C|)\,\overline{Z + C'}}{g(|Z + C|)\,\overline{Z + C} - g(|Z + C|)\,\overline{Z + C'}} \geq \frac{\overline{Z + C' + F} - \overline{Z + C'}}{\overline{Z + C} - \overline{Z + C'}}.$$
(Again, this means that VV1 will converge with $\mathrm{CL}_{\overline{Z}}$ faster than AU, with respect to the value difference ratio measure, if it is converging from below.) However, since VV2’s $f$ need only be increasing, $\frac{f(\overline{Z + C' + F}) - f(\overline{Z + C'})}{f(\overline{Z + C}) - f(\overline{Z + C'})}$ can differ to an arbitrarily extreme degree from $\frac{\overline{Z + C' + F} - \overline{Z + C'}}{\overline{Z + C} - \overline{Z + C'}}$ (except when $\overline{Z + C' + F} = \overline{Z + C}$).)

A crucial limitation of our discussion, however, is that we have only considered the objective importance of existential catastrophes, and not the prospective or decision-theoretic significance of existential risks (i.e., risks of existential catastrophe). If we assume a straightforward expectational decision theory according to which average utilitarians, for instance, should simply maximize expected average welfare, then the astronomical decision-[…], and there is not yet any unproblematic or widely accepted alternative in the literature. We therefore leave the question of how AU, VV1, VV2, and other non-additive axiologies evaluate existential risk in the presence of large background populations for future research.

10 Other implications

We conclude by briefly surveying three other interesting implications of our limit results and, more generally, of the influence of background populations on the preferences of non-separable axiologies.

The Repugnant Conclusion, recall, is the conclusion (implied by TU among other axiologies) that for any positive welfare levels $l_1 < l_2$ and any number $n$, there is a population where everyone has welfare $l_1$ that is better than a population of $n$ individuals all with welfare $l_2$. One of the motivations for population axiologies with an ‘averagist’ flavor (like AU, VV1, VV2, and QAA) is to avoid the Repugnant Conclusion. But the results in §§4–5 imply that, although they avoid the Repugnant Conclusion as stated above, these views cannot avoid the closely related phenomenon of ‘Repugnant Addition’: for any positive welfare levels $l_1 < l_2$ and any number $n$, if $Y$ consists of $n$ individuals all with welfare $l_2$, there is some population $X$ in which everyone has welfare $l_1$ and some population $Z$ such that $X + Z$ is better than $Y + Z$. As per the results in §4, AU/VV1/VV2 support Repugnant Addition with respect to a large enough background population $Z$ with $\overline{Z} \leq 0$. (And when $\overline{Z} < 0$, they support the much more repugnant conclusion that, for any population $Y$ in which everyone has positive welfare, there is a larger population $X$ in which everyone has negative welfare such that $X + Z$ is better than $Y + Z$.)

(Footnote: See for instance Thomas (2016, ch. 3), McCarthy et al. (2020, Prop. 4.8), Nebel (forthcoming), Tarsney (unpublished).)
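A small numerical instance makes Repugnant Addition under AU concrete. This is a hedged sketch with hypothetical numbers (the welfare levels, population sizes, and the helper `average` are all illustrative assumptions): taken alone, a small population Y at a high welfare level has higher average welfare than a larger population X at a low positive level, but once both are added to a large background population Z with average welfare 0, AU ranks X + Z above Y + Z.

```python
from fractions import Fraction

def average(*groups):
    """Average welfare of a population given as (size, welfare level) groups."""
    total = sum(size * level for size, level in groups)
    return Fraction(total, sum(size for size, _ in groups))

# Hypothetical welfare levels and sizes, chosen only for illustration.
l1, l2, n = 1, 100, 10
Y = (n, l2)          # n individuals, all at the higher level l2
X = (2000, l1)       # a larger population, all at the lower level l1
Z = (10**6, 0)       # background population with average welfare 0

# Without a background population, AU prefers Y to X.
print(average(X) < average(Y))

# With the background population, AU prefers X + Z to Y + Z.
print(average(X, Z) > average(Y, Z))
```

The reversal occurs because, against a large background with average welfare 0, AU tracks the total welfare added (2000 for X versus 1000 for Y) rather than the average welfare of the added group.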

A long-standing and unresolved challenge for axiology is how to extend axiologies from finite to infinite contexts. Most of the extant proposals for ranking infinite worlds, in both philosophy and economics, aim to extend total utilitarianism. However, these proposals can easily be adapted to extend other additive axiologies. For instance, a simple extension of total utilitarianism (suggested in Lauwers and Vallentyne (2004)) compares any two populations by summing the differences in welfare between the two populations for each individual, treating an individual who doesn’t exist in a population as having welfare 0. This axiological criterion can easily be adapted to a critical-level or prioritarian theory by replacing welfare with some increasing function of welfare.

(Footnote: For surveys of the difficulties of infinite axiology, see for instance Asheim (2010), Bostrom (2011), and Ch. 1 of Askell (2018).)

(Footnote: See, for instance, Atsumi (1965), Diamond (1965), Von Weizsäcker (1965), Vallentyne and Kagan (1997), Lauwers and Vallentyne (2004), Bostrom (2011), Arntzenius (2014), Jonsson and Voorneveld (2018), Wilkinson (forthcoming), and Clark (ms), among many others.)

(Footnote: Formally, $X \succcurlyeq Y$ if and only if $\sum_{p_i \in X \cup Y} \left( w_x(p_i) - w_y(p_i) \right)$ converges unconditionally to a value $\geq 0$, where for any $p_i \notin X$, $w_x(p_i) = 0$ (and likewise for $Y$).)

It is much less clear, however, how to extend non-additive theories to the infinite context, and there has so far been little if any discussion of this question. Our limit results, however, suggest a partial answer: when comparing two infinite populations, at least when these populations differ only finitely, we are quite literally in (and not merely approaching) the large-background-population limit. So it is natural to think that a non-additive axiology $A$ that has an additive counterpart $A'$ should agree exactly with that additive counterpart in the infinite context. For instance, if we are average utilitarians and we live in an infinite world, but we can only affect a finite part of that infinite world, then we should simply compare the possible outcomes of our choices by the appropriate infinite generalization of critical-level utilitarianism, where the critical level is the average welfare level in the background population.

This suggestion is well-defined only if we have a well-defined notion of relative frequency for infinite worlds, specifically the relative frequency of […] welfare distribution and average welfare. A natural suggestion here (advocated, for instance, by Knobe et al. (2006)) is to use the limiting relative frequency in uniformly expanding spatiotemporal regions, providing that this limit exists and is the same for all starting locations. There is plenty of debate to be had about this proposal, but this is not the place for that debate. At any rate, it seems plausible (though far from indisputable) that there should be some way of making sense of the relative frequencies of particular welfare levels in an infinite population.

The results in §§4–5 have one other interesting implication: they suggest a way in which agents who accept non-separable axiologies can be manipulated. Suppose, for instance, that we in the Milky Way are all average utilitarians, while the inhabitants of the Andromeda Galaxy are all total utilitarians. And suppose that, the distance between the galaxies being what it is, we can communicate with each other but cannot otherwise interact. Being total utilitarians, the Andromedans would prefer that we act in ways that maximize total welfare in the Milky Way.
To bring that about, they might create an enormous number of welfare subjects with welfare very close to zero (for instance, breeding quintillions of very small, short-lived animals with mostly bland experiences) and send us proof that they have done so. We in the Milky Way would then make all our choices under the awareness of a large background population whose average welfare is close to zero. If they could create for us a large enough background population with average welfare sufficiently close to zero, the Andromedans could move us arbitrarily close to de facto total utilitarianism.

It’s not obvious whether such a strategy would be efficient, but it might be, if creating small, short-lived welfare subjects with bland experiences (and transmitting the necessary proof of their existence) is sufficiently cheap. Since the cost of creating a welfare subject with welfare $x$ presumably increases with $|x|$ (and plausibly increases at a super-linear rate), it might well make sense for the Andromedans to devote some of their resources to this manipulation strategy rather than spending all their resources directly on creating welfare subjects with high welfare.

As the preceding results demonstrate, this kind of manipulability is not unique to average utilitarians, but applies also to agents who accept variable-value or non-separable egalitarian views. Moreover, the potential for ma[…] some convergence among axiologies on particular practical conclusions, axiological disputes remain practically significant.

(Footnote: But manipulating egalitarians may be more expensive, if it requires creating beings with a wide distribution of welfare levels. Likewise, agents who accept a critical-level view other than TU may find it more expensive to manipulate in this way, since they may need to create welfare subjects at or near what they regard as the critical level, unless, for instance, creating welfare subjects with welfare close to zero can reduce the average welfare of a pre-existing background population toward that critical level.)

11 Conclusion

We have shown that, in the presence of large enough background populations, a range of non-additive axiologies asymptotically agree with some counterpart additive axiology (either critical-level or, more broadly, prioritarian). And we have argued that the real-world background population is large enough to make these limit results practically relevant. The most notable implication of these arguments is that ‘arguments from astronomical scale’, in particular for the overwhelming importance of existential […] / uncertainty, particularly with respect to these characteristics of the background population; (3) the behavior of a wider range of non-additive axiologies (e.g. incomplete, intransitive, or person-affecting) in the large-background-population limit; and (4) exploring more generally the question of how large the background population needs to be for the limit results to ‘kick in’, for a wider range of axiologies and choice situations than we considered in §9.

A Results

Recall that $\mathcal{W}$ is the set of welfare levels, and $\mathcal{P}$ consists of all non-zero, finitely supported functions $\mathcal{W} \to \mathbb{Z}_+$. By a type of populations we mean a set $\mathcal{T} \subset \mathcal{P}$ that contains populations of arbitrarily large size: for all $n \in \mathbb{N}$ there exists $X \in \mathcal{T}$ with $|X| \geq n$.

The following result, while elementary, indicates our general method.

Lemma 1. Suppose given $V : \mathcal{P} \to \mathbb{R}$ and a positive function $s : \mathbb{N} \to \mathbb{R}$. Define

$$V^s(X) := \lim_{|Z| \to \infty} \big( V(X + Z) - V(Z) \big)\, s(|Z|)$$

as $Z$ ranges over populations of some type $\mathcal{T}$. If the axiology with value function $V^s$ is separable, then the axiology with value function $V$ converges to it, relative to background populations of type $\mathcal{T}$.

Proof. Let $Z$ be a background population of type $\mathcal{T}$. Suppose that $V^s(X + Z) > V^s(Y + Z)$. Given that the corresponding axiology is separable, we must have $V^s(X) > V^s(Y)$. Then, if $|Z|$ is large enough,

$$\big( V(X + Z) - V(Z) \big)\, s(|Z|) > \big( V(Y + Z) - V(Z) \big)\, s(|Z|),$$

whence, rearranging (and using $s(|Z|) > 0$), $V(X + Z) > V(Y + Z)$.

Theorem 1. Average utilitarianism converges to $\mathrm{CL}_c$, relative to background populations with average welfare $c$. In fact, for any populations $X$, $Y$, $Z$, if $\overline{Z} = c$ and

$$|Z| > \frac{|X|\, V_{\mathrm{CL}_c}(Y) - |Y|\, V_{\mathrm{CL}_c}(X)}{V_{\mathrm{CL}_c}(X) - V_{\mathrm{CL}_c}(Y)} \tag{1}$$

then $V_{\mathrm{CL}_c}(X) > V_{\mathrm{CL}_c}(Y) \implies V_{\mathrm{AU}}(X + Z) > V_{\mathrm{AU}}(Y + Z)$.

Proof. In this case, a brief calculation shows

$$V_{\mathrm{AU}}(X + Z) - V_{\mathrm{AU}}(Z) = \frac{(\overline{X} - \overline{Z})\,|X|}{|X| + |Z|} = \frac{V_{\mathrm{CL}_c}(X)}{|X| + |Z|}. \tag{6}$$

Setting $s(n) = n$ we find $V^s_{\mathrm{AU}}(X) = V_{\mathrm{CL}_c}(X)$, in the notation of Lemma 1. That Lemma then yields the first statement.

We now verify the more precise second statement directly. Suppose $\overline{Z} = c$, that (1) holds, and that $V_{\mathrm{CL}_c}(X) > V_{\mathrm{CL}_c}(Y)$. We have to show $V_{\mathrm{AU}}(X + Z) > V_{\mathrm{AU}}(Y + Z)$. Using (6), that desired conclusion is equivalent to

$$\frac{V_{\mathrm{CL}_c}(X)}{|X| + |Z|} > \frac{V_{\mathrm{CL}_c}(Y)}{|Y| + |Z|}.$$

Cross-multiplying, this is equivalent to

$$V_{\mathrm{CL}_c}(X)\,(|Y| + |Z|) > V_{\mathrm{CL}_c}(Y)\,(|X| + |Z|)$$

or, rearranging,

$$|Z|\,\big( V_{\mathrm{CL}_c}(X) - V_{\mathrm{CL}_c}(Y) \big) > |X|\, V_{\mathrm{CL}_c}(Y) - |Y|\, V_{\mathrm{CL}_c}(X). \tag{7}$$

Given that $V_{\mathrm{CL}_c}(X) - V_{\mathrm{CL}_c}(Y) > 0$, the desired conclusion (7) follows from (1).
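The explicit bound (1) in Theorem 1 can be illustrated on a small example. The sketch below is an illustration under stated assumptions, not part of the proof: populations are represented as `Counter` objects mapping welfare levels to head counts, and the helper names (`v_au`, `v_cl`, `z_threshold`, `background`) and the specific populations are hypothetical. With critical level c = 0, it computes the right-hand side of (1) and compares AU's ranking of X + Z and Y + Z for background populations just above, at, and just below that size.

```python
from collections import Counter
from fractions import Fraction

def size(pop):
    return sum(pop.values())

def total(pop):
    return sum(level * count for level, count in pop.items())

def v_au(pop):
    """Average utilitarian value: mean welfare."""
    return Fraction(total(pop), size(pop))

def v_cl(pop, c):
    """Critical-level value with critical level c."""
    return total(pop) - c * size(pop)

def z_threshold(X, Y, c):
    """Right-hand side of bound (1)."""
    vx, vy = v_cl(X, c), v_cl(Y, c)
    return Fraction(size(X) * vy - size(Y) * vx, vx - vy)

def background(k):
    """Hypothetical background population of size 2k with average welfare 0."""
    return Counter({-1: k, 1: k})

# Hypothetical populations: CL_0 prefers X (value 100) to Y (value 90).
X = Counter({1: 100})
Y = Counter({30: 3})
c = 0

bound = z_threshold(X, Y, c)
print("bound on |Z|:", bound)
for k in (434, 435, 436):
    Z = background(k)
    print(size(Z), v_au(X + Z) > v_au(Y + Z))
```

In this example AU sides with the critical-level ranking exactly when |Z| exceeds the bound, which suggests that (1) cannot be weakened for these particular populations.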

Theorem 2.

Variable value views converge to $\mathrm{CL}_c$ relative to background populations with average welfare $c$.

Proof. Suppose the variable value view has a value function of the form $V(X) = f(\overline{X})\,g(|X|)$. Then
\[
\begin{aligned}
V(X+Z) - V(Z) &= f(\overline{X+Z})\,g(|X|+|Z|) - f(\overline{Z})\,g(|Z|) \\
&= f(\overline{X+Z})\,\big(g(|X|+|Z|) - g(|Z|)\big) + \big(f(\overline{X+Z}) - f(\overline{Z})\big)\,g(|Z|).
\end{aligned}
\]
We now apply two lemmas, proved below.

Lemma 2.

We have $\big(g(|X+Z|) - g(|Z|)\big)\,|Z| \to 0$ as $|Z| \to \infty$.

Lemma 3.

We have $\big(f(\overline{X+Z}) - f(\overline{Z})\big)\,|Z| \to f'(c)\,V_{\mathrm{CL}_c}(X)$ as $|Z| \to \infty$ with $\overline{Z} = c$.

Given these lemmas, and given that $f(\overline{X+Z}) \to f(c)$ and that $g(|Z|)$ approaches some upper bound $L$ as $|Z| \to \infty$, we find
\[
\lim_{|Z| \to \infty} \big(V(X+Z) - V(Z)\big)\,|Z| = f'(c)\,V_{\mathrm{CL}_c}(X)\,L
\]
as $Z$ ranges over populations with $\overline{Z} = c$. Let $s(n) = n / (f'(c)\,L)$. Then we have found
\[
\lim_{|Z| \to \infty} \big(V(X+Z) - V(Z)\big)\,s(|Z|) = V_{\mathrm{CL}_c}(X).
\]
The result now follows from Lemma 1.

Proof of Lemma 2.

Let $z$ be the result of rounding $|Z|/2$ down to an integer. By the concavity of $g$, we have
\[
0 \le \frac{g(|X+Z|) - g(|Z|)}{|X|} \le \frac{g(|Z|) - g(z)}{|Z| - z} \le \frac{g(|Z|) - g(z)}{|Z|/2}
\]
and hence
\[
0 \le \big(g(|X+Z|) - g(|Z|)\big)\,|Z| \le 2\,\big(g(|Z|) - g(z)\big)\,|X|.
\]
(The general fact being used about concavity is that, if $x > y > z$, then $\frac{g(x) - g(y)}{x - y} \le \frac{g(y) - g(z)}{y - z}$.) Since $g(|Z|)$ and $g(z)$ both tend to a common limit $L$ as $|Z| \to \infty$, we find that the right-hand side tends to 0 in that limit. Therefore the expression in the middle also tends to 0.

Proof of Lemma 3.

First, if $\overline{X} = c$ then $f(\overline{X+Z}) - f(\overline{Z}) = V_{\mathrm{CL}_c}(X) = 0$, so the claim holds trivially. Otherwise, since $\overline{X+Z}$ tends toward $c$ as $|Z| \to \infty$, we have (by the definition of the derivative)
\[
\frac{f(\overline{X+Z}) - f(\overline{Z})}{\overline{X+Z} - \overline{Z}} \to f'(c).
\]
We have, from (6),
\[
\overline{X+Z} - \overline{Z} = \frac{V_{\mathrm{CL}_c}(X)}{|X|+|Z|}.
\]
Inserting this into the preceding formula, we find
\[
\big(f(\overline{X+Z}) - f(\overline{Z})\big)\,(|X|+|Z|) \to f'(c)\,V_{\mathrm{CL}_c}(X).
\]
Since $\big(f(\overline{X+Z}) - f(\overline{Z})\big)\,|X| \to 0$, we obtain the desired result.

Proposition 1.

For any populations $X$ and $Y$, if $X \succ_{\mathrm{TU}} Y$ and $X \succ_{\mathrm{AU}} Y$, then $X \succ_{\mathrm{VV1}} Y$.

Proof. First, note that $V_{\mathrm{VV1}}(X)$ has the same sign as $\overline{X}$. So if $\overline{X} \ge 0 \ge \overline{Y}$, then it is automatic that $V_{\mathrm{VV1}}(X) > V_{\mathrm{VV1}}(Y)$. (The condition that $X \succ_{\mathrm{TU}} Y$ and $X \succ_{\mathrm{AU}} Y$ excludes the case where $\overline{X} = 0 = \overline{Y}$.) Thus it remains to consider the case when $\overline{X}$ and $\overline{Y}$ are both positive or both negative.

First suppose they are positive. If $|X| \ge |Y|$, then, since $g$ is increasing and $\overline{X} > \overline{Y}$,
\[
V_{\mathrm{VV1}}(X) = \overline{X}\,g(|X|) > \overline{Y}\,g(|Y|) = V_{\mathrm{VV1}}(Y),
\]
as required. If, instead, $|Y| > |X|$, then we have
\[
\frac{V_{\mathrm{VV1}}(X)}{V_{\mathrm{VV1}}(Y)} = \frac{\overline{X}\,g(|X|)}{\overline{Y}\,g(|Y|)} \ge \frac{\overline{X}\,|X|}{\overline{Y}\,|Y|} > 1,
\]
so that $V_{\mathrm{VV1}}(X) > V_{\mathrm{VV1}}(Y)$. Here, the first inequality uses the concavity of $g$, and the second the fact that $\mathrm{Tot}(X) > \mathrm{Tot}(Y) > 0$.

The case where $\overline{X}$ and $\overline{Y}$ are negative is similar, with careful attention to signs.

Theorem 3.

Suppose $V$ is a value function of the form $V(X) = \mathrm{Tot}(X) - I(X)\,|X|$, or else $V(X) = \overline{X} - I(X)$, where $I$ is a differentiable function of the distribution of $X$. Then the axiology $A$ represented by $V$ converges to an additive axiology relative to background populations with any given distribution $D$, with weighting function
\[
f(w) = \lim_{t \to 0^+} \frac{V(D + t\,1_w) - V(D)}{t}.
\]
If the Pareto principle holds with respect to $A$, then $f$ is weakly increasing, and if Pigou-Dalton transfers are weak improvements, then $f$ is weakly concave.

(Here $1_w \in \mathcal{P}$ is the population with a single welfare subject at level $w$, and we use the fact that value functions of the assumed form can be evaluated directly on any finitely supported, non-zero function $\mathcal{W} \to \mathbb{R}_+$, such as, in particular, $D$ and $D + t\,1_w$.)

Remark 1. Before proving Theorem 3, we should explain the requirement that '$I$ is a differentiable function of the distribution of $X$'. It has two parts.

First, let $\mathcal{P}_{\mathbb{R}}$ be the set of finitely-supported, non-zero functions $\mathcal{W} \to \mathbb{R}_+$. Let $\mathcal{D} \subset \mathcal{P}_{\mathbb{R}}$ be the subset of distributions, i.e. those functions that sum to 1. The first part of the requirement is that there is a function $\tilde{I} : \mathcal{D} \to \mathbb{R}$ such that $I(X) = \tilde{I}(X/|X|)$. In that sense, $I(X)$ is just a function of the distribution of $X$. Another way to put this is that $I$ can be extended to a function on all of $\mathcal{P}_{\mathbb{R}}$ that is scale-invariant, i.e. $I(nX) = I(X)$ for all reals $n > 0$ and $X \in \mathcal{P}_{\mathbb{R}}$.

Second, $I$, so extended, is differentiable, in the following sense: for all $P, Q \in \mathcal{P}_{\mathbb{R}}$, the limit
\[
\partial_Q I(P) := \lim_{t \to 0^+} \frac{I(P + tQ) - I(P)}{t}
\]
exists and is linear as a function of $Q$. In effect, $Q \mapsto \partial_Q I(P)$ is the best linear approximation of $I - I(P)$. (This can also be interpreted as a differentiability requirement directly on $\tilde{I}$: it should have a linear Gâteaux derivative.) In practice we only need $I$ to be differentiable at the background distribution $D$.

Proof. Let $Z$ range over background populations with the given distribution $D = Z/|Z|$. Thus $Z$ is of the form $nD$ for some real $n > 0$.

Define $s(n) = 1$ in the case of TU-based egalitarianism, and $s(n) = n$ in the case of AU-based egalitarianism. Noting that value functions of the assumed form can be evaluated not only on $\mathcal{P}$ but on the larger set $\mathcal{P}_{\mathbb{R}}$ (see Remark 1), we have $V(nX) = (n/s(n))\,V(X)$.

We can then see that $V^s$ (as defined in Lemma 1) is the directional derivative of $V$ at $D$:
\[
\begin{aligned}
V^s(X) &= \lim_{|Z| \to \infty} \big(V(Z + X) - V(Z)\big)\,s(|Z|) \\
&= \lim_{n \to \infty} \big(V(nD + X) - V(nD)\big)\,s(n) \\
&= \lim_{n \to \infty} \frac{V(D + X/n) - V(D)}{1/n} =: \partial_X V(D).
\end{aligned}
\]
For totalist egalitarianism, we find that
\[
V^s(X) = \mathrm{Tot}(X) - \partial_X I(D) - I(D)\,|X|.
\]
Given that $I$ is differentiable as in Remark 1, this function is additive in $X$ and therefore represents an additive axiology $A'$. More specifically, for each welfare level $w$ let $1_w$ be a population with one person at level $w$. We then have
\[
V^s(X) = \sum_{w \in \mathcal{W}} X(w)\,f(w) \quad \text{with} \quad f(w) = w - \partial_{1_w} I(D) - I(D).
\]
Similarly, for averagist egalitarianism,
\[
V^s(X) = (\overline{X} - \overline{D})\,|X| - \partial_X I(D) = \sum_{w \in \mathcal{W}} X(w)\,f(w) \quad \text{with} \quad f(w) = w - \partial_{1_w} I(D) - \overline{D}.
\]

Now suppose $X^+$ differs from $X$ in that one person is better off, say with welfare $v$ instead of $w$. If the Pareto principle holds with respect to $A$, then $V(X^+ + Z) \ge V(X + Z)$ for all $Z$; by convergence, we cannot have $V^s(X^+) < V^s(X)$. It follows that $f(v) \ge f(w)$; thus $f$ is weakly increasing. By the same logic, if Pigou-Dalton transfers are weak improvements with respect to $A$, then they do not make things worse with respect to $A'$, and it follows that $f$ is weakly concave.

Theorem 4.

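MDT converges to PR, relative to background populations with a given distribution $D$. Specifically, $\mathrm{MDT}_\alpha$ converges to $\mathrm{PR}_f$, the prioritarian axiology whose weighting function is
\[
f(w) = w - 2\alpha\,\mathrm{MD}(w, D) + \alpha\,\mathrm{MD}(D).
\]
Here $\mathrm{MD}(w, D) := \sum_{x \in \mathcal{W}} D(x)\,|x - w|$ is the average distance between $w$ and the welfare levels occurring in $D$.

Proof. Define $\langle X, Y \rangle = \sum_{x, y \in \mathcal{W}} X(x)\,Y(y)\,|x - y|$. Then $\mathrm{MD}(Z) = \langle Z, Z \rangle / |Z|^2$. It is easy to check that $\partial_X \langle Z, Z \rangle = 2\langle X, Z \rangle$ and therefore
\[
\partial_X \mathrm{MD}(Z) = \frac{2\langle X, Z \rangle}{|Z|^2} - \frac{2\langle Z, Z \rangle}{|Z|^3}\,|X|.
\]
In particular, MD is differentiable and Theorem 3 applies. Following the proof of Theorem 3, and using $|D| = 1$, $\langle 1_w, D \rangle = \mathrm{MD}(w, D)$ and $\langle D, D \rangle = \mathrm{MD}(D)$, we know that MDT converges to the additive axiology $A'$ with weighting function
\[
\begin{aligned}
f(w) &= w - \alpha\,\partial_{1_w} \mathrm{MD}(D) - \alpha\,\mathrm{MD}(D) \\
&= w - \alpha\,\big(2\langle 1_w, D \rangle - 2\,\mathrm{MD}(D)\big) - \alpha\,\mathrm{MD}(D) \\
&= w - 2\alpha\,\mathrm{MD}(w, D) + \alpha\,\mathrm{MD}(D).
\end{aligned}
\]

Theorem 5.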

Theorem 3 applies, with $I(X) = \overline{X} - \mathrm{QAM}(X)$. (We omit the proof that this $I$ is differentiable.) We have, then, convergence to prioritarianism with a priority weighting function
\[
f(w) = \partial_{1_w} \mathrm{QAM}(D) = \frac{g(w) - \sum_{x \in \mathcal{W}} D(x)\,g(x)}{g'(\mathrm{QAM}(D))}.
\]
Since the background distribution $D$ is fixed, this differs from the stated priority weighting function only by a positive scalar (i.e. the denominator).

Theorem 6.

BRD converges to TU relative to background populations with a given distribution $D$, on the set of populations that are moderate with respect to $D$.

Proof. Suppose that the weighting function $f$ has a horizontal asymptote at $L > 0$. As in Lemma 1 it suffices to show that $\lim_{|Z| \to \infty} V(X+Z) - V(Z) = L\,\mathrm{Tot}(X)$, as $Z$ ranges over populations with distribution $D$, and on the assumption that $X$ is moderate with respect to $D$.

Write $X_{\le w} = \sum_{x \le w} X(x)$ for the number of people in $X$ with welfare at most $w$, and similarly $X_{< w} = \sum_{x < w} X(x)$. Separating out contributions from $X$ and contributions from $Z$, we have
\[
V(X+Z) - V(Z) = \sum_{w \in \mathcal{W}} \sum_{i=1}^{X(w)} f(Z_{\le w} + X_{< w} + i)\,w + \sum_{w \in \mathcal{W}} \sum_{i=1}^{Z(w)} \big(f(Z_{< w} + X_{< w} + i) - f(Z_{< w} + i)\big)\,w.
\]
The assumption that $X$ is moderate means that, in those cases where $X(w) \ge 1$, so that the first inner sum is non-trivial, we also have $Z_{\le w} \to \infty$. We see therefore that each summand in the first double sum tends to $Lw$. The first double sum then converges to $\sum_{w \in \mathcal{W}} X(w)\,Lw = L\,\mathrm{Tot}(X)$. It remains to show that the second double sum converges to 0. Call the summand in that double sum $S(w, i)$.

Since there are finitely many $w$ for which $Z(w) \ge 1$, it suffices to show that, for each such $w$, the inner sum converges to 0. If $X_{< w} = 0$, then the inner sum is identically zero, so we can assume $X_{< w} \ge 1$. We can also assume that $Z_{< w}$ is large enough that $f$ is convex in the relevant range; then, since $f$ is decreasing,
\[
0 \le |S(w, i)| \le \big|f(Z_{< w} + X_{< w}) - f(Z_{< w})\big|\,|w|.
\]
Moreover, the number of terms, $Z(w)$, is proportional to $Z_{< w}$. It remains to apply the following elementary lemma with $n = Z_{< w}$ and $m = X_{< w}$.

Lemma 4.

If $f$ is an eventually convex function decreasing to a finite limit, then $n\,\big(f(n+m) - f(n)\big) \to 0$ as $n \to \infty$.

This is just a small variation on Lemma 2, and we omit the proof.
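To make the limit in Theorem 6 concrete, here is a small numerical sketch in Python. Everything specific in it is an assumption chosen for illustration: the weighting function `f(k) = L + 1/k` (decreasing, convex, with limit `L = 1`), the background distribution (half the population at welfare 0, half at welfare 10), and the foreground population `x`. Each welfare level in `x` has background welfare levels at or below it, so that $Z_{\le w} \to \infty$ as the proof requires.

```python
def brd_value(pop, f):
    """Rank-discounted value: sum of f(rank) * welfare, with rank 1 the worst off."""
    return sum(f(k) * w for k, w in enumerate(sorted(pop), start=1))

L = 1.0

def f(k):
    # Hypothetical weighting function: decreasing and convex, with limit L > 0.
    return L + 1.0 / k

x = [3, 7, 7]  # foreground population, Tot(x) = 17

def marginal_value(n):
    """V(x + z) - V(z) for a background z of n people at 0 and n people at 10."""
    z = [0] * n + [10] * n
    return brd_value(x + z, f) - brd_value(z, f)

def gap(n):
    """Distance between the marginal value of x and L * Tot(x)."""
    return abs(marginal_value(n) - L * sum(x))
```

Evaluating `gap(n)` for increasing `n` shows the marginal value of `x` approaching `L * Tot(x) = 17`. The reranking of the background people at welfare 10 contributes a correction that shrinks as `n` grows, which is the role Lemma 4 plays in the proof.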

Theorem 7.

Let $W \subset \mathcal{W}$ be any set of welfare levels, and $D$ a population that covers $W$. GRD converges to $\mathrm{CLL}_c$ relative to background populations with distribution $D$, on the set of populations that are supported on $W$; the critical level $c$ is the highest welfare level occurring in $D$.

Proof. Suppose $X$ and $Y$ are supported on $W$, and $X \succ_{\mathrm{CLL}} Y$. Let $Z$ be a population with distribution $D$, so $Z = nD$ for some $n > 0$. We have to show that $X + Z \succ_{\mathrm{GRD}} Y + Z$ for all $n$ large enough.

Let $\tilde{X}$ and $\tilde{Y}$ be populations of equal size, obtained from $X$ and $Y$ by adding people at the critical level $c$. The assumption that $X \succ_{\mathrm{CLL}} Y$ means that, for the first $m$ such that $\tilde{X}_m \ne \tilde{Y}_m$, we have $\tilde{X}_m > \tilde{Y}_m$. This shows that $\tilde{Y}_m < c$, so that in fact $\tilde{Y}_m = Y_m$. For brevity define $w := Y_m$.

Let $v$ be the next welfare level occurring in $X + Y$ above $w$. If there is no such welfare level, then define $v = c + 1$. We can decompose $Z$ (and similarly for other populations) as $Z = Z_- + Z_w + Z_0 + Z_+$, where $Z_-$ only involves welfare levels in the interval $(-\infty, w)$, $Z_w$ involves only $w$, $Z_0$ only involves welfare levels in $(w, v)$, and $Z_+$ only involves those in $[v, \infty)$. Note that $X_- = Y_-$ and $X_0 = Y_0 = 0$, and (since $D$ covers $W$ and is only supported up to $c$) $Z_0 \ne 0$.

Now we use the standard fact that
\[
\sum_{i=1}^{m} \beta^i = \frac{\beta - \beta^{m+1}}{1 - \beta}.
\]
It follows that
\[
V(X_w) - V(Y_w) = \beta\,\frac{\beta^{|Y_w|} - \beta^{|X_w|}}{1 - \beta}\,w.
\]
Therefore
\[
\frac{V(X+Z) - V(Y+Z)}{\beta^{|X_- + Z_- + Z_w|}} = \big(\beta^{|X_w|} - \beta^{|Y_w|}\big)\left(V(Z_0) - \frac{\beta w}{1 - \beta}\right) + R,
\]
where $R$ collects the remaining terms, each of which carries a factor $\beta^{|Z_0|}$ and so tends to 0 as $n \to \infty$. Note that $\beta^{|X_w|} - \beta^{|Y_w|} > 0$. To conclude that $V(X+Z) > V(Y+Z)$ for all $n$ large enough, it suffices to show that
\[
\lim_{n \to \infty} V(Z_0) > \frac{\beta w}{1 - \beta}.
\]
In fact, if $v'$ is the lowest welfare level greater than $w$ occurring in $D$, then $v' \in (w, v)$ and
\[
\lim_{n \to \infty} V(Z_0) = \frac{\beta v'}{1 - \beta}.
\]
